Hello to all...
I need to design an NSX-T integration for an existing infrastructure.
The existing infrastructure consists of two sites (Site-A and Site-B) in Active-Active mode, with no shared storage; there is only a stretched L2 network and an L3 HA network between the sites.
Site-A is the primary site; Site-B is the secondary site.
Each site has its own cluster, and both clusters are managed by the same vCenter.
Similar to this one:
NSX-T Active Active deploy
I have an NSX-T Professional license.
Workload VMs are replicated between the sites with Veeam, and the VCSA is in HA on the secondary site.
Is it possible to integrate NSX-T into this architecture so that, if Site-A fails, everything keeps working on Site-B, and vice versa?
I've read some of the documentation on the Internet, but I haven't found a solution...
Can you help me with this? It is hard work for me.
Thanks for any suggestions.
Hi,
Have you looked at the NSX-T 3.1 Multi-Location Design Guide (Federation ... - VMware Technology Network VMTN?
This is a good guide for multi-site.
Hi,
Yes, I read it, but my issue is that I have a Professional license, which does not include the Multisite and Federation features.
For this reason I am looking for an alternative solution that would work.
For example, I was thinking about replicating the Edge VMs with Veeam and restoring NSX-T, but does this approach work, and is it supported?
Or other solutions ...
I accept suggestions 😁
Thanks.
Hi,
Just an idea:
Have 4 edge VMs (2 in each DC for local redundancy).
Create a T0 in active-active ECMP to your core.
Deploy a T1 in active-standby for DC1, with the active on DC1 and the standby on DC2 (use failure domains).
And for DC2, vice versa.
Use a stretched L2 for the VTEP network. (You could do it routed, but then you have to add some routing for it somewhere.)
In this case you will have all your segments in both DCs, and still benefit from having the T1 in the correct datacenter.
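To make the failure-domain idea above concrete, here is a minimal Python sketch of the intended outcome: each T1 gets its active SR in its "home" failure domain and its standby in the other one. The edge names, the failure-domain labels and the `place_t1` helper are all hypothetical, and this is only an illustration of the goal, not the placement algorithm NSX-T itself runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeNode:
    name: str
    failure_domain: str  # hypothetical label, one per datacenter


def place_t1(edges, preferred_domain):
    """Return (active, standby) edges that sit in different failure domains.

    Simplified illustration only: first edge found in the preferred domain
    becomes active, first edge found elsewhere becomes standby.
    """
    active = next(e for e in edges if e.failure_domain == preferred_domain)
    standby = next(e for e in edges if e.failure_domain != preferred_domain)
    return active, standby


# Hypothetical inventory: 2 edges per datacenter.
EDGES = [
    EdgeNode("edge1", "FD-DC1"),
    EdgeNode("edge2", "FD-DC1"),
    EdgeNode("edge3", "FD-DC2"),
    EdgeNode("edge4", "FD-DC2"),
]

t1_dc1 = place_t1(EDGES, "FD-DC1")  # active in DC1, standby in DC2
t1_dc2 = place_t1(EDGES, "FD-DC2")  # active in DC2, standby in DC1

for label, (active, standby) in (("T1-DC1", t1_dc1), ("T1-DC2", t1_dc2)):
    print(label, "active:", active.name, "standby:", standby.name)
```

The point of the sketch is only that the active and standby of one T1 never share a failure domain, so losing a whole DC never takes out both.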
Hi @p0wertje ,
thanks for your reply...
I am not very familiar with how failure domains work, so I will try to explain what I understand.
I install and configure a 3-node NSX-T Manager cluster.
I create 2 Edges for each site and then create an Edge Cluster with all the Edges.
I create a T0-Gw (Act/Act) with 8 uplinks (two for each Edge), enable ECMP and configure route maps for correct routing in case of site failure.
I create one T1-Gw (Act/Stb) for each site (with only a DR, or also an SR?).
For each Edge, I configure a failure domain (Edge1 and Edge2 in Failure Domain-A; Edge3 and Edge4 in Failure Domain-B). This way the standby of T1-A will be placed on the Edges of Site-B and the standby of T1-B on those of Site-A, right?
Now, I have some questions...
Is it possible to create 2 NSX-T Manager nodes in Site A and 1 in Site B if the hosts are in two different clusters? (This way I can avoid restoring NSX-T in the event of a site failure.)
Does the T0-Gw use failure domains like the T1-Gw when I deploy it in Act/Act mode? If not, how will the T1 traffic be forwarded to the edge if the T0 is not present after a site failure?
Sorry and Thanks again...🙏
You can split your NSX-T Manager cluster across sites as long as the nodes meet the RTT and other requirements.
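One thing worth checking with a 2 + 1 split is what is left of the manager cluster after each site failure. The sketch below assumes a simple majority-quorum rule for the 3-node cluster (my understanding of how it behaves; verify against the documentation for your version). The site names and node counts come from the question; the helper functions are hypothetical.

```python
def has_quorum(total_nodes, surviving_nodes):
    """Majority rule: strictly more than half of the nodes must be up."""
    return surviving_nodes > total_nodes // 2


# Split proposed in the question: 2 managers in Site-A, 1 in Site-B.
MANAGERS_PER_SITE = {"Site-A": 2, "Site-B": 1}


def quorum_after_site_loss(failed_site):
    """Does the manager cluster keep a majority if one whole site is lost?"""
    total = sum(MANAGERS_PER_SITE.values())
    surviving = total - MANAGERS_PER_SITE[failed_site]
    return has_quorum(total, surviving)


for site in MANAGERS_PER_SITE:
    print(f"{site} lost -> quorum kept: {quorum_after_site_loss(site)}")
```

Under that assumption, losing Site-B leaves a majority, while losing Site-A leaves a single node without one, so the split alone does not remove the need for a recovery plan for the managers.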
The T0 gateways do not support failure domains.
The T1DR (if you have no SR component) uses ECMP paths to the T0DR; each edge acts as a path to a prefix. https://communities.vmware.com/t5/VMware-NSX-Documents/NSX-T-and-ECMP/ta-p/2840738
Hi,
The 'downside' of this design is that, because of the T0 ECMP to the outside world, you have 4 incoming paths, spread over two datacenters.
I don't know if that is acceptable for you. You might be able to steer it with route maps; I have not tested that, so I don't know the result.
If you really need incoming and outgoing traffic to stay in one datacenter, you could go with
4 edge nodes, but in two edge clusters: one T0 active-standby with the active on the node in DC1 and the standby on the node in DC2, and vice versa for the other.
You can only run one T0 per edge node.
The downside of this is upgrading your edge nodes: when you upgrade the active node, traffic goes over the standby node and thus through the other DC.
You can steer traffic to the T0 or edges, but T1SR to T0DR uses 2-tuple load balancing across the paths it has available to active SRs.
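As a rough illustration of what 2-tuple load balancing means here, the Python sketch below picks a path from a hash of source and destination IP only. The hash function, the path names and the addresses are assumptions made for the example; NSX-T's real hashing is internal and will not match this.

```python
import hashlib
import ipaddress


def pick_path(src_ip, dst_ip, paths):
    """Pick one path from a hash of (source IP, destination IP) only."""
    if not paths:
        raise ValueError("no paths available")
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]


# Hypothetical path names: one per edge, two edges per datacenter.
ALL_PATHS = ["edge1-dc1", "edge2-dc1", "edge3-dc2", "edge4-dc2"]
# After losing DC1 only the DC2 edges remain as candidate paths.
SURVIVING = [p for p in ALL_PATHS if p.endswith("dc2")]

flow = ("10.0.1.10", "192.0.2.50")  # documentation/example addresses
before = pick_path(*flow, ALL_PATHS)
after = pick_path(*flow, SURVIVING)
print("before failure:", before)
print("after DC1 failure:", after)
```

Because ports are not part of the key, every flow between the same two addresses follows the same path, and when a site's edges disappear the same flow is simply rehashed over whatever paths are left.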
Thanks @shank89 & @p0wertje ...
I don't want to complicate the design with the T1SR. The Professional license does not include the LB feature, so for now let's leave the T1SR out. Thanks, and sorry for bringing it up; it was just to understand better.
Both sites are active, so both receive N/S traffic.
Honestly, what I haven't been able to understand is how to configure the T0 when a site fails (because the critical point of this design seems to be precisely the routing of traffic from the T1DR to the T0 in case of a failure).
We said that the T0 doesn't support failure domains. So how can the T0 of Site-A be deployed on the Edges of Site-B if I create 2 Edge Clusters?
For the T0 and Route Map I saw this link: https://www.lab2prod.com.au/2020/09/nsx-t-active-active-multisite-part2.html
Is there a specific configuration on the T0 side that I have to do in order not to have problems in case of DR?
Thanks a lot , again 🙂
To clarify, failure domains are used to predictably place the SR component of the T1 gateways. The DR component is distributed and does not have an active and a standby component. Here are a couple of links for that;
For the T0, you can have it in Active/Active or Active/Standby; that is up to you. Whether or not your 4 edges are split across the sites, you steer the traffic using AS-path prepends and local preference, as shown in the link you added in your previous response. If you would like faster failover and are using BGP, consider using BFD.
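To show why prepends and local preference steer traffic, here is a deliberately simplified best-path comparison in Python. It only models two steps of BGP path selection (higher local preference first, then shorter AS path); real BGP has more tie-breakers. The AS numbers, next-hop names and preference values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str
    local_pref: int
    as_path: tuple


def best_route(routes):
    """Simplified selection: highest local preference, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# Egress view: the same prefix learned via both sites, DC1 preferred
# through a higher local preference (hypothetical values).
egress = [
    Route("edge-dc1", local_pref=200, as_path=(65001,)),
    Route("edge-dc2", local_pref=100, as_path=(65001,)),
]

# Ingress view from the upstream router: DC2 advertises the prefix with
# its own AS prepended twice, so the DC1 path looks shorter.
ingress = [
    Route("edge-dc1", local_pref=100, as_path=(65000,)),
    Route("edge-dc2", local_pref=100, as_path=(65000, 65000, 65000)),
]

print("egress via:", best_route(egress).next_hop)
print("ingress via:", best_route(ingress).next_hop)

# If DC1 disappears, only the DC2 routes remain and they win by default.
print("egress after DC1 loss:", best_route(egress[1:]).next_hop)
```

The less-preferred routes stay in the table the whole time, which is what makes the failover automatic once the preferred site stops advertising.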
As with anything, test failover and make sure the behaviour is what you expected and predicted.
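For a feel of why BFD speeds up failover, the arithmetic can be sketched as below. BFD declares a neighbour down after the negotiated interval times the detect multiplier, whereas BGP on its own waits for the hold timer to expire. The timer values used here are assumptions for illustration only; use the values actually configured on your edges and physical routers.

```python
def bfd_detection_ms(interval_ms, multiplier):
    """BFD detection time: negotiated interval times the detect multiplier."""
    return interval_ms * multiplier


def bgp_hold_ms(hold_time_s):
    """Without BFD, a silent peer is only declared down when the hold timer expires."""
    return hold_time_s * 1000


# Assumed example timers, not taken from this thread.
bfd_ms = bfd_detection_ms(interval_ms=500, multiplier=3)
bgp_ms = bgp_hold_ms(hold_time_s=180)

print(f"BFD detects a dead peer in about {bfd_ms} ms")
print(f"BGP hold timer alone: up to {bgp_ms} ms")
```

Measuring the real convergence time during a failover test is still the only way to know what your environment does.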
Thanks @shank89 for clarifying the T1SR Failure Domain.
As I understand it, the T0DR and T1DR are deployed on each ESXi host belonging to the cluster that hosts the Edge VM to which the T0 (and consequently the T1) is connected, right?
If so, the T0DR of Site-A is not present on the hosts of Site-B.
So how will traffic flow in the event of a failure?
If that is not the case, and a T0DR and T1DR are distributed to all hosts prepared with NSX-T once I create them, then in the event of a failure I will have no problems, because the T0DR and T1DR are already present on the hosts of the other site.
How exactly does it work?
This will come down to how you prep the environment. If you need segments etc. available in the second site, the transport nodes of both sites need to be part of the same overlay transport zone. If they are not, the transport nodes in Site-2 will not see the networks you want them to have.
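The transport-zone point can be pictured with a small Python model: a segment belongs to one overlay transport zone, and only transport nodes attached to that zone see it. All host names, zone names and the segment name below are hypothetical; one Site-B host is deliberately put in the wrong zone to show the effect.

```python
# Hypothetical inventory: names, sites and transport zones are made up.
TRANSPORT_NODES = {
    "esxi-a1": {"site": "Site-A", "tzs": {"tz-overlay"}},
    "esxi-a2": {"site": "Site-A", "tzs": {"tz-overlay"}},
    "esxi-b1": {"site": "Site-B", "tzs": {"tz-overlay"}},
    "esxi-b2": {"site": "Site-B", "tzs": {"tz-overlay-b"}},  # wrong zone on purpose
}

# Each segment lives in exactly one overlay transport zone.
SEGMENTS = {"seg-web": "tz-overlay"}


def nodes_seeing(segment, nodes):
    """Transport nodes attached to the segment's transport zone."""
    tz = SEGMENTS[segment]
    return sorted(name for name, node in nodes.items() if tz in node["tzs"])


def after_site_loss(nodes, failed_site):
    """Inventory that is left once every node of one site is gone."""
    return {name: node for name, node in nodes.items() if node["site"] != failed_site}


print(nodes_seeing("seg-web", TRANSPORT_NODES))
print(nodes_seeing("seg-web", after_site_loss(TRANSPORT_NODES, "Site-A")))
```

In this toy inventory only one Site-B host can still carry the segment after Site-A is lost, which is the situation to avoid by putting every host and edge of both sites in the same overlay transport zone.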
There may be very manual, or scripted, DR methods to get around this (for example, connecting the T0 to an edge cluster on the failover site once Site A goes down), but my general recommendation is to simplify DR to avoid human error. If it is a true DR, there's enough going on anyway.
You should find what you need from slide 26 onwards. https://www.dropbox.com/s/tvwqhjhbwd7hy4j/Multisite_NSX-T_3.1-v1.0.pptx?dl=0.
Of course, I will create segments to connect to T1DR.
All segments of the two sites will be on the same TZ Overlay.
All hosts and Edges from the two sites will be on the same TZs.
Excuse me if I insist, but what I don't understand is whether the T0DR and T1DR are distributed to all the hosts of the cluster containing the Edge to which the T0 (and consequently the T1) is connected, or to all the hosts prepared with NSX-T, regardless of which cluster that Edge sits in.
Because the disaster recovery considerations will differ depending on how the T0DR and T1DR are distributed, right?
I saw the ppt on the DR and NSX-T Multisite, thanks.
T1DRs and T0DRs exist on all transport nodes that are prepared for NSX-T.
Ah, OK... sorry, I didn't read that correctly...
Excuse me again ... I try to summarize everything to see if I understand correctly ....
I have two sites like in the drawing above:
I create:
I don't use failure domains because I don't have a T1SR (if I had a T1SR, I would also use FDs).
In the event of Site failure, everything works (or should 😅) , because:
I hope I understood correctly...
Thanks again @shank89 🙏 for your patience 😇
First 5 dot points look good, what is the network you are stretching?
Hi,
Sounds correct.
And if you do not need a T1-SR, you do not have to deploy one. You then just have a DR only.
And keep the points @shank89 mentions in mind.
"What is the network you are stretching?" I can extend all the L2 networks needed.
Just a clarification: all of this works with the Professional license (no Multisite/Federation feature), right?
The data plane still works if the management plane is down or in read-only mode.
The choice of A/S is up to you, you will have to work out what is best for your scenario.
Yes, if there is active workload on a segment in the remaining site, and only those hosts and edges exist, the traffic will egress that site.
I would say so, as this just comes down to cluster design within a single instance of NSX-T.
Perfect ...
Thank you very much for your time and patience.