VMware Networking Community
keijd_04
Contributor

NSX Cluster Manager Failure

Hi! I've been looking for resources on NSX Manager cluster failure, but I cannot find anything specific to my scenario. Here is my setup, along with some follow-up questions.

Design: 3-node cluster located in a single rack
Plan: Install and configure a 3-node NSX-T Manager cluster, one NSX-T Manager node per ESXi host. There are only six 10G uplinks in total, two per ESXi host, all contributed to the DVS.

Question:
1. If all NSX-T Manager Nodes fail

a. What will happen to my policies?
b. What will happen to my VMs?

2. Given that the manager and the controller are already on the same appliance, what is the implication if all the manager nodes fail?

3. Is there a difference in the outcome when using a VDS compared to an N-VDS?

Hope someone could help!

Thank you!

8 Replies
cms7
VMware Employee

As far as I know, the answers to your questions are as follows:

1. If all the nodes of an NSX-T Manager cluster fail or go down, all your existing policies remain in place, but you will have no UI or API access to the NSX-T setup. No operations are possible: you cannot create new objects, or update or delete existing ones. Your VMs keep running, and the dataplane continues to operate in headless mode until a new NSX-T Manager node is deployed or the old ones are restored. So your VMs are not disrupted, but you can't make any changes to the dataplane.

2. NSX-T has been a unified appliance since release 2.4, with the Manager, controller, policy, and other components all in the same appliance. If all nodes of the NSX-T Manager cluster fail, you lose both the management plane and the control plane. That is why no operations can be performed on the NSX-T setup, while the dataplane continues in headless mode until at least one node is restored or a new one is deployed.

3. From NSX 4.x onward, the N-VDS host switch type is deprecated and only the VDS option is available. With VDS, all the networking features of the regular VDS (such as PVLAN, NetFlow, and port mirroring) can be used together with NSX-T, which was not the case with N-VDS. In addition, VDS logic is already integrated in vCenter, so all the vCenter APIs can be used for operations on the VDS, which was not possible with N-VDS. Easier administration and management is therefore another advantage of moving from N-VDS to VDS. These are the reasons VMware is moving away from N-VDS and has deprecated it as of the NSX 4.x releases.

I have tried to answer all the questions to the best of my knowledge and hope you find this useful. If so, please mark your query as resolved and give a kudos.

 

cms7
VMware Employee

Just wanted to add another point here related to N-VDS: the edge transport nodes in NSX still use N-VDS.

keijd_04
Contributor

Hi @cms7 

Thank you for answering my queries. Just one additional question, though.

Since only the NSX-T Manager cluster went down:
1. Will my VMs and servers still be able to receive N/S and E/W traffic?
2. Can I still use vCenter to manage my VMs (vMotion, HA, power-on)?

Thank you!

cms7
VMware Employee

To answer your questions in the same order:

1. As far as I know, as long as no modifications or disruptive actions are made in the dataplane, your VMs and servers should still be able to receive N/S and E/W traffic.

2. Yes, you can still use vCenter to manage your VMs, but think carefully before performing operations such as vMotion. If such an operation changes the NSX-T data path, the affected traffic would be disrupted, and because the management and control planes are still down, that traffic would stay down. The same applies to HA if it causes any changes in the NSX-T dataplane.
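
A pre-check along these lines may help before triggering a vMotion or any change that touches the NSX-T data path: look at the cluster status reported by the NSX Manager API and hold off if it is not healthy. This is a minimal sketch, not an official procedure. It assumes the parsed response of a cluster status query (for example `GET /api/v1/cluster/status`) carries `mgmt_cluster_status.status` and `control_cluster_status.status` fields with `STABLE` as the healthy value, so please check the API guide for your NSX version. The function name and the sample payloads are made up for illustration.

```python
from typing import Any, Mapping, Optional


def safe_to_change_dataplane(cluster_status: Optional[Mapping[str, Any]]) -> bool:
    """Return True only if both management and control clusters report STABLE.

    `cluster_status` is the parsed JSON body of a cluster status query
    (field names are assumptions, see the note above). Pass None when no
    manager node answered at all.
    """
    if cluster_status is None:
        # No answer from any manager node: treat as "all managers down".
        return False
    mgmt = cluster_status.get("mgmt_cluster_status", {}).get("status")
    ctrl = cluster_status.get("control_cluster_status", {}).get("status")
    return mgmt == "STABLE" and ctrl == "STABLE"


# Hypothetical payloads, for illustration only.
healthy = {
    "mgmt_cluster_status": {"status": "STABLE"},
    "control_cluster_status": {"status": "STABLE"},
}
degraded = {
    "mgmt_cluster_status": {"status": "DEGRADED"},
    "control_cluster_status": {"status": "STABLE"},
}

for label, payload in (("healthy", healthy), ("degraded", degraded), ("no answer", None)):
    if safe_to_change_dataplane(payload):
        print(f"{label}: ok to proceed")
    else:
        print(f"{label}: postpone vMotion/HA-related changes")
```

The check is deliberately conservative: anything other than a clearly healthy answer is treated as a reason to wait.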

keijd_04
Contributor

Hi @cms7 

This is useful information to know.

As a last question, how do we handle this case? I mean, what do we do if the whole NSX-T Manager cluster fails?
Is there any option other than restoring from backup?

Thanks again!

cms7
VMware Employee

Yes, you would first need to find out exactly why the NSX-T nodes went down. If they are down because of filesystem corruption, we can run the required fsck commands on the corrupted partitions and bring the nodes back up.

If the cause is something else, it has to be investigated whether the nodes can be brought back up after troubleshooting, which depends on the case. The last resort is, as you mentioned, to restore from backup.
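
The order of steps described above can be written down as a small decision helper. This is only a sketch of the reasoning in this reply, not a VMware procedure; the cause label `filesystem_corruption` and the function name are hypothetical.

```python
from typing import List


def recovery_steps(cause: str, backup_available: bool) -> List[str]:
    """Return the recovery steps to try, in order, for failed manager nodes."""
    steps = ["identify why the nodes went down"]
    if cause == "filesystem_corruption":
        steps.append("run fsck on the corrupted partitions and bring the nodes up")
    else:
        steps.append("troubleshoot the specific failure and try to bring the nodes up")
    if backup_available:
        # Last resort, as noted in the reply above.
        steps.append("restore from backup")
    return steps


for step in recovery_steps("filesystem_corruption", backup_available=True):
    print("-", step)
```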

If you feel all your questions have been answered satisfactorily, please mark this issue as resolved in the community. Thanks.

sguadamu1
Enthusiast

Hello Keijd

Question:
1. If all NSX-T Manager Nodes fail

a. What will happen to my policies?
b. What will happen to my VMs?

A/ An NSX-T design contains three NSX-T Managers that share the same database (Corfu DB), which ensures that all three managers hold the same information. So if one node fails, you can remove it, deploy a new NSX-T Manager, and carry on. The important point is that your infrastructure continues to work, but the cluster will show as degraded.

2. Given that the manager and the controller are already on the same appliance, what is the implication if all the manager nodes fail?

A/ Same answer: NSX-T has three nodes, and these three nodes run the management and control planes in the same VMs. So if a node fails, your infrastructure will continue to run.

3. Is there a difference in the outcome when using a VDS compared to an N-VDS?

A/ N-VDS is being deprecated, so you will use NSX port groups on your VDS.

Hope this helps.

SG

sguadamu1
Enthusiast

Adding to the answer above, which is very good.

Yes, the traffic will continue to flow because the dataplane runs on the ESXi hosts. If your whole cluster crashes, you won't be able to add any new NSX-T components such as Edges, T0/T1 gateways, or segments, but the current configuration will continue to work without issues.
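
The split between what keeps working and what is blocked can be summarized in a small model. This is an illustrative sketch of the behaviour described in this thread, not output from NSX; the operation names are hypothetical labels and the function name is made up.

```python
# Operations that need the management/control plane (per the replies in this thread).
NEEDS_MANAGER = {
    "add_edge",
    "create_t0_t1_gateway",
    "create_segment",
    "update_policy",
    "delete_object",
}
# Traffic forwarded by the dataplane that runs on the ESXi hosts.
DATAPLANE_ONLY = {
    "existing_east_west_traffic",
    "existing_north_south_traffic",
}


def is_possible(operation: str, managers_available: bool) -> bool:
    """Say whether `operation` can proceed given manager cluster availability."""
    if operation in DATAPLANE_ONLY:
        # Already-realized configuration keeps forwarding (headless mode).
        return True
    if operation in NEEDS_MANAGER:
        # Any create/update/delete needs the manager cluster back.
        return managers_available
    raise ValueError(f"unknown operation: {operation}")


for op in sorted(NEEDS_MANAGER | DATAPLANE_ONLY):
    state = "still works" if is_possible(op, managers_available=False) else "blocked"
    print(f"{op}: {state} while all managers are down")
```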

Best Regards.

SG