VMware Networking Community

Edge (virtual and baremetal) Upgrade Fail from 3.2.3 to 4.1.1

Before I open a support case: has anyone else had a failure with the NSX 4.1.1 upgrade (from 3.2.3)?

The first two edges upgraded (in parallel), one virtual (on ESXi 7.0.3) and one bare metal, both failed at the same spot: 70%.

When logging in to the appliances, it appears that both platforms have lost the fpeth NICs/dataplane, though it's likely a deeper issue.

ifconfig doesn't list the dataplane NICs on either device (on the bare metal edge they are Intel X710).

I tried the usual: reboot, resume, etc. It looks like the OS upgrade went fine, and the mgmt NICs still work.


The log reports this (pretty much the same error for both the virtual and bare metal edges):

Pnic status of the edge transport node 4868e5cc-e684-4e66-9ad9-790f981e80f4 is DOWN.,Overall status of the edge transport node 4868e5cc-e684-4e66-9ad9-790f981e80f4 is DOWN.,Edge node 4868e5cc-e684-4e66-9ad9-790f981e80f4 , has errors Errors = [{"moduleName":"upgrade-coordinator","errorCode":30201,"errorMessage":"Pnic status of the edge transport node 4868e5cc-e684-4e66-9ad9-790f981e80f4 is DOWN."}, {"moduleName":"upgrade-coordinator","errorCode":30212,"errorMessage":"Overall status of the edge transport node 4868e5cc-e684-4e66-9ad9-790f981e80f4 is DOWN."}, ] after state sync wait.
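For anyone comparing failures across several edges, here is a minimal sketch that pulls the module, error code and message out of a log line like the one above. The `Errors = [...]` list has a trailing comma, so it is not strict JSON; the sketch uses a regular expression instead of `json.loads`. The names `LOG_EXCERPT`, `ERROR_RE` and `extract_errors` are made up for illustration and are not part of any NSX tooling.

```python
import re

# Excerpt of the upgrade-coordinator message quoted above (hypothetical variable name).
LOG_EXCERPT = (
    'Errors = [{"moduleName":"upgrade-coordinator","errorCode":30201,'
    '"errorMessage":"Pnic status of the edge transport node '
    '4868e5cc-e684-4e66-9ad9-790f981e80f4 is DOWN."}, '
    '{"moduleName":"upgrade-coordinator","errorCode":30212,'
    '"errorMessage":"Overall status of the edge transport node '
    '4868e5cc-e684-4e66-9ad9-790f981e80f4 is DOWN."}, ] after state sync wait.'
)

# Matches one {"moduleName":...,"errorCode":...,"errorMessage":...} entry.
# Assumes the field order and quoting seen in the log line above.
ERROR_RE = re.compile(
    r'\{"moduleName":"(?P<module>[^"]+)",'
    r'"errorCode":(?P<code>\d+),'
    r'"errorMessage":"(?P<message>[^"]*)"\}'
)


def extract_errors(log_text):
    """Return a list of dicts, one per error entry found in log_text."""
    return [
        {"module": m["module"], "code": int(m["code"]), "message": m["message"]}
        for m in ERROR_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    for err in extract_errors(LOG_EXCERPT):
        print(err["code"], err["module"], err["message"])
```

This only restructures what the log already says; it does not explain why the pNIC status is DOWN.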


I initially missed uploading the latest Upgrade Coordinator .pub file, but I doubt that would cause the above. (I went back after the failure and uploaded it.)

Any insight would be appreciated.



2 Replies

@wcoz Did you ever get a fix? I am facing the exact same thing at the moment, and unfortunately there doesn't seem to be anything known or visible in the logging.

Visit my blog at https://vcloudvision.com!

Yes, this issue is resolved. I worked through it with the VMware Engineering team, and as a result the fix is included from 4.1.2 onwards. The release note entry for the fix is:

  • Fixed Issue 3277849: The Edge datapath process crashes on Sandy Bridge, Ivy Bridge, and Westmere CPUs. - Edge dataplane non-functional.



With the change of the underlying Ubuntu operating system in the NSX Edge image, a flag was missed, which resulted in legacy CPUs not being supported in 4.1.1; therefore the dataplane did not work. The issue is resolved in 4.1.2, where the VMware team changed a flag in the build that (essentially) re-enabled older CPUs.
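If you want a quick idea of whether an edge (or the host under an edge VM) falls into the affected CPU generations, the following is a rough sketch that reads `/proc/cpuinfo`-style text and maps the Intel family/model numbers to Westmere, Sandy Bridge or Ivy Bridge. The model-number table is my own assumption based on the commonly published Intel family 6 model IDs, not something from the VMware release note, and `LEGACY_MODELS` and `legacy_cpu_generation` are names made up for illustration. Inside a VM the reported model can also be masked (for example by EVC), so treat the result as a hint only.

```python
# Intel family 6 model numbers for the generations named in the release note.
# Assumption: taken from commonly published Intel model IDs, not from VMware.
LEGACY_MODELS = {
    37: "Westmere",
    44: "Westmere",
    47: "Westmere",
    42: "Sandy Bridge",
    45: "Sandy Bridge",
    58: "Ivy Bridge",
    62: "Ivy Bridge",
}


def legacy_cpu_generation(cpuinfo_text):
    """Return the legacy generation name for the first CPU listed, or None."""
    vendor = family = model = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        # Only the first processor entry is inspected.
        if key == "vendor_id" and vendor is None:
            vendor = value
        elif key == "cpu family" and family is None:
            family = int(value)
        elif key == "model" and model is None:  # exact match, not "model name"
            model = int(value)
    if vendor == "GenuineIntel" and family == 6:
        return LEGACY_MODELS.get(model)
    return None


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        generation = legacy_cpu_generation(f.read())
    if generation:
        print("CPU looks like", generation, "- one of the generations in the release note")
    else:
        print("CPU not recognised as Westmere/Sandy Bridge/Ivy Bridge")
```

A `None` result does not prove the edge is unaffected; the release note and support remain the authority.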


For those unaware, there was a major update to the underlying version of Ubuntu. This was a good thing: the older version of Ubuntu didn't support newer RAID cards and NICs, and the newer version does. For example, the Dell R650 etc. now run the NSX Bare Metal Edge fine (you still have to use Intel NICs for the dataplane, however).


You should be OK if you just install the fixed release (or any newer). I'm now running

If you have access to support, you can quote case #23458958608 and get the full details. If you get stuck I can post some more details here, but it would be better to get custom help from support.


I had instructions to roll back failed NSX Edge 4.1.1 installs. These were to be run on each NSX Edge node on which an upgrade had been started, and basically just rolled the node back to the older version. I did this on a few, and also tried just installing a new edge (while the install was going; that worked fine) and using the REPLACE CLUSTER MEMBER feature in the cluster. (Note: don't add the edge to an existing cluster; use the Replace Cluster Member feature from the action menu.)

I noted you could also continue the upgrade without upgrading the edges. I believe this required telling the upgrade coordinator to add the older (i.e. current) version of the edges as a 'whitelisted'/allowed version, which meant the upgrade coordinator finished without worrying about the edges being on an older version. I can provide those details if you like, but if you have support, use it first; they can step you through it better.

Hope this helps for now. 


