Hi all,
I'm trying to upgrade VCF from 4.1 to 4.2. The SDDC Manager upgrade completed, but it now fails at the next step, "configuration bundle drift update".
It looks like it's trying to update the cluster HA config and fails with "VSPHERE_HA_UPDATE_ISOLATION_RESPONSE_FAILED Failed to set vSphere HA Isolation Response for VM(s)" error (full log is attached).
Any help would be appreciated.
It turned out that one of the Edge VMs somehow got removed from the "VM Overrides" configuration section in vCenter (on the cluster level). Once all Edge nodes were added to the "VM Overrides" with "vSphere Restart Priority" set to "high", the upgrade was able to get past this error.
Have you looked for more information in the LCM and operations manager logs?
Unfortunately, I don't see anything relevant in those logs (attached).
I've noticed that after the upgrade the SDDC manager started to display "Retrieving tasks list failed. Something went wrong. Please retry or contact the service provider and provide the reference token." message.
Another error in the Security->Password Manager tab - "Failed to get tasks data. Something went wrong. Please retry or contact the service provider and provide the reference token."
Filtering for ERROR|WARN in the operationsmanager.log shows some errors related to password update?
2021-02-12T22:10:47.554+0000 ERROR [vcf_om,bd3bf5535c359fbc,8be1] [c.v.v.p.service.RestModelTranslator,http-nio-127.0.0.1-7300-exec-3] Diagnostic Message JSON parsing failed, Error : No enum constant com.vmware.vcf.passwordmanager.exception.PasswordManagerErrorCode.PASSWORD_UPDATE_CSS_PASSWORD_TEST_FAILED
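For anyone wanting to repeat the ERROR|WARN filter on their own operationsmanager.log, here is a minimal Python sketch. It assumes every log entry starts with a timestamp followed by the level, as in the line quoted above. `filter_levels` is just an illustrative name, not part of any VCF tooling.

```python
import re

# Assumption: each entry looks like "<timestamp> <LEVEL> [..] ...",
# matching the operationsmanager.log line quoted in this thread.
LEVEL_RE = re.compile(r"^\S+\s+(ERROR|WARN)\s")


def filter_levels(lines):
    """Return only the lines whose level field is ERROR or WARN."""
    return [line for line in lines if LEVEL_RE.search(line)]
```

To use it on a real file, open the log and pass the file object, e.g. `filter_levels(open(path, errors="replace"))`. Matching on the level field rather than anywhere in the line avoids picking up INFO entries that merely mention the word "ERROR"; the trade-off is that continuation lines such as stack traces are dropped.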
I should also add that the vCenter shows a cluster reconfiguration error, so it's probably not related to the password manager issues.
I would try rebooting SDDC Manager or restarting domainmanager/operationsmanager and commonsvc.service services. See if it gets rid of those UI errors for a start.
It'll likely be part of the upgrade process that is failing. What did you get in the logs when the task fails? There is usually a Java exception around the time of failure, with some more details. It could also be in the domain manager log.
I have just upgraded SDDC Manager to 4.2, but I am getting "No available updates" when I select vCloud Foundation 4.2 under available updates for the Management Domain. I was expecting to install the configuration drift bundle since I am upgrading from vCF 4.1.0.1. The bundle has been uploaded/validated. There is also a (1) and "READY" indicating that there is an update waiting, but it won't show.
If you want to make completely sure there is nothing and it's not just an SDDC Manager thing, you can try to find all offline bundles applicable to your environment.
I did do an offline bundle download, 98G worth. I can see the downloaded/validated 4.2.0.0 drift bundle. Wondering how to proceed. Might just re-install to be honest.
Coincidentally I have a different environment that has Internet access thru proxy, and I have no such problems there, i.e. the configuration drift bundle appears just after upgrading SDDC Manager to 4.2.0.0.
Yep, hence my suggestion to attempt an offline package sync to see if it is available that way.
The environment failing is offline and I downloaded all the packages from a fully updated 4.1.0.1 installation to no avail. The first update bundle gets applied, whereupon "no updates available". Can't see anything funny in the logs either. Ah well, I have gotten used to re-deploying, just takes a few hours now that I made scripts for it. 😛
This issue may be related to installing 4.1 with a single VDS. There is a known issue in the release notes for VCF on VxRail that attempts (badly) to explain a workaround...
I would prefer not to have to redeploy the entire stack for this - is there someone that can help? I have deployed this twice and got the same exact result.
Have you opened a case with support ?
Sure I have... but please can someone read those release notes? There is a detailed, non-working fix there... or at least it's not working for me!
This thread is the first hit on Google and so far it's only saying redeploy!
I've read it. Is the process not working for you? Are you running into an error, or is the port group not being updated?
We'll need more information.
The process below, detailed in the release notes, doesn't work. Step 1 does not create a file "vds.json". I tried creating the file in step 2 and running the curl command in step 5... no joy.
Add the AVN specific VLAN port group information to the JSON file saved in step 2 (vds-updated.json).
Run the following command to populate the inventory with AVN specific VLAN port groups:
What happens when you run the command in step 5 after creating the JSON file? You also need to be careful of improperly formatted JSON.
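On the formatting point, one quick way to rule out a broken file before sending it is to parse it locally. This is only a sketch: `check_json` is a made-up helper name, and it only checks that the text is syntactically valid JSON, not that it contains the fields SDDC Manager expects.

```python
import json


def check_json(text):
    """Return (True, parsed_data) for valid JSON, else (False, error_message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at the place where parsing gave up
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
```

Read the file you built for the PUT, pass its contents to `check_json`, and fix whatever position it reports before retrying the curl command. Typical culprits after hand-editing are trailing commas and curly "smart" quotes pasted from a browser.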
If for whatever reason you are not able to get a file from the output, you could also try a tool like Postman, which displays the output so that it can be saved as JSON with the formatting intact.
Thanks for the offer of help. The PUT command listed in the release notes is definitely mistaken. I have reported it to VMware support at this stage, so hopefully an update will be issued soon. The problem only occurs with single-VDS deployments. I have tested it on 3 clusters at this stage, and the fix is to add the missing info directly into the SDDC Manager database.
I won't publish that workaround here as it is not for the faint of heart! Open a ticket, refer to the VDS issue in the release notes, and they will fix it.
If anyone runs into this issue, here are the updated commands that I have successfully tested and validated at multiple VCF sites. This will affect any site upgrading from an earlier version to 4.2 where the AVN networks were deployed on a single VDS on VxRail. With this fix we're looking to add the uplink port groups (01 and 02) that were created automatically by SDDC Manager to support the AVN Edge uplinks:
curl -X PUT -H "Content-Type:application/json" --data @vds-update.json 127.0.0.1/inventory/vds/[MGMT VDS ID]
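If you would rather script the call than paste curl, the same request can be put together with Python's standard library. Treat this as a rough sketch under assumptions: `build_vds_put` is an illustrative name, the URL simply mirrors the curl line above (curl uses plain http when no scheme is given), and the VDS ID and payload are whatever you already have for your environment. It only builds the request; nothing is sent until you call `urlopen` yourself on the SDDC Manager appliance.

```python
import urllib.parse
import urllib.request


def build_vds_put(vds_id, payload, host="127.0.0.1"):
    """Build (but do not send) the PUT request equivalent to the curl command.

    vds_id  -- the management VDS ID, i.e. the [MGMT VDS ID] placeholder
    payload -- the raw bytes of the JSON file passed to curl with --data
    """
    url = f"http://{host}/inventory/vds/{urllib.parse.quote(vds_id, safe='')}"
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# To actually send it (only on the appliance, and only once the JSON is right):
# with urllib.request.urlopen(build_vds_put(vds_id, payload)) as resp:
#     print(resp.status)
```

Keeping the build step separate from the send step lets you print `req.full_url` and check the payload before anything touches the inventory.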