Hello everybody!
I have 3 servers with the same configuration (all compatible per the HCL). vSphere and ESXi 6.0, latest version (U3), were installed and vSAN was deployed. Everything was great. It worked for a long time - no errors, no problems.
After updating to 6.5 (via ISO 201704001-5310538), all hosts have problems:
1. They do not enter Maintenance mode - error "Operation timed out".
2. Also, the hosts now take very long to boot (approximately 20 minutes instead of 5).
3. In vSAN Health I see the error "vSAN CLOMD liveness - failed", the status for hosts is "Abnormal" - "can not connect to clomd process and possible it's down".
When I run "/etc/init.d/clomd start", CLOMD starts, but about a minute later it is down again.
*Note (maybe important?): after updating the hypervisor and vserver, the drivers and firmware for the disk subsystem were also updated via the web client (vSphere itself suggested the update).
Any help?
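The behaviour described above, where CLOMD starts and then goes down again about a minute later, can be observed with a small polling loop on the host. This is only a rough sketch: the names watch_clomd, CLOMD_INIT, CHECKS and INTERVAL are made up for illustration, and it assumes that "/etc/init.d/clomd status" prints a line containing "is running" while the daemon is up, which should be verified on the host first.

```shell
#!/bin/sh
# Sketch: start clomd, then poll its status to see whether it stays up.
# CLOMD_INIT, CHECKS and INTERVAL are illustrative names, not VMware settings.
CLOMD_INIT="${CLOMD_INIT:-/etc/init.d/clomd}"
CHECKS="${CHECKS:-12}"       # number of status checks
INTERVAL="${INTERVAL:-10}"   # seconds between checks

watch_clomd() {
    "$CLOMD_INIT" start >/dev/null 2>&1
    i=1
    while [ "$i" -le "$CHECKS" ]; do
        sleep "$INTERVAL"
        # Assumption: the status output contains "is running" when clomd is up.
        if "$CLOMD_INIT" status 2>/dev/null | grep -q "is running"; then
            echo "check $i: running"
        else
            echo "check $i: not running"
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

Calling watch_clomd with the defaults checks every 10 seconds for about two minutes and stops at the first check where the daemon is no longer reported as running, which gives a rough idea of how long it survives after a start.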
Going from 6.0 U3 to 6.5d (6.6) is not a supported path.
See Supported upgrade paths for vSAN 6.6 (2149840) | VMware KB
What can I do now? How to fix the situation?
I'm afraid that the best (and maybe only) option in this case is to immediately open a support case with VMware.
Assuming that you are not the first one who ran into this, they might be able to solve the issue.
André
Unfortunately, we only have a key for the evaluation period, so it seems I cannot submit a technical support request through my personal account. Maybe there is another way?
Is this a lab/test environment?
You also need to be aware that there is a 6.6 version for vCenter. When upgrading vSphere, with or without vSAN, vCenter should be upgraded first, then the ESXi hosts.
If this is a prod environment, GSS may need to be involved. If you are testing 6.6, you could re-image each host with 6.6, but the downside is that you will have to configure those hosts from scratch. If you have host profiles in place, that would be easier.
First of all, thank you very much for your patience and help!
1. Yes, it's a test lab, but we do not want to set everything up from scratch...
2. Yes, we updated vCenter to the latest version 6.5d first
Questions:
1. Do I understand correctly that the problem is not in updating vCenter from version U3, but only in updating the HOSTS from version U3?
2. Is it true that in this case, if I completely reinstall ESXi on the hosts, the problems will disappear?
3. Do I need to reinstall vCenter in this case or not?
4. Will the data on vSAN be corrupted? (I read that it should not be damaged when reinstalling the hypervisor.)
5. Can I use a "host profile" in vCenter when reinstalling ESXi, so that I do not have to re-enter the host settings manually?
Hi, as the hosts are affected, you could revert to the previous build. I have also done this and it worked fine; however, I reverted before upgrading the On Disk format version to 5.0.
Reverting to a previous version of ESXi (1033604) | VMware KB
I am running vSAN 6.6 and ran into the same issues.
Maintenance mode would time out.
Finally found that CLOMD was abnormal on a host.
I ssh'd to the host and checked the status of the service.
I found it was not running.
I started it using /etc/init.d/clomd start,
checked the status,
and found it was now running.
I retried the maintenance mode and found it completed with no issues.
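The steps above (check the service, start it if it is stopped, check again) could be scripted roughly as follows. It is a sketch, not a VMware-provided tool: ensure_clomd, clomd_running and CLOMD_INIT are hypothetical names, and the check assumes the status output contains "is running" when the daemon is up. As noted elsewhere in this thread, clomd is not meant to run on Witness nodes, so this should not be used there.

```shell
#!/bin/sh
# Sketch: make sure clomd is running before retrying Maintenance mode.
# CLOMD_INIT is an illustrative variable; the default is the init script
# path used in this thread.
CLOMD_INIT="${CLOMD_INIT:-/etc/init.d/clomd}"

clomd_running() {
    # Assumption: the status output contains "is running" when clomd is up.
    "$CLOMD_INIT" status 2>/dev/null | grep -q "is running"
}

ensure_clomd() {
    if clomd_running; then
        echo "clomd already running"
        return 0
    fi
    "$CLOMD_INIT" start >/dev/null 2>&1
    if clomd_running; then
        echo "clomd started"
        return 0
    fi
    echo "clomd failed to start"
    return 1
}
```

Running ensure_clomd over SSH on the affected host and then retrying Maintenance mode mirrors what was done manually here. If the daemon stops again shortly afterwards, a simple restart is not enough.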
Hello,
OP said they tried restarting clomd, so this is not the solution here. Also, a caveat to starting this service: on Witness nodes it is not running and should be left that way.
From the symptoms described, it sounds like they may have been hitting this issue, which is resolved in 6.5 U1 (and has some potential workarounds not specified here):
kb.vmware.com/kb/2149968
Bob