Hi,
I have a 2-node ESXi cluster (4.1U1) and since enabling HA I am getting frequent intermittent failures reported by HA. Typically it's one of two messages:
"A possible host failure has been detected by HA on host xxxx" or "HA recovered from a total cluster failure in cluster xxx in Data Center yyy"
Sometimes the VC events view shows that it has recovered and is "healthy", and other times it just stays in what looks like a failure state with a red ! next to the node. Typically this happens on node-2.
I have tried entering and exiting Maintenance mode, which sometimes resets everything to healthy but sometimes does not. I have also disabled and re-enabled HA, which again works for a day or so.
Today I disabled HA, removed both nodes from the cluster, deleted the cluster object, re-added the nodes and re-enabled HA. Every step reported success, but it only lasted a few hours before the errors occurred again.
I don't think name resolution is an issue, as I can ping each node from the other using both the short name and the FQDN.
Any ideas would be greatly appreciated, as it's driving me nuts.
Please take a look at http://kb.vmware.com/kb/1026825 to rule out duplicate IP addresses.
André
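If you keep a list of which address is assigned to which interface, a quick offline cross-check can flag clashes before you go through the logs. This is only a minimal sketch: the inventory, the owner names and the addresses below are hypothetical, and it only catches duplicates that appear in your own list, not a rogue device on the wire.

```python
from collections import defaultdict


def find_duplicate_ips(assignments):
    """Return {ip: [owners]} for every IP claimed by more than one owner."""
    owners_by_ip = defaultdict(list)
    for owner, ip in assignments:
        # Strip stray whitespace so "192.0.2.12 " and "192.0.2.12" match.
        owners_by_ip[ip.strip()].append(owner)
    return {ip: owners for ip, owners in owners_by_ip.items() if len(owners) > 1}


# Hypothetical inventory of (owner, IP) pairs; replace with your own records.
inventory = [
    ("esx-node-1 mgmt", "192.0.2.11"),
    ("esx-node-2 mgmt", "192.0.2.12"),
    ("test-vm", "192.0.2.12"),
]

duplicates = find_duplicate_ips(inventory)
for ip, owners in duplicates.items():
    print(f"{ip} is claimed by: {', '.join(owners)}")
```

Any address printed here is worth checking on the network itself, since an inventory can be out of date.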
Andre,
Thank you very much. That was exactly what the problem was. I couldn't find any reference to "Duplicate IP" in either log, but I did see ping failures.
Thanks again,
Gerry