I tested HA on a two-host cluster. Everything went as expected except for one minor thing.
Please read carefully.
I ran a continuous ping against 2 VMs on Host A, 1 VM on Host B, and Host A itself.
When I shut down Host B, the VM it owned was restarted on Host A, and pings to that VM were lost until it came back online.
I didn't lose any pings to the other devices.
But when I restarted Host B, I did not lose any pings to the service console (SC) of Host A. However, while ESX was booting on Host B, I lost 3-4 pings to the VMs on Host A (which was untouched).
How in the world is that possible?
Oh, I almost forgot.
On Host A, I get alarms:
"network uplink redundancy lost"
"Network connectivity lost"
No errors in the HA logs.
Also, I lost 12 pings to the VMs on Host A, not 4-5.
How are the NICs configured on each host to provide resilience and redundancy for the service console?
There are 2 SCs, each with 2 NICs, on 2 separate vSwitches. The vSwitches use failover teaming.
The VMs use a different vSwitch with, obviously, different NICs.
As I mentioned, I do not lose connectivity to the SC of the host, only to the VMs.
I wonder whether the issue has to do with Host B booting, and HA is a secondary problem.
I would recheck the config of Host B and the network ports it is connected to, to make sure they are all configured correctly. This includes VLAN config, tagged ports, etc.
You could check whether HA is the problem by disabling it and running the test again.
I agree. From what you have described, it would appear the VM network on Host A is being impacted by the reboot of Host B. It might be worth bringing down Host A to see if the same issues occur on Host B.
I just ran the test again with HA disabled first.
No pings were lost.
It seems to be related to HA when the second host reboots, but I don't understand how.
And what happens if you reboot Host A?
Sorry, I did get the same alarms on the host that was not rebooted.
But I didn't check whether pings were lost.
I would review the physical network configuration for both of these hosts and make sure it is correct on the network switch(es). It looks to me like you have a network issue.
But why would it affect only the VMs, and only when HA is enabled?
The only thing I can think of is a misconfiguration on the network switch: either the ports are not tagged correctly, or they only appear to be on the correct VLAN. We can guess at the reasons, but until you start checking, you are never going to know.
It was a VLAN 1 problem. VMware told me that ESX doesn't like VLAN 1.