I created a VM heartbeat alarm on a cluster with ESX 3.5 U4 servers and everything worked as expected.
After an upgrade to ESXi 4.0 U1, the alarm now constantly bounces between green and yellow. This happens on all VMs in the cluster
(Linux and Windows), with both old and new VMware Tools.
When I move a VM back to a 3.5 host, the bouncing stops.
The vCenter version is 4.0 U1.
Since the status is also logged in hostd.log, it looks like a problem between the host and the VM, but where?
Here is an example of a hostd.log from an ESX server:
2010-01-20 13:47:04.320 27A36B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow
2010-01-20 13:47:44.322 27AF9B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: green
2010-01-20 13:48:04.322 27B7BB90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow
2010-01-20 13:48:44.325 27AB8B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: green
2010-01-20 13:49:04.327 27AF9B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow
2010-01-20 13:49:44.327 27B7BB90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: green
2010-01-20 13:50:04.329 27B7BB90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow
2010-01-20 13:50:44.331 27A36B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: green
2010-01-20 13:51:04.334 10799B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow
2010-01-20 13:51:44.336 27BBCB90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: green
2010-01-20 13:52:04.338 27A77B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow
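For anyone who wants to check the timing in their own hostd.log, here is a minimal Python sketch that pulls the heartbeatStatus lines out of a log and prints how long each status lasted. The line format is inferred only from the excerpt above, and the names `parse_transitions`, `dwell_times` and `sample` are made up for this example. In the excerpt the cycle looks like roughly 40 seconds of yellow followed by 20 seconds of green.

```python
import re
from datetime import datetime

# Pattern inferred from the hostd.log excerpt above (an assumption, not a documented format).
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Updating current heartbeatStatus: (\w+)$"
)


def parse_transitions(lines):
    """Return (timestamp, status) pairs for every heartbeatStatus line found."""
    out = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            out.append((ts, m.group(2)))
    return out


def dwell_times(transitions):
    """Return (status, seconds) for the time spent in each status before the next update."""
    result = []
    for (t0, s0), (t1, _s1) in zip(transitions, transitions[1:]):
        result.append((s0, round((t1 - t0).total_seconds(), 3)))
    return result


# First three lines of the excerpt above, copied as-is.
sample = [
    "2010-01-20 13:47:04.320 27A36B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow",
    "2010-01-20 13:47:44.322 27AF9B90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: green",
    "2010-01-20 13:48:04.322 27B7BB90 verbose 'vm:/vmfs/volumes/.../test/test.vmx' Updating current heartbeatStatus: yellow",
]

for status, seconds in dwell_times(parse_transitions(sample)):
    print(f"{status:6s} for {seconds:.3f} s")
```

To run it against a real log, read the file's lines and pass them to `parse_transitions` instead of `sample`. Lines that do not match the pattern are skipped.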
I have seen this issue between 4.0 U1 and both 4.0 and 3.5 (all ESX, not ESXi).
I opened an SR with VMware about the issue.
I have almost 400 guests (which send a heartbeat back to vCenter every minute). VMware suggested disabling the heartbeat alarm, given that we are not using guest monitoring in HA. Not sure why that was the recommended fix. They are still researching the issue. For now our alarm is disabled.
Jase McCarty
Co-Author: VMware ESX Essentials in the Virtual Data Center (ISBN:1420070274) Auerbach
Co-Author: VMware vSphere 4 Administration Instant Reference (ISBN:0470520728) Sybex
I got a confirmation from VMware that upgrading from vCenter 2.5 to 4.0 is where the problem lies.
The issue will be resolved in a future update.
Jase McCarty
Thanks for the information that I'm not alone with this problem.
But I did not upgrade to vCenter 4.0. I did a fresh installation and moved the hosts from a 2.5 vCenter to a 4.0 vCenter.
For now I have disabled the green-to-yellow trigger on the alarm and am waiting for a fix.
Can you give us a rough idea of the network setup? Specifically, what vSwitches and port groups do you have for Service Console/Kernel and VMs on the ESX 3.5 box(es), and what Management Port / VM port groups do you have on the ESXi box(es)?
from http://serverfault.com/questions/104687/vm-heartbeat-problem
Starwind Software Developer
Hi,
I have the same problem while monitoring our vSphere farm with the nworks monitor from Veeam.
So let's hope they fix it in an upcoming update.
MCP, VCP
the network setup is the same for ESXi 3.5 and 4.0 hosts:
We have recently upgraded our test cluster to vCenter 4.0 U1 and ESXi 4.0 U1 and see the same thing. Random virtual machines flash up the "VM change state" alarm. Looking at the SNMP traps being sent out by the hosts, we see a lot of trap numbers 3 and 4, which according to the MIBs are vmwVmHBLost and vmwVmHBDetected respectively. I used to see these traps being sent out from ESXi 3.5 as well, but back then there was no alarm defined in vCenter for VM Change State, so I'm not sure whether this has always been going on and we only notice it now because there are a lot more alarms in vSphere vCenter.
But it does seem that a lot more of these SNMP traps are being sent out from ESXi 4.0 U1 than from ESXi 3.5.