VMware Cloud Community
spham68
Contributor

What causes HA agent to disable on a particular ESX host in a cluster?

All,

I am seeing strange behavior with the HA agent on an ESX host. Sometimes, while I am working in VirtualCenter, one of the ESX hosts suddenly grays out and shows as not responding. When I check the Summary page, it reports a configuration issue: HA agent disabled on hosta in cluster1.

Has anyone run into this behavior?

I am running ESX 3.5 Update 1 with the post-Update 1 patches.

Thanks,

Steve

11 Replies
weinstein5
Immortal

Does it return on its own? Check the network communication between the ESX host and VC.
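
One way to make that connectivity check repeatable is a small loop over the hosts involved. This is only a sketch: the function name check_hosts and the PING_CMD override are hypothetical helpers (PING_CMD exists so the loop can be dry-run without a network), and the host names you pass in are your own.

```shell
#!/bin/sh
# Sketch: report which of the given hosts answer a ping.
# check_hosts and PING_CMD are illustrative names, not VMware tools.
check_hosts() {
    # Usage: check_hosts host1 host2 ...
    # By default each host gets two ICMP echo requests.
    ping_cmd=${PING_CMD:-ping -c 2}
    failed=0
    for h in "$@"; do
        if $ping_cmd "$h" >/dev/null 2>&1; then
            echo "$h reachable"
        else
            echo "$h UNREACHABLE"
            failed=1
        fi
    done
    # Non-zero exit status if any host did not answer.
    return $failed
}
```

Run it from the VC server against each ESX host, and from an ESX service console against VC and the other cluster members, since a one-way test can miss a problem in the other direction.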

vmmeup
Expert

It's most likely that the host is losing connectivity with VC, and that is why it reports that it is disabling HA. I usually see this behavior when there is an issue with the vpxa agent on the ESX host. What you can do is remove the host from VC, uninstall the vpxa agent from the host, and then add the host back to VirtualCenter.

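Run rpm -qa | grep vpxa; this will return the name and version of the vpxa package that is installed. After you get it, run rpm -e followed by that package name exactly as rpm -qa printed it (rpm -e takes the installed package name, not a .rpm file name) to remove it. Then just add the host back to VC. When the host is added, VC will reinstall the vpxa agent.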
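
A minimal dry-run sketch of the lookup step above, assuming the service console has rpm and grep as the commands in this post imply. The function name vpxa_remove_cmds is hypothetical; it only prints the removal command for each matching package so you can review it before running anything.

```shell
#!/bin/sh
# Dry run: print an "rpm -e" command for each installed vpxa package
# instead of executing it. vpxa_remove_cmds is an illustrative name.
vpxa_remove_cmds() {
    # Reads package names on stdin, one per line, as printed by `rpm -qa`.
    grep -i 'vpxa' | while read -r pkg; do
        printf 'rpm -e %s\n' "$pkg"
    done
}

# On the ESX service console:
#   rpm -qa | vpxa_remove_cmds
```

Remove the host from VC first, then run the printed command by hand once you have confirmed it names only the vpxa package.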

Sid Smith ----- VCP, VTSP, CCNA, CCA(Xen Server), MCTS Hyper-V & SCVMM08 [http://www.dailyhypervisor.com] - Don't forget to award points for correct and helpful answers. 😉
spham68
Contributor

Yes, it did.

Steve

mittim12
Immortal

I saw similar behavior in my lab after I upgraded several hosts to Update 1 and was attempting to manage them with VC 2.5. The hosts would randomly disconnect from VC and then come back. This stopped when I upgraded VC to 2.5 Update 1.

When upgrading the production systems I performed the VC update before the ESX host updates and did not see any of this behavior.

mike_laspina
Champion

Hi,

How many hosts are in the cluster?

Check the aam logs for errors, starting with:

cat /var/log/vmware/aam/vmware_yourhostname.log
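
Rather than reading the whole file, you can filter it first. This is a rough sketch: scan_aam_log is a hypothetical helper name, and the keyword list is a guess at what is worth looking for, not the actual wording the aam agent logs, so adjust it to what you see in your own file.

```shell
#!/bin/sh
# Sketch: show numbered lines in an aam log that match a few
# failure-related keywords. The keyword list is an assumption.
scan_aam_log() {
    # $1: path to the log file to scan
    grep -inE 'error|fail|timeout|isolat' "$1"
}

# Example, using the path from this post:
#   scan_aam_log /var/log/vmware/aam/vmware_yourhostname.log
```

The line numbers in the output make it easier to jump back into the full log and read what happened just before each match.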

http://blog.laspina.ca/ vExpert 2009
mittim12
Immortal

Did you ever resolve this issue?

spham68
Contributor

No. It is still occurring, but not as frequently.

I can ping all the hosts, and I was able to resolve the hostnames using ft_gethostbyname.

I installed VirtualCenter 2.5 Update 1 directly; I did not upgrade from VC 2.5 to VC 2.5 Update 1.

I am using the same NIC team for the service console and the VMotion network, with vmnic0 active and vmnic1 standby for the service console, and vice versa for the VMotion network. As for VLANs, I set up one routable VLAN for the service console and use a non-routable network for VMotion.

Steve

spham68
Contributor

Four hosts in a cluster.

mike_laspina
Champion

Many of the HA/DRS connectivity issues I have seen are on the physical switch configuration side, involving trunks and multiple switches.

Is this scenario in play?

Did you find anything in the aam logs?

http://blog.laspina.ca/ vExpert 2009
spham68
Contributor

I have not enabled trunking on those ports; each port goes to a separate switch. I checked the hostname_vmware.log under the aam directory, and the only thing I saw was that, for some reason, one host dropped out and started losing its network connection, and then the voting process started. After that, everything started communicating again via the heartbeat. I don't know why one host suddenly drops out, which causes the HA agent to kick in.

Is there any known issue with HA on ESX 3.5 Update 1 (including the post-Update 1 patches) and VC 2.5 Update 1?

Thx,

Steve

mike_laspina
Champion

I am not aware of any issues with HA on 3.5.0u1. The most likely area to check is the physical switch port logs. Look for CRC errors and the like; this may just be a port or cable issue.

The other possibility is that the host failed to properly upgrade the agent. Have you run the Reconfigure for HA option from VC on the troublesome host?

http://blog.laspina.ca/ vExpert 2009