VMware Cloud Community
Darryl201110141
Contributor

Hosts that won't stay connected to vCenter

ESXi 4.1.0 and vCenter 4.1.0

We have four hosts in a cluster. Two have no issues, while the other two will not stay connected to vCenter.  I have checked everything in KB article 1003409 (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100340...) and found no issues.

All four servers are IBM x3690 X5 machines, and all are connected to the same four SAN LUNs. The only difference is that the management interfaces of the problem servers are on a separate subnet, but port 902 UDP is open between them and the vCenter server.  I can manually reconnect them, but they disconnect again within a minute or so.

Does anybody have any ideas beyond what is in article 1003409 that I can check?

Thanks,

Darryl


Accepted Solutions
AndySimmons
Hot Shot

Make sure you have port 902 TCP open, as well as UDP.

See this KB article: http://kb.vmware.com/selfservice/documentLinkInt.do?micrositeID=&popup=true&languageId=&externalID=1...

Regarding port 902 TCP:

The vCenter Server system uses this port to send data to managed hosts. This port must not be blocked by firewalls between the server and the hosts, or between hosts.

If the problem persists, I would try removing the host(s) from the vCenter inventory and re-adding them. That'll reinstall the vCenter agents on the hosts.

EDIT: That particular excerpt was taken from the vCenter 5.x entry, but the same is true for vCenter 4.x (it's further up the table). To be clear, you need 902 TCP and UDP open in every direction: from vCenter to the hosts, from the hosts to vCenter, and from host to host.
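
As a rough illustration of checking the TCP side, here is a minimal Python sketch. The function name `tcp_port_open` is made up for this example. It only tests TCP from the machine it runs on toward the target, so the opposite direction has to be tested from the other end, and it says nothing about UDP. A successful connect also needs something listening on the far side, so treat a False result as a hint rather than proof of a firewall block.

```python
import socket


def tcp_port_open(host: str, port: int = 902, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: a False result can mean a firewall drop, a refused
    connection, an unresolvable name, or simply no listener on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals and name-resolution errors
        return False
```

For example, calling `tcp_port_open("esx03.example.com")` from the vCenter server would test the vCenter-to-host direction, where the host name is a placeholder for one of your own management addresses.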

-Andy VCAP5-DCA, VCP-DV 4/5, MCSE, space camp graduate.

19 Replies
a_p_
Leadership

I didn't read all the links in the KB article you mentioned, so this could be something you already checked. Anyway, please make sure all the hosts are able to resolve the vCenter Server's hostname (short name as well as FQDN) from the command line and vice versa. The issue you describe is often caused by DNS resolution issues.
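
A quick way to run the forward-lookup half of this check from a workstation or the vCenter server is sketched below in Python. The helper name `resolve_all` and the host names in the usage note are assumptions for illustration. It uses the resolver of the machine it runs on, so it does not replace testing from the ESXi command line itself.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_all(
    names: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each name to its resolved IPv4 address, or None if lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = None
    return results
```

Passing both forms of each name, for example `resolve_all(["vcenter", "vcenter.example.com"])`, shows at a glance whether the short name and the FQDN resolve, and whether they resolve to the same address.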

André

Darryl201110141
Contributor

Thanks for the quick reply!  All the servers are set to use our domain DNS servers and can resolve both long and short names. I just did a quick test to be sure. :)

Thanks,

DC

jdptechnc
Expert

Was this cluster working correctly previously, or is it a new setup?  Anything changed?

Please consider marking as "helpful", if you find this post useful. Thanks!... IT Guy since 12/2000... Virtual since 10/2006... VCAP-DCA #2222

Darryl201110141
Contributor

The first two servers were (and still are) working fine, then we added the next two and they have the disconnecting issue.

jdptechnc
Expert

IP address conflict, perhaps?

What do you see in your vmkernel log on the hosts that are being disconnected?  Go to the console UI and view system logs, or open a shell or SSH session and open /var/log/vmkernel.log

Darryl201110141
Contributor

No address conflicts.

I had a look in /var/log/ and /var/log/vmware/ but there is no vmkernel.log file there.  Is there another location or name that it might have depending on the version of ESXi (4.1.0)?

jdptechnc
Expert

Sorry, I'm used to 5.x... In 4.x there is no vmkernel.log; VMkernel messages are logged in /var/log/messages.  If vmkernel is logging potential IP networking issues (I'd assume it would be, if your vmk ports are going up/down), you could filter for vmkernel messages with the following:

grep vmkernel /var/log/messages
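
If the messages file has been copied off the host, the same filtering can be done with a few lines of Python. This is only a sketch: the function name `vmkernel_lines` and the optional second keyword are assumptions, and unlike the plain grep above it matches case-insensitively.

```python
from typing import Iterable, Iterator


def vmkernel_lines(lines: Iterable[str], keyword: str = "") -> Iterator[str]:
    """Yield lines that mention vmkernel and, if given, also contain keyword.

    Matching is case-insensitive; an empty keyword matches every line.
    """
    needle = keyword.lower()
    for line in lines:
        lowered = line.lower()
        if "vmkernel" in lowered and needle in lowered:
            yield line.rstrip("\n")
```

A file object can be passed directly, for example `vmkernel_lines(open("messages"), "vmnic")`, where both the local file name and the search term are placeholders to adapt.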

iw123
Commander

Can you connect a VI client directly to the hosts in question? Does that lose connectivity at all?

*Please don't forget to award points for "helpful" and/or "correct" answers

Darryl201110141
Contributor

Yes, connecting directly to the host with the vSphere client works fine, with no disconnects.  It is just the connection in vCenter that keeps dropping, which leads me to believe it is a heartbeat issue, but ICMP and port 902 UDP are both open between the vCenter server and the hosts.

I had a look in /var/log/messages but didn't see any connectivity errors from vmkernel.

jdptechnc
Expert

Any alarms on the cluster object or on the hosts themselves?  What do you see under Events on the hosts?

Darryl201110141
Contributor

No alarms on the cluster object.

When the hosts disconnect, they log "host is not responding" and "host connection failure" events and raise a "host connection and power state" alarm.

jdptechnc
Expert

OK... so you're either running into an actual networking failure, or the agent on the server is failing for some reason.

If you can run a continuous ping from one of the hosts to vCenter, and do not get any dropped packets when the hosts disconnect from vCenter, it's probably a management agent issue.
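
To line the probe results up against the disconnect events, it helps to record a timestamp with every attempt. The Python sketch below does that with a pluggable probe; `watch`, `failures`, and the probe callable are all hypothetical names, and the probe could wrap a ping command or any other reachability check you trust.

```python
import time
from typing import Callable, List, Tuple


def watch(
    probe: Callable[[], bool],
    attempts: int,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> List[Tuple[float, bool]]:
    """Run probe `attempts` times, `interval` seconds apart.

    Returns a list of (timestamp, ok) pairs, one per attempt.
    """
    results: List[Tuple[float, bool]] = []
    for i in range(attempts):
        results.append((clock(), probe()))
        if i < attempts - 1:  # no need to wait after the final attempt
            sleep(interval)
    return results


def failures(results: List[Tuple[float, bool]]) -> List[float]:
    """Return the timestamps of the attempts that failed."""
    return [ts for ts, ok in results if not ok]
```

If `failures(...)` comes back empty for a window in which vCenter logged a disconnect, that points away from a general network outage and toward the management agent or a blocked port.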

Try the following for troubleshooting the hostd service:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100284...

Darryl201110141
Contributor

I ran a continuous ping from the vCenter server to the host for several minutes with no dropped packets.  Ping from the host to the vCenter server does not work, but that is also the case on the hosts that have no issues.

I went through the hostd troubleshooting document but didn't find any issues.

jdptechnc
Expert

Are you booting to SAN? Could be a storage issue if so...

Anything interesting in your latest vpxd log on your vCenter server?  C:\ProgramData\VMware\VMware VirtualCenter\Logs (assuming Windows 2008)

Darryl201110141
Contributor

All the hosts boot from either local disk or internal USB stick.

I reconnected the host, waited for it to disconnect, and grabbed the vpxd.log entries for that time period, but I don't see anything that really stands out as a "holy crap it failed" message. :)  Having said that, this is the first time I've looked in the log file, so I don't really know what I'm looking for.

Darryl201110141
Contributor

902 UDP is open bidirectionally between host and vCenter, but 902 TCP looks like it is only open from vCenter to host and not the other way.  Maybe that's the problem.
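
This kind of one-way gap is easy to miss when reading firewall rules by eye. A small Python sketch like the one below can list what is missing, assuming you can express the existing port 902 rules as (source, destination, protocol) tuples. The names `required_rules` and `missing_rules` and the short labels in the usage note are invented for illustration.

```python
from itertools import permutations
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, str]  # (source, destination, protocol), all for port 902


def required_rules(
    vcenter: str,
    hosts: Iterable[str],
    protocols: Iterable[str] = ("tcp", "udp"),
) -> Set[Rule]:
    """Build every direction the thread calls for: vCenter to host,
    host to vCenter, and host to host, for each protocol."""
    host_list = list(hosts)
    required: Set[Rule] = set()
    for proto in protocols:
        for host in host_list:
            required.add((vcenter, host, proto))
            required.add((host, vcenter, proto))
        for src, dst in permutations(host_list, 2):
            required.add((src, dst, proto))
    return required


def missing_rules(vcenter: str, hosts: Iterable[str], allowed: Iterable[Rule]) -> List[Rule]:
    """Return the required rules that are absent from `allowed`, sorted."""
    return sorted(required_rules(vcenter, hosts) - set(allowed))
```

With a single host labelled "esx3" and rules allowing TCP and UDP from "vc" to "esx3" but only UDP back, `missing_rules` reports the host-to-vCenter TCP rule as the one gap.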

Darryl201110141
Contributor

Yup, opening 902 TCP both ways did the trick, thanks!

DC

AndySimmons
Hot Shot

Great!
