VMware Cloud Community
peterdabr
Hot Shot

can't access ESX4 host in 'Disconnected' state

We have an ESX cluster consisting of 3 x ESX4 nodes (v. 175625) with many critical VMs running on it.

For an unknown reason, two hosts went into a disconnected state in vSphere vCenter server. This is the second time I have had a similar issue; the first time I simply patched the ESX hosts, hoping that would solve the problem. When I try to ssh to either host to restart mgmt-vmware/vmware-vpxa and troubleshoot further, the ssh session hangs indefinitely without prompting me for the password. The direct console also hangs indefinitely. I can't reboot the hosts because I would need to VMotion the VMs off them beforehand. Unfortunately, even though the VMs are up and running, they are also in a Disconnected state in vCenter, which makes it impossible to VMotion them to a healthy host.

It seems to me that the only option I have is to shut down the VMs from within the OS and power-cycle the host. Nonetheless, I need a second opinion: is there anything else I could do to avoid shutting down the VMs and rebooting the host? Is there any other way that I don't know of to connect to the ESX host or remotely restart services in order to restore host connectivity? Also, is there any way other than vCenter to VMotion the disconnected VMs, for instance from the service console on another host with a healthy link to vCenter?

Thank you in advance for any input on this problem.

8 Replies
Chamon
Commander

When you are at the physical console, at what point does the host hang? What is happening when the host hangs (logging in?)

Datto
Expert

Do you have your Service Console memory set to 800MB or is it at the default? Also, do you have any agents from IBM, HP or Dell or elsewhere installed in your Service Console?

Datto

peterdabr
Hot Shot

There are no errors popping up when I'm at the physical console. I see the login prompt, and after I type in the user/password, it hangs indefinitely.

Similar behavior to what I experienced the first time. I only wish I had exported the log files when it happened the first time, instead of relying on the thought that patching would fix the problem (I was patching from the first ESX4 release, version 16004, at that time).

peterdabr
Hot Shot

Good point. It is possible that memory starvation on the Service Console could result in this behavior. All hosts in the cluster are set up with the default 458MB, and it would probably make sense to max it out.

That said, I don't have any third-party agents installed on the Service Console, and I don't know what would contribute to higher SC memory utilization (assuming that's the case). I've increased SC memory to 800MB on another cluster in the past, but that was to resolve an issue with failing Storage VMotion.
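
To see how much memory the Service Console actually has and how much is free, something like the following could be run on a host that still accepts logins. This is only a minimal sketch: the function name sc_mem_summary is made up for illustration, and it assumes the Service Console exposes the usual Linux /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): summarize memory and swap in MB.
# Reads /proc/meminfo by default; a different file can be passed for testing.
sc_mem_summary() {
    meminfo="${1:-/proc/meminfo}"
    # /proc/meminfo reports values in kB; convert to whole MB.
    awk '/^(MemTotal|MemFree|SwapTotal|SwapFree):/ { printf "%s %d MB\n", $1, $2 / 1024 }' "$meminfo"
}
```

Calling sc_mem_summary with no arguments prints one line per field. A MemFree value close to zero together with shrinking SwapFree would be consistent with the memory starvation theory, although it would not prove it.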

Chamon
Commander

Do you have another account that you can try to log in to the host with? I doubt that will work, but it is something to try. Then su - to root to restart the services. But if you cannot log in to the host, there isn't much you can do with it. Have you removed the host from VC and then tried to re-add it?
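
If a login does succeed, the restart mentioned above could look roughly like this once you are root (for example after su -). This is a sketch rather than an official procedure: the function name restart_agents is made up for illustration, and it assumes the two init scripts live under /etc/init.d as on a standard ESX 4 Service Console.

```shell
#!/bin/sh
# Hypothetical wrapper: restart the ESX management agents from the Service Console.
# Must be run as root.
restart_agents() {
    # mgmt-vmware is the host agent (hostd); vmware-vpxa is the vCenter agent.
    for svc in mgmt-vmware vmware-vpxa; do
        if [ -x "/etc/init.d/$svc" ]; then
            service "$svc" restart
        else
            # Not an ESX Service Console, or the agent is not installed.
            echo "skip: $svc not found" >&2
        fi
    done
}
```

Run restart_agents and then watch whether the host reconnects in vCenter. On a machine without these init scripts the function only prints the skip messages and changes nothing.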

peterdabr
Hot Shot

Chamon,

Thanks for your reply. I will give it a try, although since VC can't communicate with the host anymore, it will not initiate vpxa removal from the host, which is what happens when you remove a healthy host from VC. Just my thoughts.

Chamon
Commander

I am hoping it will reinstall it. But you will most likely need a reboot.

On Oct 28, 2009, at 6:29 PM, peterdabr <communities-

Chamon
Commander

Did you have to reboot the host or did you get logged in?
