I've opened an SR on this, but I wondered if anyone else is fighting this problem.
Several VI 3.5 hosts in VC 2.5 keep going into 'not responding'. Grrrr..
If I restart vmware-vpxa on the ESX host, VC looks fine again for a while, then after 5 minutes or so it all repeats. The CPU load on the affected hosts looks OK to me. If I connect directly to these hosts, I see nothing wrong. The VC server isn't getting clobbered either. I've had a lot of weird VC issues since upgrading from 2.0.2 to 2.5. I'm thinking about nuking the VC DB and server and doing a fresh build, but I'd rather not.
After rebooting the VC server, no hosts are showing as not responding. After being up 3-5 minutes, the same 2 hosts show not responding again. It seems to be a VC problem.
When the host shows up as not responding, can you check if vpxa is running on the host? Also check the vpxa logs (/var/log/vmware/vpx/vpxa*.log) for errors.
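A minimal sketch of that first check, assuming the `vmware-vpxa` init-script name used earlier in the thread; the restart line is commented out here since that service only exists on an ESX host:

```shell
# Check whether the vpxa agent process is alive. If it isn't, restarting it
# is the usual recovery step (line commented out: ESX-host-only service).
if pgrep vpxa >/dev/null 2>&1; then
    state="running"
else
    state="not running"
    # service vmware-vpxa restart
fi
echo "vpxa is $state"
```

If the agent is running but the host still drops out, the logs under /var/log/vmware/vpx/ are the next place to look.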
Hi rmnick
I have the exact same issues as you describe, so I'm very much interested in the outcome..
The 3.5 hosts seem to work just fine... but VC drives me nuts with these disconnections. The VC, in my case, is a fresh install.
/Rubeck
Yes, it's very frustrating. I've been tailing the log and watching things after I recycled vpxa. It looked okay in VC for 3 or 4 minutes and even did 2 VMotions before it went to not responding again. I don't see anything of interest in the host logs. The log shows the agents running on its cluster buddies. I'll probably just rebuild VC and its DB, but I'll hit support a little more before I do that.
I have also noticed the same issue.. sometimes disconnecting and reconnecting will work, but then if I try to look at the networking config or something else on the host, it will go to not responding.. almost like there is "lag" from the host making it "not respond"
ok, I have two hosts in a cluster.. I took one out of the cluster, created a new folder in my datacenter, then added the host I had removed (not in a cluster).. It is responding and seems to be fine..
the VirtualCenter client seems much quicker to load as well
I know it's not right, but I wanted to share that with you guys... so perhaps it has something to do with hosts in a cluster.. I'm thinking about shutting down all VMs, removing the second host, then recreating the cluster with both hosts... but I don't want to be in trouble come Monday morning... :smileycool:
I have removed the cluster completely and both hosts are responding just fine... I have not recreated the cluster yet... but I think I will...
ok, I created the cluster and dragged my hosts into it... waiting for failures
performed 1 migration... I have viewed host and network configurations.. it takes a few seconds for the initial display when switching from host to host, but all hosts continue to respond.. much better than "not responding". Hope this may work for you guys as well.... ESX 3.5 and VirtualCenter 2.5 server
still working fine.. I have added and removed pNICs from vSwitches and the VC server/client is responding very well. I have also migrated some VMs between the hosts.... so I'm thinking I'm all set..
I've got an 11 node DRS/HA cluster and I'm running into the same problem. One node in the cluster keeps dropping out. It didn't do this right away, but over the last two days, it will drop out of VirtualCenter and the only way to get it back is to reboot the node. Once DRS moves VMs back onto it, it will drop in about 5 minutes. I put the host into maintenance mode last night so I could take a look at it today. It has not dropped out of VirtualCenter at all.
I'm looking at ESX Update 1 and it seems that there are two patches that could potentially fix this (ESX350-200802401-BG & ESX350-200803217-UG). Has anyone applied ESX Update 1 yet? Does ESX Update 1 seem to resolve this issue?
Any help would be greatly appreciated.
I haven't had any problems since I deleted and recreated the cluster, back in February I think it was.. I'm not installing the new patch/update right away; I'm waiting to see how it works out for others.... seeing as I'm not having any problems.
Breaking an 11 node cluster with running machines isn't really an option in this case though...
I understand. Just wondering, did this happen after you updated to 3.5 and 2.5, or just out of the blue? Do you have a support contract with VMware? Any errors on the host's console?
This is all a fresh install of 3.5/2.5, no upgrades whatsoever. What logs do I need to look at? I can post them here if you like.
not sure about the logs... At one time I had some errors on the host console. We have a support plan with VMware and they were able to connect to the host via the web using web .. whatever it is ... mental block... they were very helpful... just a suggestion... I'm not experienced enough to help you; guess I got lucky when I solved the problem I was having...
No problem. I'll place a call to VMware and see what they say.
Thanks for your help though!
Start with a clean database and you will not have that issue. I did a VC upgrade first, and the hosts in the VC server would state they were not responding and then come back, but if I pointed the VC client at the ESX host directly, the VMs were actually fine. The problem was the database. I did a fresh install with a clean database and it was fine.
Sounds like the VC agent (vpxa) on the host might be having problems. The logs are under /var/log/vmware/vpx on the host. Check for errors (grep for "error]") in the vpxa*.log files.
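A small simulation of that grep, run here against a made-up sample log line (the log content below is invented for illustration); on the host you would run the same pattern against the real /var/log/vmware/vpx/vpxa*.log files:

```shell
# Simulated vpxa log check: grep for "error]" entries. The sample log is
# fabricated; on an ESX host, point grep at /var/log/vmware/vpx/vpxa*.log.
LOGDIR=$(mktemp -d)
cat > "$LOGDIR/vpxa-0.log" <<'EOF'
[2008-03-10 09:14:02 'App' 3076453280 verbose] heartbeat sent to VC
[2008-03-10 09:14:07 'HostCnx' 3076453280 error] lost connection to hostd
EOF
errors=$(grep "error]" "$LOGDIR"/vpxa*.log)
echo "$errors"
rm -rf "$LOGDIR"
```

Only the second line matches, since the pattern keys on the `error]` severity tag rather than the word appearing anywhere in the message.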