Hi all,
I have this single ESXi 5.0 server that's causing me headaches..
It loses network connectivity every 10 minutes or so..
It shows up as Not Responding in vCenter Server. Connecting directly via the vSphere Client fails with "...the host may not be available on the network..." or something like it, and connecting remotely (PuTTY) shows me the login prompt but refuses to authenticate (with the right credentials, of course).
The only solution is physically connecting to the host and restarting the Management Network.
Then all works fine.. for 10 minutes!
I can't seem to find anything useful in the logs.
I've also checked this KB: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101908... but with no luck.
Help?
Hi,
What's your network setup like? How is everything connected? Portgroups, management, vMotion....
These two KBs might give you some pointers on where to look regarding networking issues:
The network configuration is nothing out of the ordinary-
I have another ESXi host in the cluster alongside the problematic one, and it experiences no issues.
Both hosts reside on the same blade system (HP) and are connected to the same physical switch with the same network configuration (different IPs, of course). The same VLAN, the same teaming configuration, etc.
And I've checked, and the physical NICs on the problematic host seem to be OK.
Probably they are, but I'll ask anyway... Are both blades on the same firmware level (especially the mezzanines for example)?
Did this issue start recently? As in, was the host working fine first and it started later?
Anything interesting in the logs when the issues occur?
I'm not sure about the firmware level- I don't think they are the same as the hosts are from different series..
And yes, this started a few days ago, before that all worked fine-
But I do not recall any changes being made in the configuration between then and now.
The logs are not much help either 😕
Make sure the IP you have set for the management interface isn't in conflict with another host on your network. The behavior you describe is identical to what I encountered when I had an IP conflict recently.
If you are having this or some other logical or physical networking issue, the host should log it in the vmkernel log. Go to the host console UI, View System Logs, and look at the vmkernel log. That should give you a hint as to why your host is dropping.
An IP conflict is one of the first things I checked.. That's not the issue here, sadly (it would have simplified the solution).
The vmkernel log doesn't tell me much.. there are some entries related to networking, but I don't know if they're relevant to the problem..
I'm attaching a screenshot of the tail of the vmkernel log..
I'm sorry, I'm still not clear on whether this host was recently added to the cluster. Is it newer than the other one?
What are the blade type(s)?
Looks like your link is constantly going down... up... down... up...
Do you have a diagnostic disc for your server that you can run to test your Network Card? Try a different port on your network switch?
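If you can get the vmkernel log off the host as a plain text file, a rough Python sketch like the one below can count how often each NIC reports link up versus link down, which makes flapping easier to spot than scrolling through the log. The message wording is an assumption on my part: it matches lines containing "link up" / "link down" (or "link is up" / "link is down") and NIC names of the form vmnicN, so the patterns will likely need adjusting to whatever your driver actually logs. `count_link_events` is just a name made up for this example.

```python
import re
from collections import Counter

# Assumption: link-state lines contain "link up"/"link down" or
# "link is up"/"link is down" in any case. Adjust for your driver's wording.
LINK_RE = re.compile(r"\blink\s+(?:is\s+)?(up|down)\b", re.IGNORECASE)

# Assumption: physical NICs appear in the log as vmnic0, vmnic1, ...
NIC_RE = re.compile(r"\bvmnic\d+\b")


def count_link_events(lines):
    """Count link-state lines, keyed by (nic name, 'up' or 'down')."""
    events = Counter()
    for line in lines:
        state = LINK_RE.search(line)
        if not state:
            continue  # not a link-state message
        nic = NIC_RE.search(line)
        nic_name = nic.group(0) if nic else "unknown"
        events[(nic_name, state.group(1).lower())] += 1
    return events
```

To use it, open the saved log and pass the file object in, e.g. `count_link_events(open("vmkernel.log", errors="replace"))` (the file name is only a placeholder). A high and roughly equal count of up and down events on one vmnic would be consistent with a flapping link, but it doesn't tell you whether the NIC, the cable/interconnect, or the switch port is at fault.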
Could you attach the log as a txt file?
Are you by any chance using Broadcom interfaces? Did this occur recently or is this a fresh install? Did you install additional drivers for your NICs?
If it's an HP server, check this document to make sure you have the supported driver and firmware versions:
http://vibsdepot.hp.com/hpq/recipes/October2012VMwareRecipe3.0.pdf