SteveEsx
Contributor

VMware ESXi Dell PowerEdge R710 host unresponsive in vCenter during I/O

Hi

I have run a VMware cluster for a long time with PE2900 hosts and an MD3000i SAN, but in the last 6 months I have moved from the PE2900s to Dell R710 servers, and from "ESX classic Enterprise" to "ESXi Enterprise Plus" booting from SD card. I use Dell PC5424 switches for iSCSI traffic and Cisco 2960 switches for LAN traffic. I've also started using dVS switches for everything except the iSCSI kernel ports. The hosts have 4 embedded gigabit ports and one extra quad-port Intel card. All connections are redundant (i.e. one cable from the embedded ports and one from the quad-port card for each function: vMotion, LAN, iSCSI and LAN2).

Lately I've had a strange problem where hosts become unresponsive in the vCenter console when I do heavy I/O, like copying a VMDK file from local disk to the SAN (or vice versa) or running vMotion operations. The host is responsive on the console and the virtual machines run fine during these operations, but vCenter thinks the host is not responding. vMotions often fail now, something that almost never happened before. I suspect the host is running fine and that the vCenter agent is just going crazy for some reason. I've tried restarting the hosts, but that did not help.
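One thing I will try first, before reinstalling anything, is restarting just the management agents (hostd and the vCenter agent, vpxa) on an affected host instead of rebooting it. If I remember the ESXi 4.x path correctly it is the line below, run from Tech Support Mode (the DCUI option "Restart Management Agents" should do the same):

/sbin/services.sh restart

If the host reconnects right away and then drops out again on the next big copy, I guess that points at hostd/vpxa being starved during I/O rather than at a real host problem.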

My next steps are going to be:

remove dVS switches and use the traditional vSwitch design again

reinstall ESXi, remove/re-add the hosts to vCenter, configure iSCSI again

verify that I have all ESXi patches (build check sketched after this list)

reformat the SAN LUNs

test host-to-vCenter connection stability during I/O again after the changes (see the esxtop sketch after this list)

or worst case even reinstall the whole vCenter
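For the patch check and the I/O test, these are the commands I plan to use from Tech Support Mode (as far as I remember them from ESXi 4.x, so adjust for your build):

vmware -v
(prints the exact version and build, so I can confirm every host is patched to the same level)

esxtop
(while a VMDK copy is running, press d for the adapter view or u for the device view and watch the DAVG/cmd and KAVG/cmd latency columns; sustained high latency during the copy would fit the management agents being starved rather than a network problem)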

Any other ideas on what to do to fix this?

Thanks for any input :)

ES

DCjay
Enthusiast

A quick question: do you use separate networks for iSCSI and management?

SteveEsx
Contributor

iSCSI is on isolated, redundant PC5424 switches (I call the subnets "iSCSI green" and "iSCSI grey", since the MD3000i prefers to have 2 subnets in the iSCSI configuration). I do have VM traffic and management on the same dVS switch, "dvSwitch01vmnetwork", which uses physical NICs 0 and 5. I guess it is best practice to separate those, but I was thinking that a dVS switch with 2 physical NICs would handle it well (these servers are mostly idle, with less than 5% CPU load). The unresponsiveness happens when moving data between the SAN and local storage, which only uses the iSCSI ports heavily; I have not seen much traffic on the other ports.
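To double-check the separation from the host side, I can list the vmkernel ports and ping the SAN over them (the target IP below is just a placeholder for one of my MD3000i portal addresses, not the real one):

esxcfg-vmknic -l
(lists each vmk interface with its IP and netmask, so the management 130.* network and the two iSCSI subnets should show up separately)

vmkping 172.28.178.10
(if this keeps answering while a heavy copy is running, the iSCSI path itself is holding up)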

virtual switches:

vSwitch1

VMkernel iSCSI Green -> vmnic2 1000 Full

vmk1 172.28.178.*

vSwitch2

VMkernel iSCSI Grey -> vmnic6 1000 Full

vmk2 172.28.179.*

dVS switches:

dvSwitch04hosting

dvPg Hosting LAN vDs -> dvSwitch04hosting-DVUplinks-4642 (dvUplink1 - vmnic1, dvUplink2 - vmnic7, dvUplink3 - none)

virtual machines on this port group

dvSwitch01vmnetwork

dvPg Internal (vlan 66)

and

dvPg LAN vDs -> dvSwitch01vmnetwork-DVUplinks-4281 (dvUplink1 - vmnic0, dvUplink2 - vmnic5)

virtual machines on this port group and vmk0 service console 130.*.*.*

dvSwitch02vmotion

dvPortGroupVmotion -> dvSwitch02vmotion-DVUplinks-4284 (dvUplink1 - vmnic3, dvUplink2 - vmnic4)

vmk3 vMotion kernel port 192.168.168.* on this port group
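For reference, the same layout can also be dumped on the host itself, which makes it easy to compare hosts or spot a dropped uplink (command as on ESXi 4.x):

esxcfg-vswitch -l
(lists the standard vSwitches and the distributed switch port IDs together with their uplink vmnics)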

I have not found a good naming standard for dVS configurations yet :) It is a bit confusing at first compared to the old vSwitch system, and I don't like the vCenter dependency, so I might go back to using only standard vSwitches...
