VMware Cloud Community
jftwp
Enthusiast

ESX vSwitch Configuration - Want to improve since upgrading; suggestions?

I'm currently running v3.0.2 and will soon be upgrading to v3.5 U2. My existing servers (DL360 G5) have a total of 6 network ports available: 2 onboard NICs and 4 NICs via a quad-port Intel card. All ports are GigE. Each host has 2 vswitches:

  • 'vswitch0' consists of 2 active/bonded/teamed NICs: 1 onboard and 1 PCI card port. Both are on the same VLAN. The port groups are 'Service Console' and 'Vmotion', so there are 2 dedicated IPs within this vswitch, used pretty much just for management/HA/SC/Vmotion purposes.

  • 'vswitch1' consists of the remaining 4 ethernet ports (1 onboard, 3 of the quad-port card). They are configured to access a separate VLAN from vswitch0's, so that traffic is isolated in that regard. (A rough sketch of the equivalent esxcfg commands is below.)
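
For reference, here's roughly what that layout looks like from the service console, I think (the vmnic numbers, port group names and the IPs below are just placeholders, not necessarily what's on my actual hosts):

    # vswitch0: SC + Vmotion on 1 onboard port + 1 quad-port card port
    esxcfg-vswitch -L vmnic0 vSwitch0               # onboard
    esxcfg-vswitch -L vmnic2 vSwitch0               # quad-port card
    esxcfg-vswitch -A "Service Console" vSwitch0
    esxcfg-vswif -a vswif0 -p "Service Console" -i 10.1.1.11 -n 255.255.255.0
    esxcfg-vswitch -A "VMotion" vSwitch0
    esxcfg-vmknic -a "VMotion" -i 10.1.1.12 -n 255.255.255.0
    # vswitch1: VM traffic on the remaining 4 ports
    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -L vmnic1 vSwitch1
    esxcfg-vswitch -L vmnic3 vSwitch1
    esxcfg-vswitch -L vmnic4 vSwitch1
    esxcfg-vswitch -L vmnic5 vSwitch1
    esxcfg-vswitch -A "VM Network" vSwitch1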

Although this configuration has always worked fine, it can no doubt be improved. Here's what I don't like: there is only 1 service console (I'd like a 2nd/backup one for increased HA heartbeat redundancy), and 4 NICs to service the 15 or so VMs (on average) per host in the 6-node cluster is overkill, based on the utilization/logging as far as I can tell.

I think I should adjust vswitch1 so that only 3 physical NICs are in it / dedicated to the VMs. That leaves me with 1 extra NIC to allocate to a 2nd SC or otherwise.
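
If I go that route, I assume it's basically just unlinking one uplink from the VM vswitch and then verifying, something like this (the vmnic number is a placeholder):

    esxcfg-vswitch -U vmnic5 vSwitch1    # pull the 4th uplink out of the VM vswitch
    esxcfg-vswitch -l                    # confirm vSwitch1 is down to 3 uplinks

...and then decide what the freed-up port gets used for.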

All that said, any suggestions? I am going to start upgrading late next week, removing each host from the cluster as I do so (evacuating VMs in the process), and will be doing clean/new (non-upgrade) installations. Thanks for any ideas/feedback -


oh yeah -


and what about a recommended vswitch configuration for hosts that only have FOUR NICs (in a separate, lower-end cluster, mainly for development)? Thanks... reading the 3.5 Server Configuration Guide / networking sections in the meantime...!

7 Replies
RParker
Immortal

active/bonded/teamed NICs: 1 onboard and 1 PCI card port.

Why are you mixing an onboard port with the add-in NIC? That's a bad move. First, you should put the SC on the Intel NIC. Use the onboard ports for the VM vswitches, since they aren't as high priority. I know everyone is afraid that a NIC might go bad... in 25+ years, hundreds of thousands of computers, and several computer companies using dozens of vendors, I have yet to see 1 add-in NIC card go bad, but I have seen motherboards flake out, and when they do, onboard components like network, RAID and video tend to go with them. But add-in boards (especially Intel), I have NEVER heard of a single one going bad, ever.

Although this configuration has always worked fine, it can no doubt be improved. Here's what I don't like: There is only 1 service console (I'd like to have a 2nd/backup one for

So add another NIC to the vSwitch where the SC is located. But this vSwitch should NOT have a VM port group on it. You don't need 2 SCs, just 2 NICs on the vSwitch for redundancy.
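
From the service console that should just be a one-liner once the spare port is freed up, assuming it shows up as vmnic5 (placeholder name):

    esxcfg-vswitch -L vmnic5 vSwitch0    # second uplink for the SC/Vmotion vswitch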

This leaves me with 1 extra NIC to allocate as a 2nd SC or otherwise.

So adding this NIC to the existing vSwitch will be perfect. Everything else looks good; it's just that, in my experience, the onboard NICs aren't as good as the PCIe quad-port/dual-port cards for SC/Vmotion.

mcowger
Immortal

Disagree with Parker here.

We combine the onboard with the PCI NICs into a single vswitch for the SC/Vmotion, and it works great. We do this to get redundancy across both the NICs and the physical switches, as well as against driver failures (which do happen; I've personally seen them). I also have to disagree with the assertion that PCI NICs don't go bad: with over 2000 machines in our DCs, we have a NIC go bad about once every 3 months, both onboard and PCI (Intel).

I agree with the rest.

--Matt

--Matt VCDX #52 blog.cowger.us
Craig_Baltzer
Expert

The VMware guys (VMworld presentation "VMware Infrastructure 3 Networking - Design Best Practices") say that "1 onboard + 1 add-in NIC" teaming is a "best practice". Agreed that I've never seen a server-class NIC go bad, but I've also never been able to measure a significant performance difference between a server-class onboard NIC and a PCI-X/PCI-E NIC either...

khughes
Virtuoso

Hate to say it rparker, but I'm going to have to go against you too... Even if you have never, ever seen a NIC go down, you don't want that % chance dancing around in the breeze; better safe than sorry, I've learned. Everything else rparker said sounds fine; adding another NIC to the SC vswitch as a failsafe is just as good as adding a second SC, I think.

Sorry bud, but in my office the reasoning "it has never happened before" never flies, because with our luck, a month after saying that it'll happen.

  • Kyle

-- Kyle "RParker wrote: I guess I was wrong, everything CAN be virtualized "
jftwp
Enthusiast

Thanks all. Back in the 2.5.x days, there was a fair amount of debate as to whether a given vswitch should include physical NICs that consisted of both PCI/add-on and the onboards. But since then, with 3.x, there was talk of general improvements in the drivers for both Broadcom onboards and Intel PCI cards, such that mixing and matching (mainly for physical redundancy's sake within the vswitch) was no longer a heated topic and became much more acceptable and, apparently, even 'best practice'.

My main concern is increasing my service console connection redundancy, to avoid 'false positives' where HA has (once in our past) kicked in, killed all the VMs, and started them up on an alternate host. Bad, bad, bad... just because one card or port failed. HA really needs to offer an option where, if a failure is perceived by ESX and VirtualCenter, an EMAILED ALERT goes out to the admin(s) first, and the admin can then confirm the overall health/status before HA reacts (instead of its default panic mode). The administrator could then, perhaps, have an option within VirtualCenter that says 'Yes, this is a valid failure of an ESX host; go ahead and restart my VMs on an alternate host within the cluster'.

Meanwhile, I'm looking to increase Service Console redundancy. Perhaps I just add the extra/additional NIC to the existing vswitch0 (which is shared by Vmotion and the SC). Perhaps I create a whole new vswitch and just have a 2nd SC there, with that single NIC port. Not sure which is better. In the end, though, we're talking about physical pathways, not virtual ones, where redundancy is concerned, true?
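
In esxcfg terms I think the two options boil down to something like this (the vmnic number, port group name and IP are placeholders):

    # Option 1: second uplink on the existing SC/Vmotion vswitch
    esxcfg-vswitch -L vmnic5 vSwitch0

    # Option 2: separate vswitch with a 2nd Service Console interface
    esxcfg-vswitch -a vSwitch2
    esxcfg-vswitch -L vmnic5 vSwitch2
    esxcfg-vswitch -A "Service Console 2" vSwitch2
    esxcfg-vswif -a vswif1 -p "Service Console 2" -i 10.1.1.13 -n 255.255.255.0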

khughes
Virtuoso

My main concern is increasing my service console connection redundancy, to avoid 'false positives' where HA has (once in our past) kicked in, killed all the VMs, and started them up on an alternate host. Bad, bad, bad... just because one card or port failed. HA really needs to offer an option where, if a failure is perceived by ESX and VirtualCenter, an EMAILED ALERT goes out to the admin(s) first, and the admin can then confirm the overall health/status before HA reacts (instead of its default panic mode). The administrator could then, perhaps, have an option within VirtualCenter that says 'Yes, this is a valid failure of an ESX host; go ahead and restart my VMs on an alternate host within the cluster'.

Meanwhile, I'm looking to increase Service Console redundancy. Perhaps I just add the extra/additional NIC to the existing vswitch0 (which is shared by Vmotion and the SC). Perhaps I create a whole new vswitch and just have a 2nd SC there, with that single NIC port. Not sure which is better. In the end, though, we're talking about physical pathways, not virtual ones, where redundancy is concerned, true?

There is an option for that, not to the full extent, but you can tell HA that in case of network loss it should leave the VMs where they are until you move them off yourself.
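
If I remember right, the pieces involved (in VirtualCenter 2.x HA) are the cluster/VM "Isolation Response" setting (set it to 'Leave powered on' instead of 'Power off') plus a couple of advanced options along these lines (the values here are just examples, not recommendations):

    das.failuredetectiontime = 60000    # ms before a host is declared isolated/failed (default 15000)
    das.isolationaddress = 10.1.1.1     # extra address to ping before declaring isolation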

  • Kyle

-- Kyle "RParker wrote: I guess I was wrong, everything CAN be virtualized "
Steven1973
Enthusiast

All,

As expected, everyone has their own ideas and every idea has its positive and negative sides. There are numerous ways to make a system redundant. Using multiple NICs is a good start, but when they are all connected to one physical switch they won't do you any good if that switch fails.

The setup I almost always use is as follows:

  • Use 2 physical, 802.1q VLAN-capable switches;

  • Create 802.1q tagged (trunk) ports for the ESX hosts;

  • Put every VLAN (including Service Console and VMotion) as a tagged VLAN on the trunks;

  • Create 1 vSwitch with all available NICs;

  • Divide the NICs over the two physical switches;

  • Create PortGroups for the VLANs you need;

  • At the vSwitch level, use the NIC teaming option to set up active and standby NICs;

  • At the PortGroup level, override the vSwitch settings for Service Console and VMotion.

In your case I would set up 1 onboard NIC and 1 PCI NIC as standby for the whole vSwitch, so all available PortGroups run over the 4 remaining NICs.

On the Service Console and VMotion PortGroups, override so that the standby NICs become Active and the other NICs become Standby, or even set them to "Unused".
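
Roughly, in esxcfg terms (the VLAN IDs and vmnic numbers below are just examples, and the per-PortGroup active/standby overrides are done in the VI Client rather than on the command line):

    esxcfg-vswitch -L vmnic0 vSwitch0    # onboard
    esxcfg-vswitch -L vmnic1 vSwitch0    # onboard
    esxcfg-vswitch -L vmnic2 vSwitch0    # quad-port card
    esxcfg-vswitch -L vmnic3 vSwitch0    # quad-port card
    esxcfg-vswitch -L vmnic4 vSwitch0    # quad-port card
    esxcfg-vswitch -L vmnic5 vSwitch0    # quad-port card
    esxcfg-vswitch -A "Service Console" vSwitch0
    esxcfg-vswitch -v 10 -p "Service Console" vSwitch0
    esxcfg-vswitch -A "VMotion" vSwitch0
    esxcfg-vswitch -v 20 -p "VMotion" vSwitch0
    esxcfg-vswitch -A "VM VLAN 30" vSwitch0
    esxcfg-vswitch -v 30 -p "VM VLAN 30" vSwitch0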

-- Steven
