exponent
Contributor

iSCSI bandwidth and NetApp

We are preparing to replace our EMC infrastructure, which is fully Fibre Channel, with NetApp and iSCSI, since NetApp currently does not fully support* FC for VMware (a bug they are working on, with a supposed resolution in the spring). Our obvious concern at the beginning of this project was ensuring that sufficient iSCSI bandwidth will be available between our ESX servers and the NetApp filers. Here's the proposed layout; questions follow:

  • NetApp Filers (2) with 8xGigE copper ports each (total of 16)

    • These copper ports run directly back to our core, Cisco Catalyst 4506 switches (2)

    • Four ports from each filer are connected to each 4506 switch (example: FilerA ports 1-4, and FilerB ports 1-4 are connected to 4506switch-1; and FilerA ports 5-8 and FilerB ports 5-8 are connected to 4506switch-2)

    • Currently no Etherchannels/trunks are in place on those Filer/switch ports

  • VMware ESX servers (4) with 10xGigE (ten 1Gbps) copper ports (2 Intel Quad-Port PCI-X, 2 built-in Broadcom)

    • Port allocation: 2xSC (for switch redundancy), 1xVMotion, 3xVMs, 4x[iSCSI]

      • vSwitch0: Service Console Port (Service Console: vmnic0, vmnic1)

      • vSwitch1: VMkernel Port (VMotion: vmnic2)

      • vSwitch2: Virtual Machine Port Group (Virtual Machines: vmnic3, vmnic4, vmnic5)

      • vSwitch3: VMkernel Port (iSCSI), Service Console Port (Service Console iSCSI) -- (vmnic6, vmnic7, vmnic8, vmnic9); a rough sketch of this vSwitch follows the list

    • Default configurations on all port groups

    • vSwitch0, vSwitch1, and vSwitch3 are all on different, isolated VLANs (.1.x, SC; .2.x, VMotion; .3.x, iSCSI)

    • pNICs are connected to two Catalyst 4948 switches for switch redundancy

      • 1xSC port per switch

      • VMotion port in switch-1 only

      • 1xVM port in switch-1, 2xVM ports in switch-2

      • 2xiSCSI ports in switch-1, 2xiSCSI ports in switch-2

  • Catalyst 4948 switches (2) have one 8-port Etherchannel per switch connecting each back to the 4506 core switches for 8Gbps bandwidth between 4948s and 4506s (and by implication, NetApp filers)
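
For concreteness, here is roughly how vSwitch3 (the iSCSI vSwitch in the list above) might be built from the ESX 3.x service console. This is only a sketch; the IP addresses and port group names are placeholders, not our actual values:

    # Create the iSCSI vSwitch and attach the four dedicated uplinks
    esxcfg-vswitch -a vSwitch3
    esxcfg-vswitch -L vmnic6 vSwitch3
    esxcfg-vswitch -L vmnic7 vSwitch3
    esxcfg-vswitch -L vmnic8 vSwitch3
    esxcfg-vswitch -L vmnic9 vSwitch3

    # VMkernel port for the software iSCSI initiator (placeholder IP on the .3.x iSCSI VLAN)
    esxcfg-vswitch -A "iSCSI VMkernel" vSwitch3
    esxcfg-vmknic -a -i 10.0.3.11 -n 255.255.255.0 "iSCSI VMkernel"

    # Second Service Console interface, which the ESX 3.x software iSCSI initiator requires
    esxcfg-vswitch -A "Service Console iSCSI" vSwitch3
    esxcfg-vswif -a vswif1 -p "Service Console iSCSI" -i 10.0.3.12 -n 255.255.255.0

    # Enable the software iSCSI initiator
    esxcfg-swiscsi -e

The other vSwitches follow the same pattern with their respective vmnics.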

After reading other community threads (e.g. http://communities.vmware.com/message/681688#681688), I am getting the impression that we may be constrained by bandwidth between the ESX servers and the NetApp filers. We are using Intel Quad-Port NICs as the iSCSI interfaces, not iSCSI HBAs, so there isn't any inherent multipathing, to my knowledge. On the NetApp side, it sounds like we can create an Etherchannel trunk on the Filers and the respective Catalyst 4506 ports (I know all of our switches are capable of Etherchannel, 802.3ad, etc.). That would appear to take care of the NetApp end of the pipe (if I understand correctly). Take note, though, that the 8 ports from each Filer are split across two 4506 switches (not stacked).
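
If we go that route, my understanding is that the filer/switch side would look something like the following (a Data ONTAP 7.x multi-mode VIF paired with a static EtherChannel on the 4506; the interface names, port numbers, VLAN, and addresses are placeholders, not our actual values):

    # Data ONTAP: bundle four filer ports into a multi-mode VIF with IP-based load balancing
    FilerA> vif create multi vif1 -b ip e0a e0b e0c e0d
    FilerA> ifconfig vif1 10.0.3.21 netmask 255.255.255.0

    ! Cisco 4506: matching static EtherChannel on the four filer-facing ports
    interface range GigabitEthernet2/1 - 4
     switchport access vlan 30
     channel-group 10 mode on
    !
    interface Port-channel10
     switchport access vlan 30

Since the eight ports from each Filer are split across two non-stacked 4506s, I assume this would end up as one multi-mode VIF per switch (possibly wrapped in a second-level single-mode VIF for failover) rather than a single 8-port channel.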

The VMware side is where the questions lie. Since our Catalyst 4948s are not stacked/stackable and we would like to maintain switch redundancy by splitting our NICs across both switches, Etherchannel doesn't look like an option. Assuming, as is the case at one of our sites where NetApp is planned to be deployed, that both Filers are active with the LUNs/disks split between them, what is the recommended configuration for maximizing/optimizing iSCSI bandwidth on the ESX servers?

We have one site that complicates the prior question, which leads to my second question. At this other site, to achieve the necessary IOPS, all disk shelves (2) will be attached to only one (1) Filer, with the second Filer acting as standby with just a few inactive disks attached to it to keep the OS (Data ONTAP) running. Thus, in this situation, all of the LUNs will be on one target/destination. My understanding of the solutions for ESX iSCSI multipathing thus far is that they depend on multiple targets to achieve any semblance of multipathing. Is that the case? Regardless, what is the recommended implementation in a situation such as this?

Thanks so much.

-Chris

* While VMware + FCP + NetApp is supported, the main benefit of NetApp -- SnapDrive, SnapManager, Snap____ -- is lost, since SnapDrive is currently incompatible with FCP.

meistermn
Expert

Is there a KB article where the FC bug is described?

exponent
Contributor

Here's the quote from NetApp when I e-mailed our sales engineer after researching it further and finding that NetApp + FCP + VMware is viable:

First, you are correct that NetApp is supported via FCP (as well as iSCSI and NFS) for ESX 3.x. What is NOT currently supported with FCP is the use of SnapDrive for Windows to manage disks that are presented over FCP to the guest OS (RDM). To use the SnapDrive functionality within the Virtual Machine, we would have the VM access iSCSI LUNs through the Microsoft iSCSI software initiator. Note that this virtual machine could still be configured with its system disk on a datastore or RDM over FCP.
VMware supports two methods of multipathing. For Fibre Channel HBAs and iSCSI HBAs, you can use the VMware multipathing, which allows you to select a different preferred path for each LUN presented. With the iSCSI software initiator and NFS, VMware leverages its NIC teaming and network load balancing capabilities, which can be a bit more limiting depending on your configuration. Should you desire the first approach, we would recommend either FCP or iSCSI HBAs.

I included the second paragraph as well since it speaks in generalities to multipathing. We're still seeking more information directly from NetApp and other sources (VMTN) for practical applications of NetApp + iSCSI + VMware. At present, though, we may forego SnapDrive in order to maintain our existing FCP infrastructure, understanding that it should be supported in a few months.
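
As an aside, the per-LUN preferred path mentioned in that first paragraph can at least be viewed from the ESX 3.x service console (the policy itself is normally set per LUN in the VI Client):

    # List each LUN with its available paths and current multipathing policy
    esxcfg-mpath -l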

Hope that helps.

dalepa
Enthusiast

Have you considered using NFS instead of iSCSI? NFS is not only faster on NetApp, it's also much easier to configure and maintain...

You need to set up EtherChannel on the filer (2 VIFs, 1 trunk) and the Cisco switch...

Also, you only need two vSwitches: one for the Service Console and VMkernel, and one for the guests.

You only need 4 GigE ports per host: 2 for vSwitch0 and 2 for vSwitch1. Split them across each switch. Additional ports don't help with bandwidth needs, so why make it so complex?

We have over 1,000 VMs across 2 filers using NFS... works great...

exponent
Contributor

No offense, Dale, but I have to disagree with most of what you suggested/wrote. NFS may have "easy" on its side, but with RDMs, which is how all of our VMs are configured, it loses much of its advantage and is debatably "faster" than iSCSI or especially FCP.

As for the recommended network configuration, security/configuration best practices recommend against combining the Service Console with VMotion (VMkernel). Furthermore, additional ports do help with bandwidth needs, both for the VM guests and for iSCSI. While a single VM may only utilize the capacity of one pNIC (1Gbps), the aggregation of 15 VMs on a single host (which is the case in our situation) will utilize the additional ports for VM traffic. On top of that, having four ports dedicated to iSCSI will allow us to assign up to four vNICs to a VM guest, which can then allow the Microsoft iSCSI Initiator to create multiple tunnels or sessions across all of those ports (with a maximum of 4Gbps of iSCSI).

After an in-depth meeting with NetApp engineers today, we were able to answer the questions posed in this thread. To avoid the multipathing limitations of guest boot volumes over software iSCSI (a limitation of using our Intel Quad-Port NICs; it would not apply with iSCSI HBAs), we will utilize our existing Fibre Channel infrastructure to offer up the boot volumes to VMs. This will achieve boot volume multipathing.

The Microsoft iSCSI Initiator will take care of iSCSI multipathing for each guest VM, without complicating things with switch/ESX Etherchannels. All non-boot volumes will be handled via iSCSI and added to VMs using NetApp's SnapManager application (made possible by SnapDrive, which works over iSCSI).
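
As a rough illustration of the in-guest piece (the portal address and target IQN below are placeholders, and in practice SnapDrive/SnapManager drives this rather than doing it by hand):

    rem Point the guest's Microsoft iSCSI Initiator at the filer's iSCSI portal
    iscsicli QAddTargetPortal 10.0.3.21

    rem Discover the filer's targets, then log in; additional sessions across the
    rem other vNICs for multipathing are likewise configured in the initiator
    iscsicli ListTargets
    iscsicli QLoginTarget iqn.1992-08.com.netapp:sn.12345678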

Between the NetApp Filers and our core 4506 switches, we will create Etherchannels to aggregate the Filer ports on each switch.

Thus, with these combined, all points will be multipathed and redundant.

dalepa
Enthusiast

Maybe if you are required to use RDMs for some reason on all of your VMs (MSCS?), then you're stuck using iSCSI/FC.

We have over 1,000 (and growing) VMs across 35 16-way/32GB ESX hosts, all using a 3070 cluster. All over NFS, running for more than a year now. No issues.

exponent
Contributor

This will be my last reply in regard to your implementation and recommendations. Simply put, just because something works (i.e. your configuration of NICs/vSwitches/etc.) doesn't mean it is a good or secure model for others to follow. Each environment has its own requirements (for example, ours with RDMs), so that may work for you, but not for us. Thanks for the input anyway.

RussH
Enthusiast

Hi Chris - In terms of creating multiple paths for the boot drives via iSCSI, how about adding an additional IP address to the Filer's VIF and, within ESX, adding a datastore through each IP and splitting the boot drives for the VMs across the two datastores?

With two NICs attached to the VMkernel/iSCSI vSwitch, it should load balance between the two NICs, essentially creating two paths?

Never tried it so don't know if this would work in practice, just a thought.
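
If anyone wants to experiment with it, the filer side would presumably just be an alias IP on the existing VIF (the VIF name and address below are made up); the second address would then be added as another iSCSI send target in ESX and a second datastore created through it:

    # Data ONTAP 7.x: add a second IP address to the existing VIF
    ifconfig vif1 alias 10.0.3.22 netmask 255.255.255.0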
