VMware Cloud Community
zookie
Contributor

ESXi, SAN, SQL2005, VM Unusable Every Hour

I would really love to get some help on this issue.

I have a Dell PowerEdge 2950 III running ESXi connected to a Dell MD3000i (15k SAS drives) SAN using iSCSI. I have a couple of very-low-use VMs running on this machine, and I just P2V'd a database server over the weekend. The database server runs MS SQL 2005.

During the day, every hour, what I believe is the SQL transaction log backup job (I'm confirming at 11:00am) causes this server to be basically unavailable to the network. It even drops ICMP packets when I ping it. The problem conditions only last a minute or three before they go away, but that is far too long for this production server. We have scanning guns that need real-time connectivity to this database server, and the loss of connectivity causes big problems. If I RDP into the VM during the problem window, you cannot click icons or open anything; it's as if the VM just hangs. My guess is excessive disk I/O. I have perfmon running now. I have attached a screenshot of the disk usage from the VI client during one of the times when the problem occurs.
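For reference, this is a minimal sketch of the kind of ping monitor that could timestamp the drops from another machine; the hostname, timeout, and interval are placeholders, not values from my environment:

```python
import datetime
import subprocess
import time

HOST = "sqlvm01"        # placeholder: the problem VM's hostname or IP
INTERVAL_SECONDS = 2    # how often to probe

def ping_once(host):
    """Return True if a single ICMP echo gets a reply (Windows 'ping -n 1')."""
    result = subprocess.run(
        ["ping", "-n", "1", "-w", "1000", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

while True:
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    if not ping_once(HOST):
        # Log only the failures so the file lines up with the hourly job times.
        with open("ping_drops.log", "a") as log:
            log.write(f"{stamp} no reply from {HOST}\n")
    time.sleep(INTERVAL_SECONDS)
```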

The question is what direction or steps to take to resolve this. How can I improve the speed of the iSCSI connection between this server and the SAN? I have seen some debate on whether or not to use RDMs. Should I look into that?

12 Replies
Jackobli
Virtuoso

iSCSI = Ethernet? Or do you have iSCSI hardware on the ESXi side?

If Ethernet, what type of NIC, how many of them, and connected to what kind of switch?

malaysiavm
Expert

The screenshot doesn't show high utilization of your VM. Network loss or the VM not responding might be due to resource scheduling issues with memory and CPU. You can try setting the CPU and memory priority to high and removing the reservation. Alternatively, you can try a resource pool and reserve the right resources. Hope this helps.

Craig

Craig vExpert 2009 & 2010 Netapp NCIE, NCDA 8.0.1 Malaysia VMware Communities - http://www.malaysiavm.com
Dave_Mishchenko
Immortal

Have you tried changing the schedule of the transaction log backup (for example, to run at 15 minutes past the hour) to confirm that the problem is related to it?
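For example, something along these lines would shift a SQL Agent schedule to 15 minutes past the hour; the connection string and schedule name are placeholders, so adjust them to whatever your backup job actually uses:

```python
import pyodbc

# Placeholders: point this at your SQL Server and the Agent schedule
# that drives the hourly transaction log backup.
CONN_STR = "DRIVER={SQL Server};SERVER=YOURSQLSERVER;DATABASE=msdb;Trusted_Connection=yes"
SCHEDULE_NAME = "Hourly tlog backup"

conn = pyodbc.connect(CONN_STR, autocommit=True)
cursor = conn.cursor()

# sp_update_schedule takes the start time as an integer in HHMMSS form,
# so 1500 = 00:15:00; an every-hour schedule then recurs at :15 past.
cursor.execute(
    "EXEC msdb.dbo.sp_update_schedule @name = ?, @active_start_time = ?",
    SCHEDULE_NAME,
    1500,
)
conn.close()
```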

zookie
Contributor

Good questions.

iSCSI is Ethernet based. It is running on a very basic Dell 1208 gigabit switch that is mostly used only for SAN communication. I say mostly because there are two other network cables plugged into it, running somewhere, that I need to track down. I am using the default integrated Broadcom NIC that comes on the PowerEdge 2950 III servers to connect to the Dell 1208 switch. There are two ports on these integrated cards: one goes to our corporate network, and there are two workstation-level VMs on this server that share that NIC; the other is connected to the Dell switch. The SAN has one cable from each controller card also running to this switch. Then, I have a 4-port Intel gigabit card that we added. We are only using one port on this NIC; it is dedicated to the VM we are having problems with and connected to our corporate network. So, there is only one network cable running from this server to the SAN switch. Can ESXi do NIC teaming or an equivalent?

There are two other PowerEdge 2950 III servers running ESXi connected to this SAN through the same switch. Each of those machines runs only a single VM, for terminal services.

I can try changing the priority as you mentioned. I could not tell whether the values in the screenshot are high or not. When the hourly SQL transaction log backup is disabled, this problem goes away entirely and that graph does not spike, for as long, every hour. It does still spike periodically, in fact higher than during the hourly job, but those are very quick spikes, not a minute or two long like the hourly one.

The problem is definitely these hourly transaction log backups and some type of load they put on the system, whether that's RAM, CPU, or disk. If I run perfmon, I do not see excessively high CPU and RAM usage, but the disk usage "seems" high, although I'm not sure what really constitutes "high". So it is an easily reproducible condition.
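For what it's worth, this is a rough sketch of how the disk counters could be captured around the top of the hour; the counter set and sample interval are just assumptions about what would be reasonable:

```python
import subprocess

# Placeholder counter set: PhysicalDisk latency, throughput, and queue depth
# are the usual suspects when a backup job saturates the disk path.
COUNTERS = [
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Transfer",
    r"\PhysicalDisk(_Total)\Disk Bytes/sec",
    r"\PhysicalDisk(_Total)\Current Disk Queue Length",
]

# typeperf ships with Windows: -si = sample interval in seconds,
# -sc = sample count, -o = output CSV, -y = overwrite without prompting.
# 5-second samples for 10 minutes straddling the top of the hour should
# bracket the backup window.
subprocess.run(
    ["typeperf", *COUNTERS, "-si", "5", "-sc", "120", "-o", "disk_hourly.csv", "-y"],
    check=True,
)
```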

I will try setting the priority higher. Any other thoughts I should consider? Is there a way to either run more tests on the disk activity, or increase the bandwidth of the connection to the disks? HBA adapters or something, maybe?

RParker
Immortal

Is your VM traffic sharing the same vSwitch as your console? If it is, you may consider segregating the VM traffic; otherwise there should be no relation, since high network usage for a VM shouldn't affect console connectivity.

Jackobli
Virtuoso

This means you are using one gigabit link for iSCSI to your Dell MD3000i.

As I don't know that SAN, I assume it uses only one link (a kind of zoning) to serve your requests.

The performance of the ESXi software iSCSI initiator has been discussed; there are some complaints about its throughput/speed.

If there is enough local disk space, you could try moving the guest from the SAN to local disk for comparison.
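Even a crude sequential-write timing, run once against a virtual disk on the SAN datastore and once against one on local disk, gives a rough number to compare; the file size and drive letters below are only placeholders:

```python
import os
import time

def write_throughput(path, total_mb=512, chunk_mb=4):
    """Write total_mb of zeros to path in chunk_mb pieces and return MB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())   # push the writes through to the datastore
    elapsed = time.time() - start
    os.remove(path)
    return total_mb / elapsed

# Placeholders: one file on a virtual disk backed by the SAN datastore,
# one on a virtual disk backed by local storage.
print("SAN-backed disk:   %.1f MB/s" % write_throughput(r"E:\san_test.bin"))
print("Local-backed disk: %.1f MB/s" % write_throughput(r"D:\local_test.bin"))
```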

zookie
Contributor

I'm not sure I understand, so I'll tell you how our connections are set up.

We have three Cisco 10/100 switches that are used for our corporate network. One of the integrated Broadcom NICs on this server plugs into this stack, and one of the Intel NICs on this server plugs into this stack. This provides the standard network access for the VMs running on this server. The Intel NIC is dedicated to a particular VM, and it is the one we are having performance issues with.

This server has a second integrated Broadcom NIC and it is plugged into a Dell 8 port gigabit switch. The SAN has two network cables (one from each controller) also plugged into this 8 port gigabit switch.

If I look in the VI client, I have set up three vSwitches. One switch is for the SAN. Another is for the two low-use VMs running on this server that share the Broadcom NIC. A third is for the Intel NIC that I have dedicated to the VM that is having issues. All are running at 1000 Mbps full duplex.

zookie
Contributor

Jackobli, you are exactly right. There is only one gigabit link from the server to the switch. I am trying to find out if I can team two NICs in ESXi so that I can essentially double this. The SAN so far also has only one NIC enabled, but I believe there are two ports on each controller, so I would like to see if I can team them as well.

Your local disk idea is a great suggestion that I also thought about, but I do not currently have enough local storage to try it. I would imagine it would alleviate the problem, but it would also defeat our purpose in using the SAN.

Was there any solid conclusion on whether the ESXi software iSCSI initiator is that much slower than dedicated HBAs? These onboard NICs do have TOE enabled on them.

mike_laspina
Champion

Hi,

What model of switch are you using?

If it has flow control, you should enable it.

Regards,

Mike

http://blog.laspina.ca/ vExpert 2009
zookie
Contributor

I actually misstated the model of the switch; it is a Dell 2708 gigabit switch. It does have some web management capabilities, and it auto-negotiates flow control, speed, and duplex. I will manually set flow control on the ports. They are already running at 1000 full for the other two settings.

Jackobli
Virtuoso

I am trying to find out if I can team two NICs in ESXi so that I can essentially double this.

Don't put too much into that. In a one-to-one relation (ESXi VMkernel to SAN), teaming cannot really help throughput, but it does help for reliability.

The SAN so far also has only one NIC enabled, but I believe there are two ports on each controller, so I would like to see if I can team them as well.

Same here, but it would help with sharing that uplink between your ESXi host and the other two machines you mentioned earlier.

Your local disk idea is a great suggestion that I also thought about, but I do not currently have enough local storage to try it. I would imagine it would alleviate the problem, but it would also defeat our purpose in using the SAN.

I would only try that for testing! Local storage on ESXi is IMHO a pain (backup, maintenance, and other weaknesses).

Was there any solid conclusion on whether the ESXi software iSCSI initiator is that much slower than dedicated HBAs? These onboard NICs do have TOE enabled on them.

The main reason for using a dedicated HBA is (host) CPU. Software iSCSI puts load on the host CPU and the VMkernel.

We all would love to hear directly from VMware about the performance of the ESXi kernel. It seems that there is a bottleneck in scheduling the kernel resources needed for management (a lesser problem) and for iSCSI/NFS VMkernel ports.

zookie
Contributor

OK, so you are saying that NIC teaming will not solve my problem. I was hoping it would, but thank you for saving me the time.

Do any of you have a SQL server running in a VM on a SAN? What does your setup look like? Do you have any performance issues? How did you set up your databases? Did you just run SQL inside the VM and let it run, or did you use RDMs for the database locations?
