Hey,
I have a very frustrating problem with Linux VMs' storage hanging. We store our VMs on an EMC Isilon cluster accessed over NFS. The machines frequently freeze for between 4 and 5 seconds, and this affects all Linux VMs (CentOS 5.5) on a particular ESXi host at the same time.
It seems to be related to the virtual disk controller: all VMs with the LSI controller exhibit the issue, but a VM with the IDE controller doesn't.
So far I've boiled it down to a very simple setup that recreates the issue:
1) Install ESXi 4.1 U1 on a server or workstation connected to the network with a single 1 Gbit link
2) Set up a VMkernel port for storage and management traffic
3) Set up a datastore on the Isilon cluster mounted over NFS
4) Create two CentOS 5.5 VMs with the LSI SCSI controller (they don't need network). Boot them into runlevel 1 (i.e. no network, minimal services).
5) On one VM, run ioping to measure latency to its virtual disk, e.g.:
ioping -c 1000 /tmp
On the other VM, write some data to its own virtual disk, e.g.:
dd if=/dev/zero of=/tmp/test bs=1024 count=40000
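If ioping isn't installed in the guest, a rough stand-in (an assumption on my part — it only needs GNU coreutils, which CentOS 5.5 ships) is to time small synchronous writes with dd yourself:

```shell
#!/bin/sh
# Rough ioping stand-in: time a series of small synchronous 4 KiB writes
# and print each latency in milliseconds. /tmp/latprobe is an example path.
for i in 1 2 3 4 5 6 7 8 9 10; do
    start=$(date +%s%N)                # nanoseconds since epoch (GNU date)
    dd if=/dev/zero of=/tmp/latprobe bs=4096 count=1 oflag=sync 2>/dev/null
    end=$(date +%s%N)
    echo "write $i: $(( (end - start) / 1000000 )) ms"
done
rm -f /tmp/latprobe
```

During one of the 4-5 second hangs you would expect a single write in this loop to report several thousand milliseconds.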
Most often, a few seconds after you run the dd command, both VMs will hang for 4-5 seconds. When they return from hanging, ioping always reports a request time between 4000 and 5000 ms. Both machines are frozen during this period, but the network is fine; I can still ping the ESXi host over the same link.
As I said, using the IDE controller seems to be a workaround, but it's not ideal. Interestingly, I don't see the issue when using local disks as the datastore, so it seems to be specific to NFS-mounted datastores.
I've tried updating to the latest patches using vCenter Update Manager.
Any ideas?
Nick
I'd be very curious to find out what kind of resolution you get on this. I am seeing a very similar problem on all guest OSes in my environment. We are running a Nexenta storage platform, which sees almost identical 4000-5000 ms latency spikes (over 10 Gbit links, no less). The spikes go away when using block storage (iSCSI/FC) or the IDE driver. I have found other comments on the web (http://serverfault.com/questions/285214/troubleshooting-latency-spikes-on-esxi-nfs-datastores) suggesting that it is resolved in ESXi 5.0, but that really doesn't help me today. I've got an open ticket on this issue; hopefully they will have a solution that is not "move to ESXi 5.0".
That's interesting to know. I am also planning to open a case. A couple of things I've noticed over the past few days:
1) Using iometer on Windows 7 x64, the highest overall latency measured is nowhere near 4000-5000 ms, and the OS seems responsive while the Linux VMs are hanging. This had me wondering whether it's an issue affecting only Linux (CentOS 5.5 in my case). Interesting that it affects all your VMs.
2) On our production cluster of Dell R610 ESXi hosts, the virtual IDE controller also suffers the hangs, but it doesn't when set up on the test workstation mentioned in my first post. The LSI SCSI and LSI SAS controllers, however, hang on both the test workstation and the R610s. I've yet to clear down one of the R610s and rebuild it from scratch to see if that resolves the hanging with the IDE controller.
Please do let me know if VMware come back to you with any ideas.
Nick
Hi,
The article below describes patch updates that may relate to the issue you are facing.
http://kb.vmware.com/kb/1014886
That KB article does not appear to address NFS latency at all; it covers some Dell Broadcom NIC issues. It also applies to ESXi 4.0, and I am currently on ESXi 4.1.
As for the Windows 7 guests: we do not have any Windows 7 guests in our environment. We have a lot of Windows 2008 R2 and a lot of Linux, but no desktop OSes.
Hi BharatR,
I don't see anything in that KB that refers to this issue. I am using VMware Update Manager and am running the latest patches for ESXi 4.1.0.
Do you see this problem with a single VM running one of the stat utilities referenced (ioping, fsync-tester) against storage provided by an otherwise idle NFS store?
Hi J1mbo,
Our Isilon clusters are in use in production, so unfortunately we've not had the chance to test against an idle cluster. However, running one VM does seem OK; as soon as you launch a second VM on the same ESXi host and run dd, for example, the hangs begin.
I have also tested using a workstation as NFS storage (otherwise idle), and this has no problems with one or two VMs. ioping reports higher latencies when the other VM is writing, but there is no 4-5 second hang.
Interestingly, I don't see this issue with iometer on Windows 7 x64; it seems responsive whilst the Linux hosts are hung.
Nick
PS - great blog, lots of your tips are now linked on our internal wiki page!
I have run it against an idle NFS datastore, and with one VM I do not see the pauses. As soon as I get more than one VM running on an ESXi host, the pauses start.
I've been trying to reproduce this, but without success. My setup is as follows:
Without dd running, ioping is stable at around 1 ms. Running dd ramps up the response times as expected, mostly consistent with NFS server load, with the odd jump; the highest recorded was around 2,700 ms, with 1,700 ms occurring more frequently. I tried 8 and 64 threads on the NFS server, with no particular difference between them (although I couldn't say how many threads were actually in use, as the "th" line in /proc/net/rpc/nfsd seems to be broken on my test box for some reason and always shows zeros).
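For reference, the first number on that "th" line is the thread count, so it can be pulled out with awk. A minimal sketch, shown here against a sample line since the live file only exists while nfsd is running:

```shell
# Parse the nfsd thread count from the "th" line of /proc/net/rpc/nfsd.
# On a live NFS server, replace the echo with:
#   awk '/^th/ {print "nfsd threads:", $2}' /proc/net/rpc/nfsd
sample='th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000'
echo "$sample" | awk '/^th/ {print "nfsd threads:", $2}'
```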
Re ESXi 5: I've only looked at it very briefly, but I did notice a number of new configurables in the Advanced/NFS section, including NFS.MaxQueueDepth, although it defaults to a huge number anyway.
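For anyone who wants to experiment with that setting, it can be read and set from the ESXi 5 console. This is a host-configuration sketch, not a recommendation; the value 64 is just an example, and the host needs a reboot for it to take effect:

```shell
# Read the current NFS queue-depth limit (ESXi 5 advanced setting)
esxcfg-advcfg -g /NFS/MaxQueueDepth

# Cap it at e.g. 64 outstanding requests (example value; reboot to apply)
esxcfg-advcfg -s 64 /NFS/MaxQueueDepth
```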
Any pointers on reproducing the issue would be gratefully received!
To add: generating a workload on a 4 KiB-aligned XFS partition doubled the write throughput (vs. 63-sector alignment) and yielded much more consistent ioping times (ioping -q -i 0 -w 60 -S 10G):
I've seen this before, particularly with NFS shares delivered via XFS running on arrays: a mixed read/write workload in unaligned guest partitions seemed to throttle the disk queue at the NFS server to one IO, effectively limiting disk performance to that of a single spindle. I'm not convinced it's related to the problem here, but I thought I'd mention it, as the greatly extended response times were only present with the unaligned workload (I've run this three times, to be sure).
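The arithmetic behind the alignment check is simple: a partition is 4 KiB-aligned when its start sector times 512 bytes is divisible by 4096. A quick sketch (sector 63 is the old DOS fdisk default; 2048 is the modern 1 MiB default):

```shell
# Report whether a partition's start sector (in 512-byte sectors)
# falls on a 4 KiB boundary.
check_alignment() {
    start=$1
    if [ $(( (start * 512) % 4096 )) -eq 0 ]; then
        echo "sector $start: aligned"
    else
        echo "sector $start: unaligned"
    fi
}
check_alignment 63     # old DOS fdisk default
check_alignment 2048   # 1 MiB boundary
```

You can get the start sector of an existing partition from fdisk -lu or /sys/block/sda/sda1/start.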
Here's the view of network utilisation between two runs:
So I've been running some more tests, and have found an interesting correlation.
I've run ioping on an LSI Logic SAS-connected disk (/dev/sda), and it sees the latency.
I've also set up an NFS mount to the same pool of disks (a different NFS share) and created a .img file there. I formatted that .img file as ext2, mounted it via a loop device, and ran ioping against that mountpoint. When I see the latency on the LSI-connected disk, I do not see the same latency on the NFS-mounted disk. ioping results below.
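For anyone wanting to repeat that comparison, the loop-device side can be set up roughly like this. Paths and sizes are examples of my own, not the poster's exact commands, and the format/mount steps need root, so the sketch skips them when run unprivileged:

```shell
#!/bin/sh
# Build an ext2 image on the NFS share, mount it via a loop device,
# and measure latency through it. IMG/MNT are example paths.
IMG=/tmp/nfstest.img
MNT=/mnt/looptest
# Create a 1 GiB sparse image file (count=0 + seek just truncates to size)
dd if=/dev/zero of="$IMG" bs=1M count=0 seek=1024 2>/dev/null
ls -l "$IMG"
if [ "$(id -u)" -eq 0 ] && command -v mke2fs >/dev/null \
        && command -v ioping >/dev/null; then
    mke2fs -F -q "$IMG"           # format the image as ext2
    mkdir -p "$MNT"
    mount -o loop "$IMG" "$MNT"   # attach via a loop device
    ioping -c 100 "$MNT"          # latency through the loop-mounted image
    umount "$MNT"
else
    echo "skipping format/mount (needs root, mke2fs and ioping)"
fi
rm -f "$IMG"
```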
LSI connected disk
<snip>
4096 bytes from . (ext2 /dev/sda): request=146 time=0.4 ms
NFS connected image file
<snip>
4096 bytes from . (ext2 /dev/loop1): request=147 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=148 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=149 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=150 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=151 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=152 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=153 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=154 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=155 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=156 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=157 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=158 time=0.1 ms
4096 bytes from . (ext2 /dev/loop1): request=159 time=0.4 ms
4096 bytes from . (ext2 /dev/loop1): request=160 time=0.1 ms
I just can't reproduce this problem for some reason.
I tested the IDE controller in one VM and the LSI controller in the other, both ways, i.e. generating load on the LSI and ping-testing on the IDE, and vice versa. The results were:
I also tested moving the VMs to the noop scheduler for these volumes, which seemed to slow things down slightly.
Not too sure what else to add here. I'd proceed by simplifying the NFS test rig, for example down to a single interface and a single pSwitch, maybe?
Thanks for taking the time to look at this, Jim. I also tried it with a workstation acting as an NFS datastore (CentOS 5.5, XFS-formatted, single 1 TB HDD, quad-core 2.93 GHz, 12 GB RAM). Using that, I saw no issues. Maybe that's why you can't recreate it?
I'd be interested to see if mbreitbach tested on some NFS storage other than the main storage he was having issues with (he doesn't specify what storage the idle NFS datastore he tried was on).
One interesting thing I found yesterday is that with an Ubuntu VM (10.04.2 Lucid) running ioping 0.5, I don't see the hangs. This VM happily keeps writing to its disk with generally sub-10 ms response times, while at the same time ioping on my CentOS 5.5 VMs hangs. Again, I'd be interested to know if mbreitbach sees similar behaviour.
I guess this pins it down to something in the VMware software stack that doesn't like certain combinations of guest OSes and storage vendors.
Vague, I know!
Hello,
What kind of equipment do you have between your filer and your ESX hosts?
We experienced the same problem (high latency) with our Linux VMs two weeks ago, and the solution was on the Cisco switch. The two Ethernet links were configured as an EtherChannel on the NetApp but not on the switch, so we recreated the EtherChannel (port channel) with a trunk of two ports on the Cisco, and now it works fine. I think the NetApp was trying to load-balance connections but the switch refused them, causing TCP retransmissions.
It's easy to figure out whether the problem comes from the link between the filer and the switch: shut down all the ports except one and see what happens.
We also disabled flow control (set it to none) because we suspected it of sending pause frames.
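On the Linux side (e.g. a Linux NFS server), pause-frame flow control can be inspected and disabled with ethtool. This is host configuration, so treat it as a sketch; eth0 is an example interface name:

```shell
# Show current flow-control (pause frame) settings for an interface
ethtool -a eth0

# Disable pause-frame autonegotiation, RX and TX handling
ethtool -A eth0 autoneg off rx off tx off
```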
Now the biggest spikes are 50 ms, where they were 5000 ms before we changed the configuration.
Hope this helps.
Thanks for the suggestion, but the issue remains on my test box, which doesn't use any trunking or LAG groups, so I think it's something else.
Testing with a trial of ESXi 5, the problem is no longer there. Now I just need a fix for 4.1!
We see the same bug sometimes.
I've found that the bug appears when we create a big disk file (like 200 GB): my NAS server (HP NAS X1400) overloads its hard drives, then my two ESX hosts lose their connection to it, and if the outage lasts too long the VMs shut down.
Hi!
I hate to say that it's "nice" to see others with the same problem, but at least I'm not alone!
I have very similar problems, and I'm running ESXi 5 with the latest patch.
I'll try to describe my setup quickly. I have two servers, one primary and one secondary. They currently reside in the same server room, but they will be moved to separate fire cells with separate power supplies as soon as I'm done with them. The main concept is that if one fails, the other takes over; I have tested this, and it works.
On both ESX hosts I have a virtual CentOS machine with DRBD and Heartbeat that syncs disk space between the two servers via 10 Gbit fibre and acts as an NFS server providing ESX with NFS datastores. This works brilliantly on one of the stores I set up for the virtual system discs. The latency problem is on a larger datastore I set up for data volumes: as soon as I push that datastore, write latency jumps to 7000 ms.
It's strange, because when I subject the very same disk space to the same write operations directly, latency is 2 ms.
My conclusion: the problem is between NFS and ESX. Some kind of buffer or queue gets clogged and the whole thing goes belly up.
I've come to the end of my line now, and I'm about to delve into the very scary realm of the ESX "Advanced settings". I sure hope I don't break my hosts...
If I find anything, I will write it up in this thread!
=T=