Hi All,
We have been experiencing massive issues with SAN timeouts on all our VMs, with thousands of the following messages on one particular host and a few on the others in the ESX 3.5 cluster:
Jan 9 16:53:11 cbrep085 vmkernel: 21:06:33:06.952 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=490
Jan 9 16:53:16 cbrep085 vmkernel: 21:06:33:12.025 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=573
Jan 9 16:56:26 cbrep085 vmkernel: 21:06:36:22.033 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=1639
Jan 9 16:56:59 cbrep085 vmkernel: 21:06:36:55.045 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=149
Jan 9 16:59:14 cbrep085 vmkernel: 21:06:39:10.097 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=348
Jan 9 16:59:19 cbrep085 vmkernel: 21:06:39:15.169 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=378
Jan 9 17:09:27 cbrep085 vmkernel: 21:06:49:22.377 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=823
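For anyone trying to correlate these aborts with a particular time window (or confirm the storm is concentrated on one host), a rough tally per hour can be pulled straight from the vmkernel log. This is just a sketch: `/var/log/vmkernel` is the usual ESX 3.5 service-console location, and a small embedded sample is used here so the pipeline itself is shown end to end.

```shell
# Tally qla24xx_abort_command messages per hour.
# On a real host, point LOG at /var/log/vmkernel instead of the sample below.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Jan 9 16:53:11 cbrep085 vmkernel: 21:06:33:06.952 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=490
Jan 9 16:56:26 cbrep085 vmkernel: 21:06:36:22.033 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=1639
Jan 9 17:09:27 cbrep085 vmkernel: 21:06:49:22.377 cpu2:1062)<6>qla24xx_abort_command(0): handle to abort=823
EOF

# Bucket each matching line by "Month Day Hour" and count occurrences.
counts=$(grep 'qla24xx_abort_command' "$LOG" \
  | awk '{split($3, t, ":"); print $1, $2, t[1] ":00"}' \
  | sort | uniq -c)
echo "$counts"
rm -f "$LOG"
```

If the counts spike in bursts rather than trickling steadily, that points at specific I/O storms (e.g. one VM hammering its raw disks) rather than a constant path problem.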
We narrowed it down to one particular VM configured with 4 x Virtual Mapped Raw LUNs.
All of the Mapped Raw LUNs have placeholders located on one of our shared LUNs, VMFS-L-Common01. There are 4 shared LUNs in total for this cluster.
What is the correct way to configure these on the SAN side? Should all of these Mapped Raw LUNs be configured to use the same Storage Controller as VMFS-L-Common01?
Or should we split these up over the various Common LUNs, whilst keeping the Mapped Raw LUNs on the same Storage Processor as the Common LUN each is placed on?
Or is this not an issue when considering the SAN side?
What should we look for in the back end of these configurations? Obviously we have a serious problem here that seems to be specific to the Mapped Raw LUN setup.
This has happened in 2 separate clusters / data centres / SANs.
We have a support call open with VMware.
Thanks
Furthermore, here is a picture:
CBR3P111V has 4 raw disks located as placeholders on Common01.
It seems that if any of these raw disks are being hammered, any guest that is also on the same Common01 LUN with its own VMDK is affected. Additionally, any guest on the same ESX host crawls to a halt.
vdf -h responds extremely slowly when trying to enumerate the LUNs.
Bump. The problem still exists and there's no solution yet. Anyone else have experience with this?
I'd be interested in your answer; I am getting the same qla24xx_abort_command messages.
Hi. Have you figured out the solution? I'm seeing the same errors and it's causing hosts to momentarily lose connection to the VIC.
thanks
Same issue here.