VMware Cloud Community
starkhorn
Contributor

SCSI status I/O error

Hi Folks,

I've just started a new role with a firm, and they have a DL360 with an MSA30 disk array running ESX 3.0.2. They are running three virtual machines, and one of the first tasks they assigned me is to investigate why one of the virtual machines hangs when running find or ls -l on a file system. This is my first time dealing with VMware, so please bear with me and my silly questions.

The virtual machine is running Red Hat Enterprise Linux, and any command that tries to access the file system mounted on /dev/sdc1 hangs indefinitely. When I check the vmkernel and vmkwarning logs, I see lots of SCSI status I/O errors (see the tail -f of the vmkernel log below).

esxcfg-vmhbadevs shows three vmhba devices, but vmhba2 seems to have a serious problem, as shown by the esxcfg-vmhbadevs -m command. In /vmfs/volumes, a volume called "raid2" points to a directory that does not exist, and I know from using the VMware client that the raid2 volume was assigned to only one of the virtual machines, which is why only that machine is complaining.

In /vmfs/devices/disks, I get a "read failed" error when I try to run file on the vmhba2 disk. I've also included the contents of /proc/vmware/scsi and /proc/partitions.

It's clear that raid2 has a serious problem, but I'm not sure what caused it. Is there a problem with the VMware configuration, or is it simply a hardware fault?

Would you recommend removing the raid2 partition and trying to recreate it? If so, what is the correct procedure to do this from the CLI? I can no longer see the "raid2" storage in the VMware client tool: it's not listed in the "storage" configuration section of the ESX server, and it's no longer included in the virtual machine's datastore either.

All of the output of the above commands is listed below. Any help would be much appreciated.

Cheers

Starkhorn

# tail -f /var/log/vmkernel
vmkernel: 64:21:15:46.901 cpu0:1024)<4>cciss2: cmd has CHECK CONDITION, sense key = 0x3
vmkernel: 64:21:15:46.901 cpu1:1034)SCSI: 8031: vmhba2:0:0:1 status = 0/7 0x0 0x0 0x0
vmkernel: 64:21:15:46.901 cpu1:1034)SCSI: 8135: vmhba2:0:0:1 Retry (error)
vmkernel: 64:21:15:46.901 cpu0:1024)<4>cciss2: cmd has CHECK CONDITION, sense key = 0x3
vmkernel: 64:21:15:46.902 cpu1:1034)SCSI: 8031: vmhba2:0:0:1 status = 0/7 0x0 0x0 0x0
vmkernel: 64:21:15:46.902 cpu1:1034)SCSI: 8135: vmhba2:0:0:1 Retry (error)
vmkernel: 64:21:15:46.902 cpu0:1024)<4>cciss2: cmd has CHECK CONDITION, sense key = 0x3
vmkernel: 64:21:15:46.902 cpu1:1034)SCSI: 8031: vmhba2:0:0:1 status = 0/7 0x0 0x0 0x0
vmkernel: 64:21:15:46.902 cpu1:1034)SCSI: 8135: vmhba2:0:0:1 Retry (error)
vmkernel: 64:21:15:46.902 cpu1:1034)WARNING: SCSI: 5625: status I/O error, rstatus 0xc0de08 for vmhba2:0:0. residual R 996, CR 80, ER 0
vmkernel: 64:21:15:46.902 cpu1:1034)FS3: 866: Error reading HB addr 399400: I/O error
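The log lines above can be tallied mechanically. The sketch below assumes Python 3 is available (for example on a workstation holding a copy of /var/log/vmkernel, since the ESX 3.0 service console may not have it). It counts CHECK CONDITION sense keys per cciss controller and "Retry (error)" lines per vmhba path. The sense-key names come from the SCSI standard, where 0x3 is MEDIUM ERROR: a non-recovered error that the device attributes to the medium or the recorded data. The regular expressions are written only against the line formats pasted here, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# SCSI sense key names from the SCSI Primary Commands standard (4-bit key).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}

# Matches e.g. "...)<4>cciss2: cmd has CHECK CONDITION, sense key = 0x3"
SENSE_RE = re.compile(r"(cciss\d+): cmd has CHECK CONDITION, sense key = 0x([0-9a-fA-F]+)")
# Matches e.g. "...)SCSI: 8135: vmhba2:0:0:1 Retry (error)"
RETRY_RE = re.compile(r"(vmhba\d+:\d+:\d+:\d+) Retry \(error\)")


def summarize(lines):
    """Tally sense keys per cciss controller and error retries per vmhba path."""
    sense = Counter()
    retries = Counter()
    for line in lines:
        m = SENSE_RE.search(line)
        if m:
            key = int(m.group(2), 16)
            sense[(m.group(1), SENSE_KEYS.get(key, "UNKNOWN"))] += 1
            continue
        m = RETRY_RE.search(line)
        if m:
            retries[m.group(1)] += 1
    return sense, retries
```

Run against the eleven lines pasted above, `summarize(open("vmkernel"))` should report three MEDIUM ERROR conditions on cciss2 and three retries on vmhba2:0:0:1; cciss2 corresponds to vmhba2 (/dev/cciss/c2d0) in the esxcfg-vmhbadevs output.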

# esxcfg-vmhbadevs
vmhba0:0:0 /dev/cciss/c0d0
vmhba1:0:0 /dev/cciss/c1d0
vmhba2:0:0 /dev/cciss/c2d0

# esxcfg-vmhbadevs -m
Skipping dir: /vmfs/volumes/47b1c575-aaf15738-9134-001cc4da44a6. Cannot open volume: /vmfs/volumes/47b1c575-aaf15738-9134-001cc4da44a6
vmhba0:0:0:3 /dev/cciss/c0d0p3 47b1c1ce-7c9d0bb8-4ebb-001cc4da44a6
vmhba1:0:0:1 /dev/cciss/c1d0p1 47b1c55e-7c1f9778-542b-001cc4da44a6
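The comparison described in the post (three disks in esxcfg-vmhbadevs, but only two VMFS volumes in esxcfg-vmhbadevs -m) can be made explicit. This is a hedged sketch that assumes the two-column and three-column line layouts pasted above; the function names are hypothetical, and it simply reports which vmhba targets have no VMFS partition listed.

```python
def parse_devs(text):
    """Map each vmhba target to its console device, from `esxcfg-vmhbadevs` output."""
    devs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: "vmhba2:0:0 /dev/cciss/c2d0"
        if len(fields) == 2 and fields[0].startswith("vmhba"):
            devs[fields[0]] = fields[1]
    return devs


def parse_mounted(text):
    """Return vmhba targets that carry a VMFS volume, from `esxcfg-vmhbadevs -m` output."""
    mounted = set()
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: "vmhba1:0:0:1 /dev/cciss/c1d0p1 <volume UUID>";
        # "Skipping dir: ..." warnings do not match and are ignored.
        if len(fields) == 3 and fields[0].startswith("vmhba"):
            # Drop the trailing ":<partition>" so the key matches parse_devs().
            mounted.add(fields[0].rsplit(":", 1)[0])
    return mounted


def disks_without_vmfs(devs_text, mounted_text):
    """List (target, device) pairs the host sees but that have no VMFS volume listed."""
    devs = parse_devs(devs_text)
    mounted = parse_mounted(mounted_text)
    return sorted((t, d) for t, d in devs.items() if t not in mounted)
```

With the two outputs above, this should return only vmhba2:0:0 (/dev/cciss/c2d0), the disk behind raid2.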

# esxcfg-mpath -l
Disk vmhba0:0:0 /dev/cciss/c0d0 (69973MB) has 1 paths and policy of Fixed
Local 6:0.0 vmhba0:0:0 On active preferred

Disk vmhba1:0:0 /dev/cciss/c1d0 (572195MB) has 1 paths and policy of Fixed
Local 19:4.0 vmhba1:0:0 On active preferred

Disk vmhba2:0:0 /dev/cciss/c2d0 (572195MB) has 1 paths and policy of Fixed
Local 19:5.0 vmhba2:0:0 On active preferred

# ls -l /vmfs/volumes/
ls: /vmfs/volumes/47b1c575-aaf15738-9134-001cc4da44a6: No such file or directory
total 9216
drwxrwxrwt 1 root root 1260 Feb 13 05:34 47b1c1ce-7c9d0bb8-4ebb-001cc4da44a6
drwxrwxrwt 1 root root 1400 Mar 15 22:26 47b1c55e-7c1f9778-542b-001cc4da44a6
lrwxr-xr-x 1 root root 35 Apr 18 01:11 raid1 -> 47b1c55e-7c1f9778-542b-001cc4da44a6
lrwxr-xr-x 1 root root 35 Apr 18 01:11 raid2 -> 47b1c575-aaf15738-9134-001cc4da44a6
lrwxr-xr-x 1 root root 35 Apr 18 01:11 storage1 -> 47b1c1ce-7c9d0bb8-4ebb-001cc4da44a6
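The raid2 symptom in this listing is a dangling symbolic link: the name exists in /vmfs/volumes, but its UUID target does not. A minimal Python sketch for listing such links follows; it assumes /vmfs/volumes behaves like an ordinary directory for symlink resolution, which is not verified here for the ESX 3.0 service console, and `dangling_links` is a hypothetical helper name.

```python
import os


def dangling_links(directory):
    """Return {link name: target} for symlinks in `directory` whose target is missing."""
    broken = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.path.exists() follows the link, so it is False when the target is gone;
        # os.path.islink() does not follow it, so it is still True for the link itself.
        if os.path.islink(path) and not os.path.exists(path):
            broken[name] = os.readlink(path)
    return broken
```

Given the listing above, `dangling_links("/vmfs/volumes")` would be expected to report raid2 pointing at 47b1c575-aaf15738-9134-001cc4da44a6, while raid1 and storage1 resolve normally.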

# file vmhba2\:0\:0\:0
vmhba2:0:0:0: file: read failed (Input/output error).

# file vmhba1\:0\:0\:0
vmhba1:0:0:0: x86 boot sector

# pwd
/vmfs/devices/disks
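The two file results can be reproduced with a direct read of the first sector. The sketch below is an assumption-laden illustration, not what file does internally: it reads 512 bytes (the block size these disks report), returns "io-error" when the read fails with EIO as it did for vmhba2:0:0:0, and otherwise checks for the standard 0x55AA MBR boot signature at offset 510, which is one of the things that makes file report an x86 boot sector. The function name is hypothetical.

```python
import errno

SECTOR_SIZE = 512  # matches "Block size: 512" reported for these disks


def probe_first_sector(path):
    """Read sector 0 of `path` and classify the result.

    Returns "io-error" if the read fails with EIO, "boot-sector" if the 0x55AA
    signature is present at offset 510, and "no-signature" otherwise.
    """
    try:
        with open(path, "rb") as f:
            sector = f.read(SECTOR_SIZE)
    except OSError as e:
        if e.errno == errno.EIO:
            return "io-error"
        raise  # other errors (permissions, missing path) are not a media problem
    if len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa":
        return "boot-sector"
    return "no-signature"
```

On the host, one would expect `probe_first_sector("/vmfs/devices/disks/vmhba2:0:0:0")` to return "io-error" and the vmhba1 path to return "boot-sector", matching the file output above.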

# cat vmhba0/0\:0
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Id: Unavailable
Size: 69973 Mbytes
Queue Depth: 32
Block size: 512
Num Blocks: 143305920
Valid Partitions: 8
0: 0 143305920 0x0
1: 63 208782 0x83
2: 208845 10233405 0x83
3: 10442250 127459710 0xfb
4: 137901960 5397840 0xf
5: 137902023 1108422 0x82
6: 139010508 4080447 0x83
7: 143091018 208782 0xfc
Partition VM cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg
0 - 2159 2159 92142 0 0 0 0 0 0 665 1330 98203 15231208
1 1024 3271 1331 31802 1940 40234 0 0 0 0 565 2646 165291240 212577670
2 1024 19342180 90190 1230857 19251990 373495536 0 0 0 0 0 0 47605374 168852747
3 - 13005269 614225 9427803 12391044 154648730 0 0 3785251 4019307 143373 377914 3170021 72988895
5 1024 18 18 108 0 0 0 0 0 0 0 0 6442 141849
6 1024 574803101 4113 16977 574798988 3171705472 0 0 0 0 0 0 63954 17209028
7 - 27 27 224 0 0 0 0 0 0 0 0 8999 148757
VM Shares cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg active queued virtTime
1078 1000 4763893 489800 7882658 4274093 40630434 0 0 2863800 2863800 0 0 859062 34940899 0 0 755016679518928
1061 1000 6033599 102944 1014184 5930655 98225724 0 0 900592 1124682 129210 266814 5764409 112009639 0 0 755018427018928
1024 1000 596270494 101813 1403981 596168681 3558089303 0 0 0 0 12732 106920 1608028 22234588 1 0 755018750018928
Total 3000 607156025 712063 10799913 606443962 3699889972 0 0 3785251 4019307 144603 381890 1645904 23235792 1 0 755018816268928
Paths:fixed
vmhba0:0:0 on*#
Switch Path Policy: target = preferred, hba = preferred, maxcmds = 0, maxblks = 0
I/0 Paths cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg
vmhba0:0:0+ 607156025 712063 10799913 606443962 3699889972 0 0 3785251 4019307 144603 381890 1645904 23235792
Active: 1 Queued: 0 Reserved: N Pending Reserves: 0
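The "Valid Partitions" entries in these listings appear to read as index: start sector, length in sectors, type ID. That reading is an inference from the numbers themselves: entry 0 spans exactly Num Blocks, 143305920 × 512 bytes is about 69973 MiB (the Size line), and partition 3 ends at sector 137901959, immediately before partition 4 starts. The sketch below decodes the standard MBR type IDs that occur here (0x83 Linux, 0x82 Linux swap, 0x0f extended, 0xfb VMware VMFS, 0xfc VMware vmkcore); `parse_partitions` is a hypothetical helper, not part of any VMware tool.

```python
import re

# Standard MBR partition type IDs that occur in the listings above.
PART_TYPES = {
    0x00: "none",
    0x0F: "extended (LBA)",
    0x82: "Linux swap",
    0x83: "Linux",
    0xFB: "VMware VMFS",
    0xFC: "VMware vmkcore",
}

# Matches entries such as "3: 10442250 127459710 0xfb" (index: start length type).
ENTRY_RE = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+0x([0-9a-fA-F]+)\s*$")


def parse_partitions(text, block_size=512):
    """Decode 'Valid Partitions' entries from a /proc/vmware/scsi disk listing."""
    rows = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue  # skip Vendor:, Size:, statistics rows, etc.
        index, start, length = int(m.group(1)), int(m.group(2)), int(m.group(3))
        ptype = int(m.group(4), 16)
        rows.append({
            "index": index,
            "start": start,
            "end": start + length - 1,  # last sector, inclusive
            "type": PART_TYPES.get(ptype, f"unknown (0x{ptype:x})"),
            "size_mib": length * block_size / 2**20,
        })
    return rows
```

For vmhba1 and vmhba2 the same decoding gives a single VMFS partition starting at sector 128, so the partition table on the raid2 disk still describes a VMFS partition even though its volume cannot be opened.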

# cat vmhba1/0\:0
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Id: Unavailable
Size: 572195 Mbytes
Queue Depth: 32
Block size: 512
Num Blocks: 1171856412
Valid Partitions: 2
0: 0 1171856412 0x0
1: 128 1171845232 0xfb
Partition VM cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg
0 - 345 343 12820 2 128 0 0 0 0 94 188 7628 10634338
1 - 147141841 204062 2486333 146937779 988161796 0 0 46423001 46510271 9181 193763 93977 661492
VM Shares cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg active queued virtTime
1101 1000 8413749 157343 1734477 8256406 198900271 0 0 6678142 6761315 29 58 131187 1383005 0 0 197087318749486
1078 1000 176213 34069 606777 142144 5892251 0 0 152656 152656 0 0 1257730 9223943 0 0 195642359999486
1061 1000 136345101 4327 75801 136340774 758854920 0 0 39579371 39581254 248 862 18566 471805 0 0 197087291249486
1024 1000 2164414 6987 51692 2157427 23003227 0 0 0 0 2880 179991 4530826 8842350 0 0 197087047249486
Total 4000 147142186 204405 2499153 146937781 988161924 0 0 46423001 46510271 9275 193951 93977 661515 0 0 197087386749486
Paths:fixed
vmhba1:0:0 on*#
Switch Path Policy: target = preferred, hba = preferred, maxcmds = 0, maxblks = 0
I/0 Paths cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg
vmhba1:0:0+ 147142186 204405 2499153 146937781 988161924 0 0 46423001 46510271 9275 193951 93977 661515
Active: 0 Queued: 0 Reserved: N Pending Reserves: 0

# cat vmhba2/0\:0
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Id: Unavailable
Size: 572195 Mbytes
Queue Depth: 32
Block size: 512
Num Blocks: 1171856412
Valid Partitions: 2
0: 0 1171856412 0x0
1: 128 1171845232 0xfb
Partition VM cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg
0 - 813 811 42433 2 128 0 0 0 0 323 646 30155 1027587
1 1024 48096231 47723710 23878211 372521 37072501 0 0 0 0 19014 313273 996897 1720208
VM Shares cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg active queued virtTime
1061 1000 79804 95 1185 79709 7440828 0 0 0 0 15372 71636 990559 24614808 0 0 4807819250000
1024 1000 48017240 47724426 23919459 292814 29631801 0 0 0 0 3965 242283 996891 1682145 0 0 44591218250000
Total 2000 48097044 47724521 23920645 372523 37072629 0 0 0 0 19337 313919 996880 1720196 0 0 44591286250000
Paths:fixed
vmhba2:0:0 on*#
Switch Path Policy: target = preferred, hba = preferred, maxcmds = 0, maxblks = 0
I/0 Paths cmds reads KBread writes KBwritten cmdsAbrt busRst paeCmds paeCopies splitCmds splitCopies issueAvg totalAvg
vmhba2:0:0+ 48097044 47724521 23920645 372523 37072629 0 0 0 0 19337 313919 996880 1720196
Active: 0 Queued: 0 Reserved: N Pending Reserves: 0

# cat /proc/partitions
major minor #blocks name rio rmerge rsect ruse wio wmerge wsect wuse running use aveq
106 0 585928206 cciss/c2d0 34 235 538 0 0 0 0 0 0 0 0
105 0 585928206 cciss/c1d0 36 253 578 0 0 0 0 0 0 0 0
104 0 71652960 cciss/c0d0 94889 249770 2561060 101820 594058482 292229960 2795596884 27822230 0 16289320 27924110
104 1 104391 cciss/c0d0p1 470 31332 63604 150 720 39513 80468 41310 0 2170 41460
104 2 5116702 cciss/c0d0p2 90190 217507 2461714 96390 19252279 74060945 747000944 11960790 0 1313830 12057240
104 3 63729855 cciss/c0d0p3 28 208 472 0 0 0 0 0 0 0 0
104 4 1 cciss/c0d0p4 0 0 0 0 0 0 0 0 0 0 0
104 5 554211 cciss/c0d0p5 18 45 216 0 0 0 0 0 0 0 0
104 6 2040223 cciss/c0d0p6 4113 198 33954 5280 574805483 218129502 2048515472 15820130 0 15729410 15825410
104 7 104391 cciss/c0d0p7 27 197 448 0 0 0 0 0 0 0 0
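The #blocks column of /proc/partitions counts 1 KiB blocks, so dividing by 1024 gives MiB. Doing that for cciss/c2d0 gives 572195, the same size esxcfg-mpath reports for vmhba2:0:0, so the service console still sees a full-size logical drive even though file could not read its first sector. The helper below is a hypothetical sketch that assumes the extended column layout shown above (name in the fourth column).

```python
def parse_proc_partitions(text):
    """Return {device name: size in MiB} from the /proc/partitions listing above.

    The #blocks column counts 1 KiB blocks, so integer division by 1024 gives MiB.
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header ("major minor #blocks ...") and any blank lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        sizes[fields[3]] = int(fields[2]) // 1024
    return sizes
```

The same conversion gives 69973 MiB for cciss/c0d0, matching the 69973MB that esxcfg-mpath shows for vmhba0:0:0.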

Dave_Mishchenko
Immortal

Your post has been moved to the Virtual Machine and Guest OS forum

Dave Mishchenko

VMware Communities User Moderator

RParker
Immortal

Did you try restarting the VM? Is it an LVM or a physical disk mount?

I think there is a driver problem in the guest, which may or may not be related to VMware. Also try upgrading VMware Tools: uninstall them, reboot the RHEL guest, then reinstall the tools.

Texiwill
Leadership

Hello,

Generally, errors like that in the guest imply that there was an issue with the storage subsystem; usually this happens when an FC failover takes too long to complete. And the sense errors in the vmkernel log would point me in that direction.

Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers' (Copyright 2008 Pearson Education), as well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill