VMware Cloud Community
ericdaly
Contributor

BL460c Blade "hung" on rescan of HBA

Hi,

I had a painful issue this week when I clicked "Rescan" on an HBA to pick up a new LUN that I had presented from an EVA4400 SAN to a BL460c blade. The host became unresponsive immediately after the rescan. All VMs on this host also appeared to get "stuck", and HA could not release the locks so that other hosts could take over the VMs. During this time I was able to ping the ESX server but could not connect to it via the VMware GUI. I eventually power cycled the box, and after 15 minutes the server rejoined the cluster.

We have 4 x BL460c blades with QLA2432 HBAs, all in a cluster. I am running ESX 3.5 build 110268 on all hosts from a fresh install. This seems to be similar to the issue reported in:

This has now happened twice, so I am a bit worried. I have verified all zoning, host and LUN presentation, and everything is set up by the book. I will test the old procedure for 3.0.2 to see if it works OK: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10229

Has anyone else come across this issue with 3.5?

Regards,

Eric

Texiwill
Leadership

Hello,

I would do the following:

Shut down the blade and check that your BIOS settings are correct for ESX, as supplied by the vendor.

Reseat the riser card, and look for anything out of the ordinary (I had a busted heat sink in one system).

Update the blade BIOS to the latest level.

Run memtest86 for 24-48 hours.

Run HP full diags for 24-48 hours.

When something like this happens, I have seen it turn out to be a hardware issue. It could still be software, though, so after you do all of the above, check your VMware vmkernel log files for issues. Also check all the lights from blade to storage through your switches, and make sure there is no cable/GBIC issue.
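
For the log check, something along these lines may help narrow things down. It is only a rough sketch: it assumes you have copied the vmkernel log (normally /var/log/vmkernel on ESX 3.5) to a workstation with Python 3, and the keyword list is my own guess at what is worth looking for, not an official list of error strings. The names `KEYWORDS`, `suspicious_lines` and `scan_file` are made up for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical keywords; adjust them to whatever your own logs actually contain.
KEYWORDS = ("scsi", "qla", "vmhba", "timeout", "abort", "reset")


def suspicious_lines(
    lines: Iterable[str], keywords: Tuple[str, ...] = KEYWORDS
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        # Case-insensitive substring match against each keyword.
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str) -> None:
    """Print the matching lines of a log file, prefixed with line numbers."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as handle:
        for number, text in suspicious_lines(handle):
            print(f"{number}: {text}")
```

Call `scan_file()` with the path to your copy of the log, then read the lines logged around the time of the rescan. A plain substring match will produce some noise, so treat the output as a starting point rather than a diagnosis.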


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
BenConrad
Expert

There are (yet again) more patches for HBA 'rescan' issues:

VMware ESX 3.5, Patch ESX350-200808401-BG: Security and Other Updates to VMkernel, hostd, and Other RPMs

If you can reproduce this issue, try this patch (and any other related post-3.5 rescan patches) to see what happens. The KB article references shared IRQs and race conditions, but I believe the patches fix these issues.

Ben
