Hello, would anybody happen to have any guidance or a proven config utilizing the PERC H730p/LSI 3108/Invader controller (FW 25.2.1.0037) in pass-thru with VSAN (ESXi 5.5 build 2143827)? We are having stability issues, exhibited as PSODs and intermittent permanent disk failures, on a VSAN platform build based on the above in Dell R730 chassis with Fusion-io ioScale fronted Seagate 10k v7 ST1200MM0007 disk groups.
Common log events include “firmware in fault state” for the HBA and resets and aborts for the individual disks. Errors increment in the individual drive counters correlating with these events.
We have tried different HBA drivers, from the inbox mr3 (0.255.03.01-2) to the latest known PERC9 driver (6.901.55.00.1 - currently evaluating), including some of the mr3/megaraid drivers in between (6.605.10.00-1, 06.803.52.00, 06.803.73.00). The fallback of RAID0 has passed tests so far, but we all know what that means.
We know this configuration is not currently listed on the HCL. We do have cases currently open with VMware and Dell, and are in communication with LSI.
Any guidance would be greatly appreciated.
Do you have SATA or SAS disks?
Sorry for the multiple threads, all. This was originally posted via the developer forum and I received a message stating the thread was deleted. I didn't realize the posts were appearing here. This one can be deleted or combined with the other 2 similar threads.
The Seagate 10k v7 ST1200MM0007 are 1.2TB SAS.
What is the normal write and read latency on those disks in VSAN using passthru?
I haven't had any luck setting up H730p controllers in pass-through mode at all. Everything looks fine and seems to run well on initial setup, but it always ended up with PSODs, high latency, and false permanent failures on disks. I tried for several weeks tearing down and setting up the vSAN cluster again, setting up the controller in HBA mode, setting the controller in RAID mode with each drive configured as Non-RAID, etc. I finally gave up, set up each drive individually as RAID 0, and specified the SSDs in ESXi. I've been running that setup for a couple of weeks now without issue.
I know LSI is having problems with pass-through mode even with their supported controllers, so I wouldn't be surprised if it's tied to that in some way. When they fix those issues, or the H730p, I'm going to revisit pass-through mode.
Thanks for the reply! The description of what you have tried helps validate what we're going through. While it is unlikely to be a fix or even a temporary workaround, have you also attempted to run pass-thru with the 6.901.55.00.1 driver or something other than the inbox mr3 driver? By default the inbox mr3 drivers will take precedence; I missed that initially.
I tried falling back on the old Linux shim driver, following http://www.virtuallyghetto.com/2013/11/esxi-55-introduces-new-native-device.html
esxcli system module set --enabled=false --module=lsi_mr3
esxcli system module set --enabled=false --module=lsi_msgpt3
but it was too old to recognize these newer cards.
Other than that I haven't tried any other drivers.
I was running into similar issues with the PERC H730p. For me it turned out to be the way ESXi was trying to reset the controller. When the VM that owns the controller via passthru resets abruptly, the host needs to send a reset to the device, and apparently the default 'd3d0' method puts the PERC into a bad, unrecoverable state (short of a host reboot, anyhow). So in short, I told ESXi to use a different reset method. Take a peek in /etc/vmware/passthru.map, make an entry for the controller, and use the 'link' method for reset. After making the modification, go back to the shell, run 'auto-backup.sh', and reboot the host.
Snippet from /etc/vmware/passthru.map
# passthrough attributes for devices
# file format: vendor-id device-id resetMethod fptShareable
# vendor/device id: xxxx (in hex) (ffff can be used for wildchar match)
# reset methods: flr, d3d0, link, bridge, default
# fptShareable: true/default, false
.
.
.
# LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
1000 005d link default
I'm also using the PERC in HBA mode, with the controller in RAID mode with each drive configured as Non-RAID so I can control the disks independently.
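To make the four-column format of the passthru.map snippet above concrete, here is a minimal Python sketch that parses text in that format and looks up the reset method for a given vendor/device ID. It is only an illustration for checking an entry offline: the names `PassthruEntry`, `parse_passthru_map`, and `find_entry` are made up for this example, and the first-match-wins lookup is an assumption, not a description of how ESXi itself resolves entries.

```python
from typing import List, NamedTuple, Optional

WILDCARD = "ffff"  # per the file header, ffff matches any vendor or device id


class PassthruEntry(NamedTuple):
    vendor_id: str
    device_id: str
    reset_method: str
    fpt_shareable: str


def parse_passthru_map(text: str) -> List[PassthruEntry]:
    """Parse passthru.map-style text into entries, ignoring comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # skip lines that do not have exactly four columns
        vendor, device, reset, share = (f.lower() for f in fields)
        entries.append(PassthruEntry(vendor, device, reset, share))
    return entries


def find_entry(entries: List[PassthruEntry], vendor_id: str,
               device_id: str) -> Optional[PassthruEntry]:
    """Return the first entry matching the ids (assumption: first match wins)."""
    vendor_id, device_id = vendor_id.lower(), device_id.lower()
    for entry in entries:
        if (entry.vendor_id in (vendor_id, WILDCARD)
                and entry.device_id in (device_id, WILDCARD)):
            return entry
    return None


SAMPLE = """\
# file format: vendor-id device-id resetMethod fptShareable
# LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
1000  005d  link  default
"""

entries = parse_passthru_map(SAMPLE)
entry = find_entry(entries, "1000", "005D")
print(entry.reset_method)  # link
```

The sample text reuses the 1000/005d entry from the snippet; on a real host the file would be read from /etc/vmware/passthru.map instead.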
Hi
By the way, I guess you know that VMware doesn't yet support this card for VSAN.
/P
Any updates on this?
Will the H730P ever be on the HCL?
LSI will be in my office Monday and I'll ask, but I wouldn't hold your breath.
Here are a few reasons.
1. VSAN HCL testing is a lot more rigorous now.
2. A LOT of controllers that you can enable pass-through mode on are NOT supported by LSI in this mode. Expect firmware crashes and data loss if you try.
Here is LSI's statement on this (SuperMicro's 2308, despite being on the HCL for pass-through, should never be used).
The LSI controllers available through distribution channels which support Pass-Through (JBOD) include the following (with those in BOLD RED indicating presence on the VMware VSAN HCL). Note that there are other “LSI” Branded controllers listed on the HCL supporting Pass-through that are not available through distribution channels, meaning they are OEM only despite the “LSI” name and should be addressed to the OEM marketing it for support related questions:
· 9211-4i (on VSAN HCL)
· 9207-4i4e (on VSAN HCL)
· 9212-4i4e (on VSAN HCL)
· 9207-8i (on VSAN HCL)
· 9211-8i (on VSAN HCL) (I understand the Dell H200 is closely aligned with this).
· 9200-8e
· 9207-8e
· 9201-16i (on VSAN HCL)
· 9201-16e
· 9206-16e
Try it on anything that isn't on this list and you may expect data loss, crashing, and a desire to beg Adaptec to make a decent pass-through HBA.
Hi,
I would like to know why you want to use pass-thru instead of RAID 0.
Do you have SAS disks?
Txs
Ezequiel
Pass-through mode allows ESXi to communicate directly with the disks, without the controller interpreting the I/O.
There are management benefits, such as not having to configure SSDs manually, and with drive failures a simple swap of drives is easily done. Whereas with RAID-0 you will have to tag your SSDs manually, and if there is a failure, manual interaction with the RAID controller to create a new RAID-0 set may be required.
Depending on your server configuration, with RAID-0 you may be able to make these changes through a DRAC \ iLO \ etc., or it may require a reboot to get into the controller options. You may want to instruct another employee to swap the hard drive with the orange light while you're away, and not want to worry about them getting into the controller interface.
Performance-wise there shouldn't be much, if any, difference, but the management benefits can be understandably important to some people.
Got it,
We have both scenarios: LSI 3008 in pass-thru and 3108 in RAID 0.
We are using SATA disks, so we are getting a QLEN of 32 per disk versus a QLEN of 128 on the RAID 0.
For management on the RAID 0 we use StorCLI, which allows us to configure physical disks on the fly with no need to restart servers.
Txs
Ezequiel
Good news: the PERC H730 has been added to the HCL. Going to start testing with the new firmware.
Release | Device Driver | Firmware Version | Type | Features |
---|---|---|---|---|
ESXi 5.5 U2 | megaraid_perc9 version 6.901.55.00.1vmw | 25.2.1.0036 | | |
Drewdem,
When you say new firmware for the PERC 730, can you clarify? Is there a beta firmware that you are using that can be downloaded?
Also, any luck with the passthrough?
Thx.
Didn't see this when it was posted unfortunately.
What I was referring to at the time was actually the driver linked in my post.
Has anybody out there been able to get H730-series RAID cards working well in vSphere 6.0 under passthrough? I have a development cluster that went 20 days with no issue under a RAID 0 config, but under pass-through/HBA mode we have hosts randomly PSODing after about 7-10 days. PSOD errors come back with "Megaraid_SAS hardware critical error returning failed". The VMware HCL recommends firmware 25.2.1.0037 and the Inbox driver, but that firmware/driver combo isn't even detecting my disks.
I've tried the firmware and drivers below. After a PSOD a restart fixes it, but it is obviously still a problem to have systems PSOD.
Firmware: 25.2.1.0037, 25.2.2.004
Drivers: Inbox, megaraid_perc9 version 6.901.55.00.1vmw, megaraid-perc9 version 6.901.57.00-1OEM, lsi-mr3 version 6.606.12.00-1OEM
I've seen at least one PSOD on each of the driver versions above except for Inbox. I can't get the Inbox driver working because it doesn't even detect the drives I have plugged in in HBA mode. I may have to rebuild with RAID 0; just seeing if anyone else has a success story with Dell H730-series RAID controllers and HBA/pass-through mode.
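With two firmware versions and four drivers in play, it may help to track every pairing explicitly. The following is a minimal Python sketch that enumerates the combinations listed above; the `build_matrix` helper and the "not recorded" placeholder status are assumptions for illustration only, and no test results are filled in because the post does not say which firmware each PSOD occurred on.

```python
from itertools import product

# Versions copied from the post above.
FIRMWARE = ["25.2.1.0037", "25.2.2.004"]
DRIVERS = [
    "Inbox",
    "megaraid_perc9 6.901.55.00.1vmw",
    "megaraid-perc9 6.901.57.00-1OEM",
    "lsi-mr3 6.606.12.00-1OEM",
]


def build_matrix(firmware, drivers):
    """Map every (firmware, driver) pair to a placeholder status."""
    return {(fw, drv): "not recorded" for fw, drv in product(firmware, drivers)}


matrix = build_matrix(FIRMWARE, DRIVERS)
for (fw, drv), status in sorted(matrix.items()):
    print(f"{fw:<12} {drv:<34} {status}")
```

Each status can then be updated by hand (for example to the number of days before a PSOD) as a given pairing is tested.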