Hi,
I'm running into a serious problem with a FreeBSD guest under ESXi 5 (VMware vCenter Server 5.5.0 Build 1623101, to be exact).
From time to time, and increasingly often, I get the following errors on the guest for no apparent reason:
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 c0 9e 22 00 00 08 00
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): Retrying command
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 c0 00 a2 00 00 08 00
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): Retrying command
BTW, the "disk" is recognized by the OS as follows:
Dec 12 13:42:40 igue kernel: da0 at mpt0 bus 0 scbus2 target 0 lun 0
Dec 12 13:42:40 igue kernel: da0: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device
Dec 12 13:42:40 igue kernel: da0: 300.000MB/s transfers
Dec 12 13:42:40 igue kernel: da0: Command Queueing enabled
Dec 12 13:42:40 igue kernel: da0: 61440MB (125829120 512 byte sectors: 255H 63S/T 7832C)
Here's what I checked so far:
Even worse: if these errors appear several times in a row, the VM crashes completely and has to be power-cycled manually.
Thanks much in advance for any clue,
-vmejo
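For anyone reading the log excerpts above: the CDB bytes on the WRITE(10) lines can be decoded to see where on the virtual disk the guest was writing. The sketch below assumes the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and takes the sector count and sector size from the da0 probe line. The name `decode_rw10` is made up for illustration and is not part of any FreeBSD or VMware tool.

```python
# Illustrative helper only; names are hypothetical.
SECTOR_SIZE = 512          # "512 byte sectors" in the da0 probe line
TOTAL_SECTORS = 125829120  # sector count from the same probe line

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_hex):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a 10-byte READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return OPCODES[cdb[0]], lba, blocks


if __name__ == "__main__":
    # The two CDBs quoted in the log excerpt.
    for cdb_hex in ("2a 00 00 c0 9e 22 00 00 08 00",
                    "2a 00 00 c0 00 a2 00 00 08 00"):
        op, lba, blocks = decode_rw10(cdb_hex)
        within_disk = lba + blocks <= TOTAL_SECTORS
        print(op, "lba", lba, "bytes", blocks * SECTOR_SIZE,
              "within disk:", within_disk)
```

Both commands decode to small 8-block (4 KiB) writes at offsets inside the disk, which is consistent with ordinary I/O being answered with a Busy status rather than with a malformed request.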
Hello vmejo,
Did you solve this problem?
Hi,
Sorry - no 😞
I have neither solved this problem nor received any reply to my question.
Problem is still there with the VM running into the exact same problems every couple of days.
Hello, we are seeing the exact same error.
If anyone has any tips please help!
I have the same problem.
10.1-RELEASE-p9
ESXi 5.5.0, 2068190
Are there any solutions?
Hello everyone,
Which virtual SAS/SCSI/SATA adapter are you using, and have you tried changing it? Switching between LSI Logic SAS and LSI Logic Parallel might help, and it may even be worth trying a SATA controller for your disk storage.
This is solved with a FreeBSD update. You can see the link below:
CAM status: SCSI Status Error | FreeNAS Community
Fixed in the latest 9.2.1.2.
ESXi is installed on a Dell R730XD. The RAID controller is a PERC H730 Mini, which is an LSI 3108.
I have two logical drives, one on SSD and one on SATA.
Both have the problem.
What about FreeBSD 10?
I use the LSI Logic Parallel SCSI controller for the FreeBSD VM because it is the default.
Has anybody tried the LSI Logic SAS controller for a FreeBSD VM?
Hi,
Are you sure it's actually fixed? I'm running 10.1 (kernel and system dating from April 15) and the problem is still there, i.e. I'm still getting these errors in the log; the only difference is that the system doesn't crash any more.
The problem is still relevant!
On FreeBSD 10.1 (r274401) it appears about once every 48 hours.
Is there any solution?
On VMware ESXi 5.5 I am running three FreeBSD VMs, and all three fail at the same time with the error (CAM status: SCSI Status Error).
It always happens at night.
Are there any solutions for this problem?
Also, from what I've seen, it happens mostly overnight. One possible explanation, at least in our context, is that night time is backup time, i.e. heavy load on the storage side.
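One way to check the "mostly overnight" impression is to bucket the Busy errors by hour of day and compare that against the backup schedule. The sketch below is only an assumption about how one might do it: it expects classic syslog lines like the ones quoted at the top of the thread, and `busy_errors_by_hour` is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Classic syslog timestamp, host, "kernel:", then the CAM Busy status text.
BUSY_LINE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:"
    r".*SCSI status: Busy"
)


def busy_errors_by_hour(lines):
    """Count 'SCSI status: Busy' kernel messages per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        match = BUSY_LINE.match(line)
        if match:
            counts[int(match.group("hour"))] += 1
    return counts


# Sample input: the log lines quoted in the first post.
SAMPLE = [
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 c0 9e 22 00 00 08 00",
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error",
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): SCSI status: Busy",
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): Retrying command",
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 c0 00 a2 00 00 08 00",
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error",
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): SCSI status: Busy",
    "Dec 15 01:33:25 igue kernel: (da0:mpt0:0:0:0): Retrying command",
]

if __name__ == "__main__":
    for hour, count in sorted(busy_errors_by_hour(SAMPLE).items()):
        print(f"{hour:02d}:00  {count}")
```

In practice you would feed it the lines of the guest's log file instead of `SAMPLE`; a cluster of counts in the backup window would support the storage-load explanation.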
I was getting more and more of these errors on my FreeBSD VM as I added more VMs to the ESX host. I've now added a memory reservation to the FreeBSD VM (Settings -> Resources -> Memory -> Reservation 3000MB) and things seem to have vastly improved (I have not seen any errors in the last 3 days). My guess is that it was due to too much memory oversubscription on the ESX host impacting the performance of FreeBSD.
We're seeing the same issue, but I can say with certainty it is NOT a FreeBSD issue. Working with Supermicro, LSI and VMware, we determined that the LSI controller is "timing out", at which point all I/O on the controller comes to a complete stop. While it was the FreeBSD VM console that alerted us to the issue during our initial build-out of the server cluster, the vmkernel log file confirmed that the LSI 3108 controller that backs an 8-disk SSD RAID is timing out and then resetting. We've been able to cause it to "time out" at will by powering up or resetting 5 VMs at the same time. Not only does the vmkernel log display the loss of communications to the controller, the LED activity on the drives is non-existent for ~30-40 seconds.
We've tried new LSI controller firmware (even beta firmware from LSI), various VMware drivers for the controller, hardware BIOS settings for the system and the controller. You name it, we've tried it all without success.
I have an open ticket with Supermicro and VMware to solve this issue. I'll post more as I have information.
I think I may have the solution.
We just ran into this issue with a brand new Supermicro machine with an LSI 3108-based RAID and VMware ESXi 6.0.
The solution was to ditch the lsi_mr3 driver and use the Avago / LSI scsi-megaraid-sas driver. We were able to find the appropriate driver for our ESXi version by going here: http://mycusthelp.info/LSI/_cs/AnswerDetail.aspx?inc=8447
Be sure you download what they are labeling as the "legacy driver" and not the native driver, as the native one is the one with the problems. Oracle has an excellent article with instructions on switching to the scsi-megaraid-sas driver and turning off the lsi_mr3 driver. You can follow those, but reference the newer driver version and files you downloaded. Here are the Oracle instructions: Enable the megaraid_sas Driver - Oracle Server X5-2 HTML Documentation Collection
With the new driver installed I was successfully able to run the StorCLI utility (the replacement for MegaCLI) to access the card. I was able to view the current firmware and installed a newer firmware that I was able to find here: ftp://ftp.supermicro.com/driver/SAS/LSI/3108/Firmware/
After installing the latest firmware, I did have to re-add the storage for some reason. I also had problems with the web client and simply re-added the storage with the old Windows client.
I believe the key to fixing the issue is switching to the scsi-megaraid-sas driver, although I did also upgrade the firmware before performing tests that would cause the errors previously ... so I can't confirm this 100%.
I was experiencing the same issues; LaminarCS's instructions were almost enough, but in my case I had to do one more thing. My environment: a Dell T630 with a Megaraid 8380E attached to eight 1TB Samsung SSD 850 Pro drives, plus a FreeNAS 9.3 VM and a NAS4Free 10.2 VM providing iSCSI from a VMFS datastore backed by those drives.
My additional problem was HEAT. The Megaraid card was idling at 98 degrees Celsius. I can only imagine what kind of temperatures it was reaching under load. Even a 20% increase would put it over the suggested thermal limit. The Dell recommended slot placement of the RAID card puts it at the top of the case where there is zero airflow. I added a fan blowing directly onto the card, and temperatures were reduced by 46 degrees C. This is a reduction from 208 to 125 F, which is enormous. Once the fan was in place, the errors ceased.
So, if you've tried all the suggestions in this thread and still are experiencing errors, check your airflow and temperatures. In my case I had to:
1.) Upgrade firmware of the Megaraid 8380E
2.) Commit 16GB to my NAS Virtual Machine
3.) Update the Megaraid driver to the newest version from VMware
4.) Add active cooling to keep the 8380E at a reasonable operating temperature
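The temperature figures in this post can be sanity-checked with a few lines of arithmetic. This is only a sketch of the conversion, using the numbers quoted above (98 degrees C at idle, a 46 degree C drop, and a hypothetical 20% rise under load); the card's actual thermal limit is not stated in the post and is not assumed here.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


idle_c = 98                 # idle temperature reported in the post
drop_c = 46                 # reduction after adding the fan
after_c = idle_c - drop_c   # temperature with the fan in place

if __name__ == "__main__":
    print("idle:      ", idle_c, "C =", round(c_to_f(idle_c), 1), "F")
    print("with fan:  ", after_c, "C =", round(c_to_f(after_c), 1), "F")
    print("idle + 20%:", round(idle_c * 1.2, 1), "C")
```

The results line up with the roughly 208 F and 125 F quoted in the post.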
I do not know if this will help anyone,
but I just ran into a similar issue on my hardware, which uses the X79 chipset.
In this case I resolved the issue by using the sata-ahci driver in ESXi 6.5.
Hope this helps.
I had this error. It turned out that the cache controller battery had failed on an HP DL380 G7.
Lift the cover off your server and check the battery LEDs. If you have a solid amber light, then a new battery will fix the problem.