VMware Cloud Community
drlektro10
Contributor

storage devices (SATA controllers) not detected after disabling vmw_ahci

Hi all,

I've managed to 'bring down' my ESXi server while troubleshooting issues with one SATA HDD. A potential solution for my issue was disabling the vmw_ahci driver, so that the sata_ahci driver would be used after a reboot. However, after disabling vmw_ahci, my ESXi server no longer recognizes the storage devices and therefore no longer sees/mounts any of the disks. FYI: this server ran fine for two years with the following storage config (using the vmw_ahci driver):

 

[root@server:~] lspci -vvv |grep hba
0000:00:12.0 Mass storage controller SATA controller: Intel Corporation Device 31e3 [vmhba0]
0000:07:00.0 Mass storage controller SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller [vmhba1]

 

 

The Intel SATA controller has 2 SSDs connected, containing the ESXi system files and the VMs, while the ASMedia controller has 2 HDDs connected as physical disks passed through to my NAS VM (as a virtualized LUN or something like that -> not a VMware specialist).
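
Before going further: the disable itself was done with something along these lines (a sketch from memory, not my exact shell history):

[root@server:~] esxcli system module set --enabled=false --module=vmw_ahci   # disable the native AHCI driver
[root@server:~] reboot                                                       # expectation: legacy sata_ahci claims the controllers afterwards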

As explained, while troubleshooting an issue with one of the HDDs, disabling the vmw_ahci driver left me in the current state:

 

[root@server:~] esxcli storage core path list
[root@server:~] esxcli storage core device list
[root@server:~] lspci -vvv |grep vmhba
0000:00:12.0 Mass storage controller SATA controller: Intel Corporation Device 31e3 [vmhba0]
0000:07:00.0 Mass storage controller SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller [vmhba1]
[root@server:~] esxcli storage filesystem list
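
So the path, device, and filesystem lists all come back empty, even though lspci still sees both controllers on the PCI bus. For anyone comparing against a healthy host, the adapter view is the quickest check of which driver has claimed each vmhba (just the command; I'm omitting output since mine is empty):

[root@server:~] esxcli storage core adapter list   # on a working host this lists vmhba0/vmhba1 together with the claiming driver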

 

 

When logging in via the GUI (web interface), I see that the numbers next to 'Virtual Machines' and 'Networking' are correct, but the number next to 'Storage' is 0.

I noticed the vmw_ahci driver is not loaded during boot, even after re-enabling it with the esxcli system module set command:

 

[root@server:~] vmkload_mod vmw_ahci
Module vmw_ahci loaded successfully
[root@server:~] esxcli system module list |grep ahci
vmw_ahci true true
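
For completeness, the re-enable attempt before rebooting was essentially this (a sketch; the same commands as above, spelled out):

[root@server:~] esxcli system module set --enabled=true --module=vmw_ahci   # mark the module enabled again
[root@server:~] vmkload_mod vmw_ahci                                        # load it into the running kernel (this part works, as shown)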

 

A reboot results in the same situation: no driver loaded, and therefore the storage devices are not detected and the disks are not visible.

In the vmkdevmgr logs, I found these entries:

 

2022-01-11T19:56:41Z vmkdevmgr[2097573]: ADD event for bus=pci addr=m00008901 id=808631e3184931e3010601.
2022-01-11T19:56:41Z vmkdevmgr[2097573]: Found driver vmw_ahci for device bus=pci addr=m00008901 id=808631e3184931e3010601.
2022-01-11T19:56:41Z vmkdevmgr[2097573]: Error loading driver vmw_ahci: Unable to load module /usr/lib/vmware/vmkmod/vmw_ahci: Failure
2022-01-11T19:56:41Z vmkdevmgr[2097573]: ADD event for bus=pci addr=m00008107 id=808631e8184931e8060100.
2022-01-11T19:56:41Z vmkdevmgr[2097573]: ADD event for bus=pci addr=m00008108 id=808631d4184931d40c0500.
2022-01-11T19:56:41Z vmkdevmgr[2097573]: ADD event for bus=pci addr=p0000:07:00.0 id=1b21061218490612010601.
2022-01-11T19:56:41Z vmkdevmgr[2097573]: Found driver vmw_ahci for device bus=pci addr=p0000:07:00.0 id=1b21061218490612010601.
2022-01-11T19:56:41Z vmkdevmgr[2097573]: Error loading driver vmw_ahci: Unable to load module /usr/lib/vmware/vmkmod/vmw_ahci: Failure
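(The driver errors are easy to pull straight from that log; the ADD events sit right next to them:)

[root@server:~] grep vmw_ahci /var/log/vmkdevmgr.log   # shows the 'Found driver'/'Error loading driver' lines above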

 

 

Does anyone have an idea how to bring my ESXi server back to a working state?

Thx!!
2 Replies
supromon
Contributor

 
Hello, did you solve this problem? I have the same situation.
After disabling the vmw_ahci module and then re-enabling it after a reboot,
the vmw_ahci module is not loaded, and there is an error in the logs like the one in this post.
I found several similar reports on the internet, but no one has a solution.
domboy
Contributor

Just thought I'd chime in since I ran into this issue and also never actually found a solution.

In theory, the proper way to re-enable the module and make it persistent is the following three commands (spelled out as a single session after the list):

  • esxcli system module set --enabled=true --module=vmw_ahci  -> To enable the driver
  • vmkload_mod vmw_ahci  -> To load the vmw_ahci driver
  • /sbin/auto-backup.sh -> To push the configuration to the bootbank and make it persistent across reboots.
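
Put together as one shell session, that's (nothing beyond the three bullets above):

[root@server:~] esxcli system module set --enabled=true --module=vmw_ahci   # mark the module enabled
[root@server:~] vmkload_mod vmw_ahci                                        # load it into the running kernel
[root@server:~] /sbin/auto-backup.sh                                        # save the config so it should survive a reboot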

But I would reboot and be back in the same boat, with the following two errors in /var/log/vmkdevmgr.log:

 

2024-01-03T15:07:41.367Z vmkdevmgr[2097978]: vmkmod: VMKModLoad: Module vmw_ahci is disabled and cannot be loaded.
2024-01-03T15:07:41.367Z vmkdevmgr[2097978]: Error loading driver vmw_ahci: Unable to load module /usr/lib/vmware/vmkmod/vmw_ahci: Failure

 

I got support involved and they figured out what my problem was.

 

Doing an ls -lrth /, they found this:

lrwxrwxrwx 1 root root 22 Jan 3 16:04 bootbank -> /tmp/_bootbankv9c5a44p
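
A quicker version of the same check is to look at just the symlink; on a healthy install /bootbank points at a real volume under /vmfs/volumes, not at /tmp:

[root@server:~] ls -l /bootbank   # a /tmp/_bootbank* target means the real bootbank volume couldn't be mounted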

 

It turned out that the BOSS controller hosting the boot volume used that module (which I obviously didn't realize). So after I disabled vmw_ahci, ESXi could no longer reach the boot device at startup and would mount bootbank on a temp partition instead. That temp partition does not persist across reboots, so the last command, which saves the configuration to the bootbank, gets lost.
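
(You can also see exactly what gets lost: as far as I understand it, auto-backup.sh saves the configuration as a state.tgz archive inside the bootbank, so with bootbank sitting on /tmp the save evaporates on reboot:)

[root@server:~] ls -l /bootbank/state.tgz   # the saved config archive; here it lives on the non-persistent /tmp partition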

 

So while the original disable worked, there is no way to make a re-enable persist. The only way to fix this is to re-install ESXi (or at least that is what support recommended).

 

I suppose that in theory it may be possible to manually create a partition (one that would hopefully persist across a reboot) and re-link bootbank to it, but I haven't tested that.

 

I also don't know if this situation would have ended differently had ESXi 7 been a fresh install rather than an upgrade from 6.7, since I've read that 7 changed how the partitions are set up.