A user reported slow network I/O on a Windows server hosted on ESXi 7.0 U3c. While investigating, the VM locked up completely and presented the following prompt:
"There is no more space for virtual disk '<name of disk>.vmdk'. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session."
I determined that the source of the issue was a lack of available disk space on one of the datastores.
This server was originally set up on version 6.5 with thin provisioning for all datastores. Several upgrades later, a bug in ESXi 7.0 was causing purple screens when using thin provisioning. Downgrading wasn't an option, and there were no fixes available at the time. (I believe this has since been fixed in 7.0 U3d.) The only option was to convert all thinly provisioned disk images to thick provisioning, with the intention of reversing this later.
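For reference, that thin-to-thick conversion can be done from the ESXi shell with vmkfstools. This is just a sketch with a placeholder path; it should be run against the disk's descriptor .vmdk, with the VM powered off:

```shell
# Inflate a thin-provisioned disk to thick in place.
# The path is an example; point it at the descriptor .vmdk,
# not the -flat.vmdk.
vmkfstools --inflatedisk /vmfs/volumes/datastore2/myvm/myvm.vmdk
```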
Fast forward to today: my thick-provisioned images are now causing issues due to storage constraints. Something has caused a large snapshot to be created, and I cannot consolidate the disks, presumably due to the lack of free space remaining on the datastore.
I've tried attaching a USB drive to expand the datastore, but the device fails to attach. The vmkernel log shows:
2022-08-12T08:08:33.533Z cpu6:1049018)vmkusb: umass_attach:1123: umass_attach: Attach device cached_name NULL, cached data ff
2022-08-12T08:08:34.535Z cpu6:1049002)vmkusb: umass_watchdog:1015: umass_watchdog: Register SIM for New Device with 0 sec(s) delay
2022-08-12T08:08:34.536Z cpu7:1049005)vmkusb: umass_detach:1284: umass_detach: Device umass0 is detaching
2022-08-12T08:08:34.536Z cpu7:1049005)vmkusb: umass_detach:1300: umass_detach: Detaching umass0 with cached_name NULL, adapter name Invalid, is_reserved 0
2022-08-12T08:08:34.536Z cpu0:1049007)WARNING: ScsiPath: 9487: Adapter Invalid does not exist
2022-08-12T08:08:34.536Z cpu0:1049009)DMA: 687: DMA Engine 'vmhba35' created using mapper 'DMANull'.
2022-08-12T08:08:34.558Z cpu4:1049011)ScsiAdapter: 3418: Unregistering adapter vmhba35
2022-08-12T08:08:34.558Z cpu4:1049011)DMA: 732: DMA Engine 'vmhba35' destroyed.
Without this, I'm at a loss. I don't have any more internal disk adapters, so there's no option for adding internal storage at this time. I'm aware of the 2TB USB disk size limitation within ESXi, but that doesn't seem to be the issue here.
Of course, this is a production machine, so downtime counts. How can I get these disks consolidated (there's plenty of room inside the base disk images)? Is there a way to get the USB storage option working long enough to expand the datastore? It's my understanding that I only need about 1GB more for the consolidation to complete.
Here's what I see with lsusb:
[root@esxi:~] lsusb -d 0781:5575
Bus 001 Device 005: ID 0781:5575 SanDisk Corp. Cruzer Glide
[root@esxi:~] lsusb -d 0781:5575 -v
Bus 001 Device 005: ID 0781:5575 SanDisk Corp. Cruzer Glide
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x0781 SanDisk Corp.
  idProduct          0x5575 Cruzer Glide
  bcdDevice            1.00
  iManufacturer           1 SanDisk
  iProduct                2 Cruzer Glide
  iSerial                 3 4C530000070415215490
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              200mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               1
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0000
  (Bus Powered)
The disk is not listed when using esxcli storage core device list.
To understand the current state, please run ls -lisa > filelist.txt in the VM's folder, and attach filelist.txt along with the output of df -h to your next reply.
Do you have other VMs on the same datastore, which are not as important as this one, that can be shut down for some time? This will free up disk space that's in use for their swap files, which may be sufficient to successfully consolidate the snapshot.
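A quick way to see how much space those swap files occupy, assuming the VM folders sit directly under the datastore root:

```shell
# Each powered-on VM keeps a .vswp file roughly the size of its
# unreserved RAM; shutting the VM down releases that space.
ls -lh /vmfs/volumes/datastore2/*/*.vswp
```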
André
Thanks, André.
I'll have to post back later with the output you've requested; I won't have access to the host again until later today.
To give you an idea of what's happening, this server has 2 SSDs and 4 HDDs, along with 1 M.2 SSD.
ESXi lives on the M.2 SSD.
The two SATA SSDs are in RAID 1 and they house datastore1 and datastore2. Datastore1 is used for general files (esxi patches, ISOs, etc). Datastore2 houses VMs (one small Linux install for network diagnostics, one Windows 10 workstation, and one Windows 2019 Domain Controller).
The 4 HDDs are each configured with their own individual datastore. Each one is attached to the domain controller as a separate drive which is in software RAID using Windows Storage Spaces.
SSD1/SSD2
  Datastore1
  Datastore2
    DC.mydomain.local/
      DC.mydomain.vmdk
      DC.mydomain-000001.vmdk
HDD1
  DC.mydomain.local/
    Disk1.vmdk
HDD2
  ...
HDD3
  DC.mydomain.local/
    Disk3.vmdk
    Disk3-000001.vmdk
HDD4
  ...
This issue is with Disk 3. The volume takes up virtually the entire physical disk, and the disk image is thick provisioned. The snapshot image has reached around 6.4GB, and there's no longer enough space on the disk to perform the consolidation.
I should also add that I finally got USB drives recognized. I must have made a typo the first few times, but I got usbarbitrator disabled and they now appear under storage disks. My thought was to format one with VMFS and use it to create an extent for Disk3's datastore.
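For anyone following along, disabling the arbitrator came down to two commands (a sketch of what I ran):

```shell
# Stop the USB arbitrator so the host itself, rather than VM
# passthrough, can claim USB storage devices.
/etc/init.d/usbarbitrator stop

# Keep the arbitrator disabled across reboots.
chkconfig usbarbitrator off
```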
>>> My thought was to format one with VMFS and use it to create an extent for Disk3's datastore.
I strongly recommend against this. Please note that USB devices are not supported as VMFS datastores (even if you get them to work), and that there's no way to remove an extent anymore without reformatting the datastore!
>>> ... the snapshot image has reached around 6.4GB
Is it really GB, or is it TB?
If you cannot consolidate the snapshot online, you may consider biting the bullet and scheduling some downtime to try and delete the snapshot.
André
>>> Please note that USB devices are not supported as VMFS datastores (even if you get them to work), and that there's no way to remove an extent anymore without reformatting the datastore!
I've taken this into consideration. This disk only houses an "attached" disk image. If I can get the disk image consolidated, I can move it to another disk, recreate the datastore, and then move it back. (It's also a data disk that's part of a Storage Spaces array, so it *should* be rebuilt by the Windows server if it gets completely destroyed. But not knowing precisely what is staged for consolidation, I'm not immediately comfortable just nuking and rebuilding the disk, despite the fact that it should be okay to do so.)
>>> Is it really GB, or is it TB?
Yep! It's really GB. Sad, right?
>>> If you cannot consolidate the snapshot online, you may consider biting the bullet and scheduling some downtime to try and delete the snapshot.
The VM is powered off. The delete reports that it completed successfully, but the snapshot's delta files are not removed, and the chain indicates they are still in use.
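For completeness, this is roughly how I've been checking and retrying it from the ESXi shell (the VM ID below is a placeholder):

```shell
# List the VMs registered on this host to find the VM ID.
vim-cmd vmsvc/getallvms

# Show the snapshot tree for the VM (replace 1 with the real ID).
vim-cmd vmsvc/snapshot.get 1

# Remove all snapshots for the VM, which triggers consolidation.
vim-cmd vmsvc/snapshot.removeall 1
```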
Here's the requested information.
[root@esxi:~] df -h
Filesystem    Size    Used  Available  Use%  Mounted on
VMFS-6      111.8G   12.5G      99.2G   11%  /vmfs/volumes/datastore1
VMFS-6      893.0G  893.0G       8.0M  100%  /vmfs/volumes/datastore2
VMFS-6        5.5T    5.5T       6.7G  100%  /vmfs/volumes/Disk1_T8WEQDLP
VMFS-6        5.5T    5.5T       6.7G  100%  /vmfs/volumes/Disk2_T8WEQ26A
VMFS-6        5.5T    5.5T       6.7G  100%  /vmfs/volumes/Disk4a_T6NEMH9R
VMFS-6        5.5T    5.5T       0.0B  100%  /vmfs/volumes/Disk3a_T9WEKUZ8
VFFS          6.2G    3.4G       2.8G   54%  /vmfs/volumes/OSDATA-6193f710-05a1744c-4a23-7c8ae1c668da
vfat        499.7M  173.7M     326.1M   35%  /vmfs/volumes/BOOTBANK1
vfat        499.7M  203.4M     296.4M   41%  /vmfs/volumes/BOOTBANK2
Disks 1-4 are part of a striped array, so they *should* all be roughly the same size. For whatever reason, Disk 3a is the only one with a delta file, and it's enough to fill the physical disk.
Can I consolidate and convert this back to thin provisioning while migrating with vMotion? This is a standalone host, but it does have the Essentials license. I'm thinking I could set up another host, migrate this VM to it with vMotion, and then migrate it back. Seems like overkill, but it could be a means to an end. I've never used vMotion before, so I'm not completely sure of its capabilities.
I could be wrong, but to me this looks like an issue with datastore2 rather than Disk3's datastore.
According to the files' time stamps and sizes, it seems that the consolidation starts on the thin-provisioned virtual disk on datastore2 (same time stamp "Aug 12 03:41" for the flat and sesparse files), but does not succeed due to the lack of free disk space, and subsequently stops the consolidation process. The virtual disk on Disk3's datastore does not even seem to be touched.
André
I'm not sure where to go from here. That datastore has free space available. I even deleted all the other VMs. Is there any kind of logging? I couldn't really find anything.
According to the df command, it's almost full (8.0MB available).
Is there a chance to temporarily add an additional SSD/HDD (>=1TB) that could be used to manually clone the virtual disk on datastore2?
What I'm thinking of is to evacuate datastore1 + 2 (backing up the required files), then delete both datastores and create a single, larger datastore on the SSD RAID, to which the cloned virtual disk and the backed-up files could be migrated back.
André
I'm following along. I don't have any way to add another SSD/HDD. There are no additional SATA ports available. My only current option for expansion is USB.
I'm looking at the possibility of adding some PCIe SATA expansion cards, but I'm having a difficult time narrowing down compatible hardware.
Datastore1 is mainly used for oddball storage, mostly ESXi patches and ISOs. I could probably offload those files and delete datastore1, then expand datastore2 to take up the returned space.
I'm also looking at the option of adding some NAS storage, but I can't find any good information about support within ESXi, and looking through the menus, I didn't see any obvious way to mount it.
>>> According to the df command, it's almost full (8.0MB available).
The datastores are all sized to fill all available physical disk space. The df command would show that even if the datastores were empty, wouldn't it?
Sorry for the delay, but getting parts proved to be rather interesting, and I had a backorder. I've added three 12TB disks to the system.
Will I be able to move the VMs before they are consolidated?
I'm thinking I could create another datastore and move the affected VMs to it. I believe the disks will consolidate during the move, if I'm understanding the process correctly.
The other option involves using the extra disks as extents, performing the disk consolidation, and then trying to get the disk images converted from thick provisioning back to thin provisioning.
I'd create another datastore and use the Migration Wizard to migrate the VM to the larger datastore. Since you have an Essentials license (i.e., no vMotion license), this will require some downtime. However, it's likely the safest way to resolve the current situation.
André
Silly question. How do I use the Migration Wizard? I don't see it in the web interface. Will it start if I "move" the VM folder in the Datastore Browser?
Sorry, my bad. I somehow missed that you run ESXi as a standalone host.
The Migration Wizard is available in vCenter Server only. Any chance that you deploy vCenter Server?
André
I don't have a vCenter Server, but I do have a license. Is that a direction I need to go?
If you do have the required resources (RAM, CPU, storage), deploying a vCSA might be the easiest way to resolve the situation.
There are of course other alternatives; however, these require running CLI commands and editing the VM's configuration file.
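As a rough sketch of that CLI route (paths are examples only; the VM must be powered off, and -i should point at the current snapshot descriptor so the clone comes out consolidated):

```shell
# Clone the whole snapshot chain into a single, consolidated,
# thin-provisioned disk on the new datastore.
vmkfstools -i /vmfs/volumes/Disk3a_T9WEKUZ8/DC.mydomain.local/Disk3-000001.vmdk \
  -d thin /vmfs/volumes/newdatastore/Disk3.vmdk
```

Afterwards, the VM's .vmx file needs to be edited (or the disk re-added in the VM's settings) to point at the cloned disk before the old chain is deleted.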
André