I have an issue that spans a few different areas.
Working with vCenter 6.5 with ESXi 6.5 hosts.
I have a Windows Server 2008 VM that is around 2.8 TB, however the datastores are only allocated 1.9 TB of space each. The VM has 5 disks, 4 of them on one datastore and the fifth on
another datastore. There are a total of 25 to 30 datastores that vCenter can see, each with around 1.9 TB of space allocated.
Before I arrived, folks were creating nested snapshots and not deleting them. I've cleaned up all of the snapshots, however the VM is still having issues and I need to consolidate its disks. I am not able to consolidate the disks.
I've tried to migrate VMs from one datastore to another, and I'm not able to. I thought this was an option, but it's not working.
There is a datastore with no VMs or data on it. Can I delete that datastore without causing any issues, and then reallocate the space to the datastore where the Windows 2008 VM lives, so I can consolidate the disks?
Will there be an issue with having datastores over 1.9 TB? I'm still not sure what the backing storage is; I can't tell whether it is a SAN or local disks in the servers where the ESXi hosts live.
ESXi and vCenter will not have an issue with a datastore over 2 TB... that was last a limitation in the 5.x days...
Your array on the other hand may vary... or may need a firmware upgrade... depends on what it is and how old etc...
As far as the disk consolidation is concerned..
Can you try consolidate with the VM powered off?
This process will be faster and you will not need additional space.
If the VM is on, essentially what happens is: a new snapshot is created and the VM runs on it while the old ones are cleaned up... then another new snapshot is taken and the one created at the start of consolidation is cleaned up... and so on, until the VM can be stunned and the final snapshot consolidated within a VM stun...
Depending on how much data change there is and how fast the array is... this may never happen...
The other option is to clone the disks one by one from CLI... see the article: VMware Knowledge Base
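A minimal sketch of that clone-from-CLI approach, with placeholder datastore and VM names (adjust to your environment; -d thin keeps the target thin provisioned):

```
# Run in the ESXi shell with the VM powered off.
# Clone one virtual disk (descriptor + data) to a new, consolidated VMDK.
vmkfstools -i /vmfs/volumes/datastore1/vmfoo/vmfoo_1.vmdk \
           /vmfs/volumes/datastore2/vmfoo/vmfoo_1-clone.vmdk -d thin
```

After cloning, you would point the VM at the clone (edit settings, or the .vmx) and keep the original files until the VM is verified working.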
There may also be some orphaned/stale snapshots... if you need help support will happily help you investigate the snapshot chain... also see this different article: VMware Knowledge Base
Thanks,
Fouad
Yes, I have tried to power off this VM and consolidate the disks, and it fails.
I've also tried vMotioning it to a different ESXi host and consolidating there, and it continues to fail.
I'm new to the ESXi CLI, however I've worked as a Red Hat Linux admin, so I'm no stranger to getting around on the CLI. I've noticed a few of the ESXi hosts have a .lck file tied to this VM, which means a locked file. I want to be careful as I don't want to do any more damage; we aren't able to back up all of our VMs currently, and the last good backups were made quite some time ago.
And no, I haven't tried to create a new snapshot for this VM and then delete it right away, or tried to consolidate the disks that way, as everything I try continues to fail. I didn't know the equivalent of the top command for ESXi, as I wanted to watch the processes from the CLI while I tried different things. I found an ESXCLI cheat sheet and will follow it today as I continue to troubleshoot.
I will report back.
thanks
If you are thinking it is a lock on the file from your backup process the vmfsfilelockinfo tool is your friend:
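A hedged example of running it (the vCenter address and username below are placeholders):

```
# Ask which host/process owns the lock on a given file.
# Prompts for the password of the user supplied with -u.
vmfsfilelockinfo -p /vmfs/volumes/datastore1/vmfoo/vmfoo-flat.vmdk \
                 -v <vcenter-ip> -u administrator@vsphere.local
```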
Good luck and let us know when you succeed!
Thanks,
Fouad
Does it matter that 4 of this VM's disks live on one datastore and the other disk lives on another datastore?
That shouldn't make any difference.
The consolidation is on a per VMDK basis...
Kind regards,
Fouad
Using SSH, I remoted into the ESXi host where that VM lives, and I noticed there are a ton of the following files:
ctk.vmdk
delta.vmdk
flat.vmdk
I found this thread on the VMware Community Forums.
I'm not sure what a descriptor file is off the top of my head.
The thread also said to use the vmkfstools command against the .vmdk files, but I'm not even sure what I'm looking for to get these disks to consolidate.
So virtual disks typically consist of 2 parts...
the data part, VM-name_1-flat.vmdk or VM-name_1-000001-delta.vmdk,
and a descriptor, VM-name_1.vmdk or VM-name_1-000001.vmdk
The descriptor file is a plain text file that tells you about the disk,
so it will have things like the data file it points to, the CID and parentCID (the disk's content ID and its parent's content ID), and a parentFileNameHint pointing to its previous file in the chain...
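For illustration, a snapshot descriptor looks roughly like this (values made up; the parentCID here is meant to match the CID in the parent's descriptor):

```
# Disk DescriptorFile
version=1
CID=0e5f1a2b
parentCID=fffffffe
createType="vmfsSparse"
parentFileNameHint="vmfoo_1.vmdk"

# Extent description
RW 134217728 VMFSSPARSE "vmfoo_1-000001-delta.vmdk"
```

A base disk instead has parentCID=ffffffff (no parent) and an extent line pointing at its -flat.vmdk.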
with vmkfstools -p you are basically looking to see if any of the files are locked, i.e. something else is using the file and thus blocking the consolidation
Ok, now that I've really dug into this, I don't think it is a locked file. It is a lack of space on the datastore.
I tried to run consolidate disks from vSphere and watched the process on the ESXi host with the following command:
tail -f /var/log/hostd.log
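Since hostd.log is chatty, I also filtered it while re-running the task (the pattern is just my guess at the relevant keywords):

```
tail -f /var/log/hostd.log | grep -iE 'consolidat|snapshot|space'
```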
When I look at all of the .vmdk and -delta.vmdk files, these are the sizes, which I believe are in bytes:
vmfoo.vmdk - 732
vmfoo1.delta.vmdk - 9967767552
vmfoo2.delta.vmdk - 35655680
vmfoo3.delta.vmdk - 18878464
vmfoo4.delta.vmdk - 18878464
vmfoo5.delta.vmdk - 670000222208
vmfoo6.delta.vmdk - 18878464
vmfoo7.delta.vmdk - 2101248
vmfoo8.delta.vmdk - 18878648
vmfoo9.delta.vmdk - 2101248
vmfoo10.delta.vmdk - 2101248
vmfoo11.delta.vmdk - 2101248
vmfoo12.delta.vmdk - 2101248
vmfoo13.delta.vmdk - 2101248
vmfoo14.delta.vmdk - 2101248
vmfoo15.delta.vmdk - 2101248
vmfoo16.delta.vmdk - 2101248
The datastore only has 2.8 TB allocated with 1.2 TB free. I have two other datastores with 1.9 TB free each, however I don't know if it would be worth adding that free space into this datastore to make a difference if I were to try to consolidate the disks. Won't it need 2x more space in order to complete the consolidation successfully?
EDIT: Also, all of the .vmdk and -delta.vmdk files have a Lock entry at the top of the file, however from the info listed, I'm not sure what to make of it. From some of the examples I've seen online, I don't see a reference to a MAC address for another ESXi host.
thanks
so can the VM be powered off for a period of time?
if yes... then consolidation should not be a space problem...
If no, then you will need to make sure there is some free space... x2 is the maximum, assuming every block on the disk will change during the time of the snapshot consolidation... So you need to think about how much data will change on the VM... if you can't take an outage on the VM, can you limit the data change or the running services?
Most of those files are pretty small, so they should be quick... the 670000222208 file is the biggest one... and that may take some time depending on other datastore I/O
Given that the 16th disk is very small, I'm wondering about your snapshot chain and its order...
If you want to post all of the descriptor vmdks (the small files) we can check them and see if they are all in the snapshot chain...
Or if you have a VMware SR, message me the SR number and I'll give you a call and we can check the information over zoom or something....
The 18878648 files are just broken snapshots... no data... 18MB, but they may be in the chain... consolidate should clean them away...
if the consolidation starts you can monitor the task with vim-cmd vimsvc/task_list and then by looking at the task...
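For example (the task ID below is a placeholder; copy the real one from the task_list output):

```
vim-cmd vimsvc/task_list
vim-cmd vimsvc/task_info haTask-1-vim.VirtualMachine.consolidateDisks-1234
```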
and you should be able to monitor progress with:
How to monitor snapshot deletion using esxtop command (2146232)
The KB with the above info appears to be broken; I'll see if I can fix it sometime tomorrow...
Kind regards,
Fouad
The VM is currently powered off as it has trouble operating correctly when powered on.
However, even with the VM powered off, when I try to consolidate disks it still fails. When I watched the logs on the ESXi host, they stated that there wasn't enough space.
At this time, there are no snapshots tied to this VM; we deleted all of them and then started to have the consolidation issues. I don't get a specific numeric error, just that there is not enough space.
Sounds like you need a little more help. Do you have a VMware SR? Can you open one?
I'll happily jump on a zoom session with you, if you direct message me your SR number.
Kind regards,
Fouad
I'd suggest that you either follow vFouad's offer for a live session, or provide more information.
At this time, there are no snapshots tied to this VM, as we deleted all of them and then we started to have issues with consolidate disks
You may not see the snapshots anymore in the Snapshot Manager, but if a Consolidation message shows up, there are snapshots involved.
As a first step, please run ls -lisa in the VM's folders on both involved datastores, and post the command's output along with the free disk space on each of these two datastores.
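For reference, in the ls -lisa output the first column is the inode number and the second is the disk space actually used, in KB; the file size in bytes appears later on the line. An illustrative line:

```
ls -lisa /vmfs/volumes/<datastore-uuid>/<vm-folder>
# inode     used(KB)   perms      owner       size(bytes)   date         name
# 41975940  506316800  -rw------- 1 root root 1099511627776 Aug 27 2019  vmfoo_3-flat.vmdk
```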
André
So, I was hoping to solve this issue with the info gleaned from this thread, however we aren't there yet.
I need to say this: this VM is in an air-gapped environment, and I won't be able to show the output of some of the commands, so I'll type them out here the best I can.
On host esxi2, here is the output for
ls -lisa
/vmfs/volumes/565ca4f3-d83367b3-44e8-8cdcd414d178/foo
88 files total
On host esxi4, here is the output for
ls -lisa
/vmfs/volumes/565e5dd5-c7fcea28-f6c2-8cdcd414d178/foo
9 files total
This is the output of df -h on host esxi2 for
/vmfs/volumes/565ca4f3-d83367b3-44e8-8cdcd414d178/foo
Filesystem Size Used Avail Use%
VMFS-00 1.9T 1.7T 2397G 88%
/vmfs/volumes/VMFS00
However, vSphere shows this datastore's capacity as 2.88 TB with 1.21 TB free, which is not reflected in this command's output.
For host esxi4, this is the df -h output:
Filesystem Size Used Avail Use%
VMFS-01 2.9T 1.4T 1.5T 47%
/vmfs/volumes/VMFS01
If I look at this data store in vSphere, this is accurate.
This VM used to live on the host esxi2, however I vMotioned it to host esxi4 to see if that would fix it.
And again, this VM has been off during this process.
This only helps a bit, and from the number of files in the two folders, one can assume that the VM has multiple snapshots.
The point of asking for the file listing was to find out the size of each delta file, and the disk space consumption of each file on disk. Is it possible for you to post the file listing with renamed file names? In this case, only rename the VM's name, but not any suffixes; e.g. with a file like "VMName_1-000001-delta.vmdk", only rename the "VMName" part.
André
So we can verify whether the snapshot chain is gone:
if you go to the VM via vCenter; and edit settings.
Then look to see if the option to grow the disk is available or if the disks are greyed out.
If you have the option to grow the disks then there is no snapshot chain.
Then you can safely say that the snapshots are orphaned...
If the disks cannot be changed... then more information is needed.
It may be that only some disks are impacted.... it may be all disks...
Please let us know how you want to continue, but the absence of information is making this very difficult.
Here are all of the delta.vmdk files. I had to type these by hand, as again, this is coming from an air-gapped environment.
-rw------- 1 root root 2101248 Apr 1 01:12 vm2_3_0000014-delta.vmdk
-rw------- 1 root root 2101248 Apr 1 01:35 vm2_3_0000015-delta.vmdk
-rw------- 1 root root 2101248 Apr 1 06:23 vm2_3_0000016-delta.vmdk
-rw------- 1 root root 2101248 Apr 1 06:47 vm2_3_0000017-delta.vmdk
-rw------- 1 root root 2101248 Mar 18 00:41 vm2_3_000008-delta.vmdk
-rw------- 1 root root 2101248 Mar 18 05:31 vm2_3_000009-delta.vmdk
-rw------- 1 root root 2101248 Mar 31 14:48 vm2_3_0000011-delta.vmdk
-rw------- 1 root root 2101248 Mar 31 15:37 vm2_3_0000012-delta.vmdk
-rw------- 1 root root 2101248 Mar 31 20:25 vm2_3_0000013-delta.vmdk
-rw------- 1 root root 18878464 Apr 7 18:15 vm2_3_0000018-delta.vmdk
-rw------- 1 root root 18878464 Apr 8 2019 vm2_3_000003-delta.vmdk
-rw------- 1 root root 18878464 Aug 9 2019 vm2_3_000004-delta.vmdk
-rw------- 1 root root 18878464 Mar 18 00:17 vm2_3_000007-delta.vmdk
-rw------- 1 root root 18878464 Mar 31 14:48 vm2_3_0000010-delta.vmdk
-rw------- 1 root root 35655860 Aug 8 2019 vm2_3_0000002-delta.vmdk
-rw------- 1 root root 52432896 Aug 13 2019 vm2_3_0000005-delta.vmdk
-rw------- 1 root root 670000222208 Mar 17 17:07 vm2_3_0000006-delta.vmdk
-rw------- 1 root root 9967767552 Mar 17 19:29 vm2_3_000001-delta.vmdk
I've also increased the datastore to 4.3 TB, and it's probably still not enough space to consolidate the disks, as it failed again. I'm thinking it may need 8 TB in order to successfully consolidate the disks.
Again, this is a Windows 2008 Server, here is the current disk breakdown:
HD1 64 GB allocated, thin provision - not greyed out, able to add space
HD2 200 GB allocated, thin provision - not greyed out, able to add space
HD3 300 GB allocated, thin provision - not greyed out, able to add space
HD4 1 TB allocated, thin provision - greyed out, not able to add space
HD5 1 TB allocated, thin provision - greyed out, not able to add space
Ok, I see. It would certainly be a pain to type all files by hand. So let's see whether we can reduce this.
Please run ls -lisa *-flat.vmdk on both datastores, so that you have the details of all 5 virtual base disks.
Then check whether the flat files are thin or thick provisioned by comparing the provisioned size (in bytes) with the used disk space (the second column, in kB). If the used disk space in kB matches the provisioned size (you need to divide the displayed size by 1024), then the disks are thick provisioned, and, if all virtual disks are thick provisioned, you will need no additional free disk space when you delete the snapshots from the Snapshot Manager using the "Delete All" option while the VM is powered off.
If the used disk space is less than the provisioned size, then the flat.vmdk file may grow up to its provisioned size, which means that you will need temporary disk space. This can be anything between zero and the difference between the provisioned size and the used disk space.
If all of the VM's virtual disks are thick provisioned, and you don't see a snapshot in the Snapshot Manager, you may simply create one just to enable the "Delete All" option.
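As a small sketch of that comparison (the two input numbers below are made up for illustration):

```shell
#!/bin/sh
# Compare ls -lisa's used-space column (KB) with the file size (bytes).
# Equal means thick provisioned; smaller means thin, and the disk may
# still grow by the difference during a snapshot merge.
used_kb=506316800                 # 2nd column of ls -lisa (illustrative)
provisioned_bytes=1099511627776   # file size in bytes (a 1 TB disk)
used_bytes=$((used_kb * 1024))
if [ "$used_bytes" -ge "$provisioned_bytes" ]; then
    echo "thick provisioned"
else
    grow_gib=$(( (provisioned_bytes - used_bytes) / 1024 / 1024 / 1024 ))
    echo "thin: may still grow by ${grow_gib} GiB"
fi
```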
André
Thanks for the response. Here is the output:
from esxi host2
ls -lisa *-flat.vmdk
(provisioned size) (used disk size)
432046212 785289520 -rw------- 1 root root 1099511027776 Mar 30 19:19 vmfoo_2_flat.vmdk (used space is greater than provisioned space)
from esxi host 4
ls -lisa *-flat.vmdk
16810116 51030016 -rw------- 1 root root 68719476736 Apr 7 21:56 vmfoo_flat.vmdk (used space is greater than provisioned space)
25198724 209693696 -rw------- 1 root root 214748364800 Apr 7 19:52 vmfoo_1_flat.vmdk (used space is greater than provisioned space)
33587332 257932288 -rw------- 1 root root 32212254700 Apr 7 19:55 vmfoo_2_flat.vmdk (used space is greater than provisioned space)
41975940 506316800 -rw------- 1 root root 1099511627776 Aug 27 2019 vmfoo_3_flat.vmdk (used space is greater than provisioned space)
121667716 32768 -rw------- 1 root root 1073741824 Dec 4 2015 vmfoo_4_flat.vmdk (provisioned space is greater than used space)
To me, if I'm reading this correctly, the provisioned size is the number on the left and the used disk space is the number to the right. For all of the flat.vmdk files, the used disk space is greater than the provisioned size, with the exception of the last disk, vmfoo_4_flat.vmdk.
Also, the provisioned size versus the used disk size should prove that all disks are thin provisioned, not thick.
So if I'm understanding correctly, I subtract the used disk space from the provisioned space, and this tells me how much I need to grow the datastore in order to consolidate the disks, correct?
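To sanity-check my own math, here is that arithmetic with the host esxi4 numbers above (figures as I typed them; the vmfoo_2 line comes out negative, likely my transcription typo, so the guard skips it):

```shell
#!/bin/sh
# Sum (provisioned_bytes - used_kb*1024) over the thin disks to get the
# worst-case extra growth the datastore must be able to absorb.
# Pairs are "provisioned_bytes used_kb" from the ls -lisa output above.
total=0
for pair in \
    "68719476736 51030016" \
    "214748364800 209693696" \
    "32212254700 257932288" \
    "1099511627776 506316800" \
    "1073741824 32768"
do
    set -- $pair
    grow=$(( $1 - $2 * 1024 ))
    [ "$grow" -gt 0 ] && total=$(( total + grow ))
done
echo "worst-case additional growth: $(( total / 1024 / 1024 / 1024 )) GiB"
```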
thanks