VMware Cloud Community
csledd
Contributor

VM doesn't start, possible snapshot/consolidate issue

Host is running VMware ESXi 4.1.0.

One of my VMs shut down unexpectedly, and when I powered it back on I got the following error:

"msg.hbacommon.corruptredo:The redolog of server1-000004.vmdk has been  detected to be corrupt.  The virtual machine needs to be powered on.  If the  problem still persists, you need to discard the redolog."

My first thought was that someone had filled up the datastore with snapshots.  I went into the Snapshot Manager and selected Delete All, which completed with no errors.  When I look in the datastore, however, several snapshot vmdk files are still there, so they did not consolidate.  I tried creating a new snapshot and then running Delete All again, with the same result: all it did was increment the vmdk number by 1.

Any suggestions to this?  My datastore has files such as:

server-000001.vmdk

server-000002.vmdk

server-000003.vmdk

server-000004.vmdk

It also has a few .log files, 3 vmx-zdump files, but no delta files or anything else that says snapshot.

4 Replies
csledd
Contributor

To add to this issue, I downloaded Veeam Backup, and when I browse the datastore I do see all the delta files there for all the extra vmdk files.

a_p_
Leadership

How much free disk space do you have on the datastore?

To see what can be done, please compress/zip the .vmdk header (descriptor) files, the vmware*.log files and the .vmx file, and attach the archive to your next post. In addition to the files, please post a list of the files in the VM's folder (ls -lisa from the command line).
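Once such a listing is available, it can help to check that every snapshot descriptor has a matching delta file. The following is only a rough Python sketch: the naming pattern (`name-00000N.vmdk` and `name-00000N-delta.vmdk`) is assumed from the file names mentioned in this thread, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches snapshot-related names such as "server-000004.vmdk" or
# "server-000004-delta.vmdk". The pattern is an assumption based on the
# file names quoted in this thread.
SNAPSHOT_RE = re.compile(r"^(?P<base>.+)-(?P<num>\d{6})(?P<delta>-delta)?\.vmdk$")


def group_snapshot_files(filenames):
    """Group snapshot descriptor and delta files by (base name, number)."""
    groups = defaultdict(lambda: {"descriptor": None, "delta": None})
    for name in filenames:
        match = SNAPSHOT_RE.match(name)
        if not match:
            continue  # base disks, logs, the .vmx file etc. are ignored here
        kind = "delta" if match.group("delta") else "descriptor"
        groups[(match.group("base"), match.group("num"))][kind] = name
    return dict(groups)


def missing_pairs(groups):
    """Return the snapshot keys that lack a descriptor or a delta file."""
    return sorted(
        key for key, files in groups.items()
        if files["descriptor"] is None or files["delta"] is None
    )
```

Feeding it the names from the `ls` output and printing `missing_pairs(...)` gives a quick first look at whether the snapshot files on disk are complete, before anyone starts editing or cloning anything.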

André

csledd
Contributor

The vmdk's are much too big to upload, some are 4GB.  It appears someone had made snapshots and never deleted them.

I had to get this issue resolved quickly, so I attempted to clone the vmdk to a new file.  There were two partitions: C for the system, and D for the application data.  The last vmdk file for C (server-000007.vmdk) would not clone; it errored at 2%, so I kept working my way down the chain until I got to server-000004.vmdk.  This was the file the VM was running on when it went down; the ones after it are from snapshots I created afterwards that did not consolidate correctly.  I found that 000003 would clone but 000004 would not, which did not help, since 000004 was the one containing the data I needed.
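The "work down the chain until one clones" approach described above can be sketched roughly as follows. This is an illustration in Python, not what was actually run: `newest_cloneable` and `vmkfstools_clone_command` are hypothetical helper names, and the clone attempt itself is passed in as a function so that nothing here touches a real datastore. The only real tool referenced is `vmkfstools -i <source> <destination>`, which clones a virtual disk on the ESXi command line.

```python
def newest_cloneable(chain, try_clone):
    """Return the newest disk in `chain` that `try_clone` accepts, or None.

    `chain` is ordered oldest to newest. `try_clone` is a caller-supplied
    function that returns True when a clone attempt for that disk succeeds.
    """
    for disk in reversed(chain):
        if try_clone(disk):
            return disk
    return None


def vmkfstools_clone_command(source, destination):
    """Build the argument list for cloning a virtual disk with vmkfstools -i."""
    return ["vmkfstools", "-i", source, destination]
```

Anything written to snapshots newer than the one returned would be missing from the clone, which is exactly the problem described here when only 000003 turned out to be cloneable.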

My resolution was to use the previous night's backup image.  Since I use Acronis, I was able to convert just the C partition to a new virtual machine.  I was then able to successfully clone the last vmdk for the D partition, where the data was, and attach it to the new virtual machine.  Because of how the partitions were set up, it appears there was no data loss.

I am not sure why this happened.  The datastore is 500 GB in total and it reported over 380 GB free.  If provisioned size (rather than actual usage) counted against the datastore, it would have run out of space, but I didn't think that should matter.  Something corrupted the last vmdk file; maybe the VM was just running on that snapshot for too long.

a_p_
Leadership

Good to see you were able to solve this and thanks for the detailed feedback.

The vmdk's are much too big to upload, some are 4GB.

Just to clarify: what I asked for were only the descriptor .vmdk files, which are just a few hundred bytes in size. These files can only be seen from the command line or with a tool such as WinSCP (they are hidden in the datastore browser).
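For anyone unfamiliar with these files: a descriptor is a small text file that names its parent disk and its data (delta) file. The Python sketch below reads the fields that matter for a snapshot chain. The sample content is made up for illustration (the CID values and sector count are not from the poster's system), and `parse_descriptor` is a hypothetical helper, not a VMware tool.

```python
import re

# Hypothetical descriptor for a snapshot disk. All values are illustrative.
SAMPLE_DESCRIPTOR = '''\
# Disk DescriptorFile
version=1
CID=1a2b3c4d
parentCID=5e6f7a8b
createType="vmfsSparse"
parentFileNameHint="server-000003.vmdk"
# Extent description
RW 83886080 VMFSSPARSE "server-000004-delta.vmdk"
'''

# Extent lines look like: <access> <size in sectors> <type> "<file name>"
EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+\d+\s+\S+\s+"(?P<file>[^"]+)"')


def parse_descriptor(text):
    """Return the key=value fields plus the list of extent (data) files."""
    info = {"extents": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        extent = EXTENT_RE.match(line)
        if extent:
            info["extents"].append(extent.group("file"))
        elif "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info


if __name__ == "__main__":
    info = parse_descriptor(SAMPLE_DESCRIPTOR)
    print(info["parentFileNameHint"], info["parentCID"], info["extents"])
```

In a healthy chain, each descriptor's `parentCID` should match the `CID` in the file named by its `parentFileNameHint`, so comparing these fields across the descriptors is a common first check when snapshots refuse to consolidate.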

André
