VMware Cloud Community
JasonBurrell
Enthusiast

Removing a 150GB snapshot

I have a machine that had a 150 GB snapshot. My DBA deleted the snapshot and got the error "operation timed out." He then tried to delete the snapshot again, which I believe spawned a second snapshot-delete process. From what I have read, "operation timed out" is a common but misleading error: the operation did not actually time out, it is still running. Now I have a server with 8 CPUs running at 100%. Has anyone ever run into this? Any ideas on what will happen? Will I end up with a corrupt VMDK file? Can I kill the second snapshot-delete process? Any help would be greatly appreciated.

RParker
Immortal

Yes, the ESX host has a preset timeout, of around 120 seconds (I think), to wait for a snapshot commit to finish. If the commit doesn't finish in that time, you get the error, but the commit keeps running. If your DBA tried again, he should have received another message saying the operation could not be completed, rather than spawning another attempt.

Since it's so big, I would let it run. Log in to the host where the VM lives with PuTTY and check whether the snapshot files are gone. If they are, everything should be fine, but a snapshot of this size may take several hours to commit. Just let it run; don't try to 'force' the issue.
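As a rough sketch of that check, the Python below lists files in a VM's folder that look like snapshot disks. It assumes the usual naming convention (`<disk>-000001-delta.vmdk` for the redo log data plus a `<disk>-000001.vmdk` descriptor), that Python 3 is available on a machine that can read the datastore, and that you point it at the VM's directory under `/vmfs/volumes/`. `snapshot_files` is a hypothetical helper, not a VMware tool.

```python
import re
from pathlib import Path

# Assumed naming convention for snapshot disks: "<disk>-000001-delta.vmdk"
# (the redo log data) and "<disk>-000001.vmdk" (its descriptor).
SNAPSHOT_DISK_RE = re.compile(r"-\d{6}(-delta)?\.vmdk$")


def snapshot_files(vm_dir: str) -> list[str]:
    """Return the sorted names of files in vm_dir that look like snapshot disks."""
    return sorted(
        p.name
        for p in Path(vm_dir).iterdir()
        if p.is_file() and SNAPSHOT_DISK_RE.search(p.name)
    )


# Hypothetical usage: snapshot_files("/vmfs/volumes/datastore/VM")
```

While the commit is still running the delta files should still be there; an empty list afterwards suggests the snapshot is gone. The base disk (`<disk>.vmdk` / `<disk>-flat.vmdk`) does not match the pattern.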

JasonBurrell
Enthusiast

It ended up finishing; it took about 1 hour. Is it possible to get the message changed in future releases? The task should sit at 95% until it receives confirmation that the commit is complete, rather than timing out after a hard-coded value. I'm glad to know now that if someone tries to delete a snapshot twice while one is already running, it does not corrupt the original disk.
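To illustrate why the error is misleading, here is a toy Python model (not VMware code): the "commit" runs in a background thread, and the client waits only a fixed time before reporting "operation timed out", while the commit carries on. The durations and the `remove_snapshot` / `commit_snapshot` names are made up for the demo; the real host-side timeout value is only estimated in this thread.

```python
import threading
import time


def commit_snapshot(duration_s: float, done: threading.Event) -> None:
    """Stand-in for the host-side snapshot commit (hypothetical duration)."""
    time.sleep(duration_s)
    done.set()


def remove_snapshot(client_timeout_s: float, commit_duration_s: float):
    """Start the commit, then wait a fixed time the way the client does."""
    done = threading.Event()
    worker = threading.Thread(target=commit_snapshot, args=(commit_duration_s, done))
    worker.start()
    # The wait gives up after client_timeout_s, but nothing cancels the worker.
    finished = done.wait(timeout=client_timeout_s)
    status = "completed" if finished else "operation timed out"
    return status, worker, done


status, worker, done = remove_snapshot(client_timeout_s=0.05, commit_duration_s=1.0)
print(status)             # operation timed out
print(worker.is_alive())  # True: the commit is still running
worker.join()             # wait with no hard limit until the commit finishes
print(done.is_set())      # True
```

Waiting on completion (the `join()` at the end) instead of a fixed timeout is the behavior being asked for here.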

JasonBurrell
Enthusiast

One more thing should also be fixed. When this process runs, it uses 100% CPU and makes hostd unresponsive (the host shows as disconnected in vCenter). It should be given a lower priority so that at least machines can migrate off to avoid a performance hit.

RParker
Immortal

Agree on both points. However, this is a user forum where we help each other. Maybe (if we are lucky) VMware is watching, but not likely. You probably have to submit an SR for your problem, and then it will get documented.

There is a new release of ESX coming this year; let's hope they address this. Believe me, this is NOT the first time any of us have complained.

It seems the snapshot feature is an afterthought on VMware's part. I believe it was basically an add-in of sorts, and it has multiple issues. It works, but it's VERY rudimentary.

Lightbulb
Virtuoso

Where I work we had a couple of these "timeout" panics, so we now have techs remove snapshots from the CLI (vmware-cmd /vmfs/volumes/datastore/VM/VM.vmx removesnapshots) to prevent the alert from causing someone to improvise.

It is less than optimal, and it would be nice if it were fixed.
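Building on the CLI approach, here is a minimal Python sketch of a wrapper that refuses to start a second removal while one is already running (the double delete from the original post). The lock-file path, the `remove_snapshots_once` name and the injectable `runner` parameter are hypothetical; only the `vmware-cmd <vmx> removesnapshots` command line comes from the post above. The call has no timeout, so it simply blocks until `vmware-cmd` returns.

```python
import os
import subprocess


def remove_snapshots_once(vmx_path: str, lock_path: str, runner=subprocess.run) -> int:
    """Run 'vmware-cmd <vmx> removesnapshots' unless another run holds lock_path.

    lock_path is a hypothetical per-VM lock file chosen by the caller.
    runner is injectable so the logic can be exercised without vmware-cmd.
    """
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"snapshot removal already in progress for {vmx_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())
        # No timeout: block until vmware-cmd returns, however long the commit takes.
        result = runner(["vmware-cmd", vmx_path, "removesnapshots"])
        return result.returncode
    finally:
        os.close(fd)
        os.remove(lock_path)
```

Hypothetical usage: `remove_snapshots_once("/vmfs/volumes/datastore/VM/VM.vmx", "/tmp/VM.removesnapshots.lock")`. A lock left behind by a crashed run would have to be deleted by hand; that keeps the sketch simple.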
