VM won't start: No swap file

mcm2009 · ‎07-05-2011

Hello,

I have a problem with a VM in my ESX 3.5i cluster.

On one of the hosts the HA agent failed, and the Infrastructure Client was no longer able to communicate with the server. To protect the VM running on this host, I shut down that host from within Windows.

After that I restarted the host, and the agent problem was gone as expected.

Unfortunately it seems that the cluster is not aware of the VM being shut down even though it is shown as offline in the inventory.

If I try to start the VM I get this error message: "Could not power on VM: No swap file. Failed to power on VM". A .vswp file is not present in the datastore.

I already tried to remove the swap file entry from the VMX file without any success. Migrating the VM to another host did not help either. I have not tried to clone the host.

The question is: how can I get this VM back to life? Can I somehow force it to create the .vswp file?

Any help is appreciated.

a_p_ · ‎07-05-2011

Did you already take a look at the VM's vmware.log file? Usually you can find details on errors like this in that file.

André

mcm2009 · ‎07-05-2011

I did not see any special remarks in the log file:

Jul 05 21:23:21.032: vmx| hostCpuFeatures = 0x446000dc
Jul 05 21:23:21.032: vmx| hostNumPerfCounters = 2
Jul 05 21:23:21.068: vmx| VMMon_CreateVM: vmmon.numVCPUs=1
Jul 05 21:23:21.069: vmx| Swap file path: '/vmfs/volumes/4a5e3010-c085f62d-73e8-001a645d0266/megasrv01.freund.intra/megasrv01.freund.intra-2e4f9572.vswp'
Jul 05 21:23:21.121: vmx| Msg_Post: Error
Jul 05 21:23:21.121: vmx| [msg.vmmonVMK.creatVMFailed] Could not power on VM : No swap file.
Jul 05 21:23:21.121: vmx| [msg.monitorLoop.createVMFailed] Failed to power on VM----------------------------------------
Jul 05 21:23:21.740: vmx| Module MonitorLoop power on failed.
Jul 05 21:23:21.740: vmx| VMX_PowerOn: ModuleTable_PowerOn = 0
Jul 05 21:23:21.838: vmx| vmdbPipe_Streams Couldn't read: OVL_STATUS_EOF
Jul 05 21:23:21.838: vmx| VMX idle exit
Jul 05 21:23:21.840: vmx| Flushing VMX VMDB connections
Jul 05 21:23:21.841: vmx| IPC_exit: disconnecting all threads
Jul 05 21:23:21.841: vmx| VMX exit (0).
Jul 05 21:23:21.841: vmx| VMX has left the building: 0
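The relevant lines in an excerpt like this are the swap file path and the bracketed [msg...] entries. As a rough sketch only, assuming the log has been copied to a machine with Python 3 and that the line layout matches the excerpt above, they could be pulled out like this (summarize_log is a hypothetical helper name, not a VMware tool):

```python
import re

# Patterns modelled on the excerpt above; other ESX versions may log differently.
SWAP_RE = re.compile(r"vmx\| Swap file path: '([^']+)'")
MSG_RE = re.compile(r"vmx\| \[(msg\.[^\]]+)\]\s*(.*)")


def summarize_log(lines):
    """Return (swap_path, [(message_id, text), ...]) found in vmware.log lines."""
    swap_path = None
    messages = []
    for line in lines:
        swap = SWAP_RE.search(line)
        if swap:
            swap_path = swap.group(1)
            continue
        msg = MSG_RE.search(line)
        if msg:
            # Drop the row of dashes that follows the last message in the log.
            text = msg.group(2).rstrip("-").strip()
            messages.append((msg.group(1), text))
    return swap_path, messages


if __name__ == "__main__":
    # "vmware.log" is assumed to be a local copy of the VM's log file.
    with open("vmware.log", encoding="utf-8", errors="replace") as handle:
        path, msgs = summarize_log(handle)
    print("Swap file expected at:", path)
    for message_id, text in msgs:
        print(message_id, "->", text)
```

This only summarizes what the log already says: the VMX expects a swap file at that path and fails when it cannot be provided.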

But when I try to clone the server I get a message about a migration problem (shown before the cloning itself, which I have not started yet):

Migration from bladesrv02 to bladesrv06: the host is reporting errors in its attempts to provide ha support.

a_p_ · ‎07-05-2011

Did you already see http://kb.vmware.com/kb/1003742 ?

This KB lists the steps to troubleshoot this issue.

André

mcm2009 · ‎07-05-2011

I have now tried this procedure, without success:

1. Power off the virtual machine.
2. Access the ESX/ESXi service console using an SSH client.
3. Open the virtual machine configuration file (.vmx) in a text editor. The default location is /vmfs/volumes/<datastore_name>/<vm_name>/<vm_name>.vmx.
4. Remove the location of the swap file referenced in the configuration file. It should look similar to:

   sched.swap.derivedName = ""

5. Save the file.
6. Rename or delete the existing swap file from the virtual machine directory.
7. Unregister the virtual machine and register it back for the changes to take effect. For more information, see Registering or adding a virtual machine to the inventory (1006160).
8. Power on the virtual machine.
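The .vmx edit in that procedure amounts to blanking a single key. A minimal sketch of that one step, assuming the file is edited as plain text on a copy while the VM is powered off (clear_swap_entry is a hypothetical helper, not part of the KB article or any VMware tooling):

```python
import re

# Matches the swap file entry regardless of spacing or letter case.
SWAP_KEY_RE = re.compile(r"^\s*sched\.swap\.derivedName\s*=", re.IGNORECASE)


def clear_swap_entry(vmx_text):
    """Return vmx_text with any sched.swap.derivedName entry set to an empty value."""
    cleaned = []
    for line in vmx_text.splitlines():
        if SWAP_KEY_RE.match(line):
            # Keep the key but drop the stored swap file location.
            cleaned.append('sched.swap.derivedName = ""')
        else:
            cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

It leaves every other line untouched. Renaming the old swap file and re-registering the VM still have to be done separately, as the steps describe.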

On the first start attempt it showed the well-known missing swap file error (20%). On the second attempt an internal error was shown (95). The third attempt gave the swap file error again.

mcm2009 · ‎07-05-2011

It seems that I have lost write permission on the datastore for some unknown reason.

Since the swap file is on this datastore this might be the problem.

I tried to manually add a folder to the datastore; this failed from within vCenter. The first cloning attempt also failed, because it was using this datastore as well.

Right now I am cloning the system to another datastore, which seems to work.

I can browse the affected datastore, and all VMs relying on it are up and running without problems.

I think that if I can resolve this issue, the VM should become available again. But how do I resolve it?

AureusStone · ‎07-05-2011

Is it possible that the LUN is full and has been set to read-only? Usually when this happens, vSphere will start suspending random VMs to ensure all the other VMs keep running.

Hopefully the clone will fix your problem, otherwise you could clone the entire LUN and re-present it.

mcm2009 · ‎07-05-2011

The clone has come up normally and it is working fine (what a relief).

But I still have to clean up this state.

It seems that I can change files within the affected datastore, but I cannot create new files or folders. It resides on the SAN and has roughly 300 GB of free space, which should be enough.

How can I check my LUN, or rather, how can I see whether it is read-only? The management tool gives barely any information on this matter (besides the sizes, hosts and VMs).
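The symptom itself (existing files can be changed, new files and folders cannot be created) can at least be reproduced in a repeatable way. A rough sketch, assuming the datastore is reachable as an ordinary directory from some machine with Python 3, which may not be the case on ESX 3.5i itself; probe_directory is a hypothetical helper, and it does not query the LUN state on the SAN:

```python
import os
import shutil
import tempfile


def probe_directory(path):
    """Report free space and whether new files and folders can be created under path."""
    result = {
        "free_bytes": shutil.disk_usage(path).free,
        "create_file": False,
        "create_dir": False,
    }
    try:
        # Create and immediately remove a throwaway file.
        fd, name = tempfile.mkstemp(prefix="probe-", dir=path)
        os.close(fd)
        os.remove(name)
        result["create_file"] = True
    except OSError as exc:
        result["file_error"] = str(exc)
    try:
        # Create and immediately remove a throwaway folder.
        name = tempfile.mkdtemp(prefix="probe-", dir=path)
        os.rmdir(name)
        result["create_dir"] = True
    except OSError as exc:
        result["dir_error"] = str(exc)
    return result
```

If both creation flags come back False while free_bytes is large, that matches what is described here: plenty of space, but no new files or folders. The recorded error text may then hint at the cause.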
