VMware Cloud Community
mreferre
Champion

Manual HA ... what are the steps?

I have a situation where the customer (for a number of reasons) did not get the VMware HA module but obviously would like to take advantage of a HA manual procedure (so to speak).

So the dumb scenario would be (it's more complex but we'll keep it simple for the sake of the discussion): ESX hostA and ESX hostB + a shared FC LUN with a bunch of vm's.

Some vm's are registered (and running) on hostA, some on hostB.

Say hostA goes down, for example. To recover, one would go into VC, point at hostB, and add to the inventory (and start) the vm's that were previously running on hostA. What's not clear to me (and unfortunately neither the customer nor I can test this in the lab for a few days) is what happens when hostA comes back online. Those vm's are supposedly still registered on it (and now on hostB too), so what is the "best practice" to deal with a situation like this and bring everything back to normal (i.e. all vm's running on hostB and NOT registered on hostA)? Will a "Remove from Inventory" on hostA be sufficient when it gets back online?

Thanks. Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
12 Replies
Gabrie1
Commander

Hi

When HA is enabled, I suppose the host that picks up a VM from the failed host will also tell VC to unregister the VM from host A and register it on host B. When doing this manually, you are only able to run the "vmware-cmd -s register <vmx>" command on host B, and the VMs will then show up in VC as "discovered VMs". VC will see each one as a completely new VM. This also means you could lose any permissions that were previously assigned to it through the folder structure in VC. I bet you will also lose any HA and DRS affinity rules (although in your scenario you don't have HA and probably no DRS either).

My guess is that the best way to do it is find a way to use the SDK to tell VC that the VM has now moved from Host A to Host B.

Gabrie

http://www.GabesVirtualWorld.com
mreferre
Champion

Mh ... I see what you mean .... (BTW on top of the register command you have mentioned you can also use the GUI to add the vmx to the inventory .... but this is not the real point).

The point is, assuming I am not interested in maintaining specific folder/permissions/etcetc associated to the vm's, I could run the "vmware-cmd -s unregister <vmx>" command from hostA when it gets back on line and revert it into a "clean" state.
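As a rough illustration of the manual procedure discussed so far, the Python sketch below only builds the two command lines quoted in this thread ("vmware-cmd -s register <vmx>" for hostB, "vmware-cmd -s unregister <vmx>" for hostA once it is back) and prints them for review. It does not execute anything. The datastore paths and the helper name build_commands are hypothetical and would need to match the real environment.

```python
import shlex


def build_commands(action, vmx_paths):
    """Return one 'vmware-cmd -s <action> <vmx>' line per VM config file."""
    if action not in ("register", "unregister"):
        raise ValueError("action must be 'register' or 'unregister'")
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return ["vmware-cmd -s %s %s" % (action, shlex.quote(p)) for p in vmx_paths]


# Hypothetical .vmx paths of the VMs that were running on hostA.
HOSTA_VMS = [
    "/vmfs/volumes/shared_lun/web01/web01.vmx",
    "/vmfs/volumes/shared_lun/db01/db01.vmx",
]

if __name__ == "__main__":
    print("# Run on hostB after hostA has failed:")
    for line in build_commands("register", HOSTA_VMS):
        print(line)
    print("# Run on hostA once it is back online:")
    for line in build_commands("unregister", HOSTA_VMS):
        print(line)
```

Printing the commands rather than running them keeps a human in the loop, which fits the "quick and dirty (very manual)" approach: the operator can check the list before pasting it into a console on the right host.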

After all, you would assign permissions (usually... not always) at the folder level, so it is a matter of moving the newly discovered vm's into the appropriate VC folders, where they will inherit all the permissions etc.

I do personally agree that using the SDK would be the best way to do this, even though I am not sure there is a single API command that says "move that vm with all consistent attributes from hostA to hostB". I guess it's more a matter of programming it so that it reads the current attributes, registers the vm on the other host, applies the old attributes, and so on. My fear is that this way one would have to write his/her own VMware HA module, which might be too much. I would personally go more for a quick and dirty (very manual) procedure.

So back to the point above .... does it sound feasible the way I pictured it? Any other watch-outs you can think of (other than folders/permissions)?

Thanks. Massimo.

oreeh
Immortal

Are the ESX hosts part of a VC cluster or are they completely standalone (besides the shared FC LUN)?

mreferre
Champion

Oliver,

part of a cluster. More for the sake of the design than anything. Also this would allow them to just put in the HA / DRS licenses when / if they want in the future without scrambling up everything.

Does it make a difference for the purpose ?

Massimo.

oreeh
Immortal

I'm not sure if it makes a difference.

But my guess is it does - especially when it comes to the locked files (-flat.vmdk, vmware.log, vswp) on the LUN.

Since HA is able to "unlock" / "override" the lock there probably is an (easy?) way to tell ESX to do this (via the APIs?).

Without overriding/removing the locks you are able to register the VM on the other host (the VMX file is not locked) but you won't be able to start the VM.

I'm not sure if vmkfstools -L helps in this case.

mreferre
Champion

Not sure if it makes a difference but I see what you are saying.

The only way I know to unlock the LUN in that circumstance is to issue vmkfstools with the appropriate switch to unlock the VMFS volume (I don't remember what that switch is off the top of my head). So either VC/HA issues that command or there is no need for it. Another good comment though for the manual procedure (i.e. "if the vm doesn't start due to file locks, issue the command vmkfstools blablabla").

Thanks. Massimo.

P.S. I am pretty sure there is someone out there that is using the non-Enterprise version of VI3 and is doing similar things..... come on guys ... don't be shy... ;)

oreeh
Immortal

The manpage states the following:

The `targetreset` and `busreset` command will reset target and bus respectively causing SCSI reservations to be dropped.

This option is potentially disruptive to all servers sharing the storage and is only meant to be used in the context of Clustering.

Not sure what the second ESX host does when you use a lunreset since this would remove the locks of the second ESX as well.

To circumvent this you could try the following:

Set up two LUNs (both visible to both hosts), with each LUN actually used by only one host.

VMs on hostA run on LUNa and VMs on hostB run on LUNb.

This way when hostA goes down you can reset LUNa without interfering with the running VMs on hostB.

Then register the VMs from LUNa on hostB and "fire them up".
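The two-LUN idea above can be written down as a small planning sketch in Python. It only models the bookkeeping (which LUN belongs to the failed host, which VMs would then be registered elsewhere); it does not talk to ESX or to the storage. The host, LUN, and VM names, and the function recovery_plan, are hypothetical.

```python
# Hypothetical layout: both LUNs are visible to both hosts,
# but each host only runs VMs from "its own" LUN.
LAYOUT = {
    "hostA": {"lun": "LUNa", "vms": ["web01", "db01"]},
    "hostB": {"lun": "LUNb", "vms": ["mail01"]},
}


def recovery_plan(layout, failed_host):
    """Return (lun_to_reset, vms_to_register, surviving_hosts)."""
    failed = layout[failed_host]  # raises KeyError if the host is unknown
    survivors = sorted(h for h in layout if h != failed_host)
    # Refuse to plan a reset of a LUN that a surviving host also runs VMs from,
    # since that is the interference the two-LUN layout is meant to avoid.
    for host in survivors:
        if layout[host]["lun"] == failed["lun"]:
            raise ValueError("%s also runs VMs from %s" % (host, failed["lun"]))
    return failed["lun"], list(failed["vms"]), survivors


if __name__ == "__main__":
    lun, vms, survivors = recovery_plan(LAYOUT, "hostA")
    print("Reset %s, then register on %s: %s" % (lun, survivors[0], ", ".join(vms)))
```

The check inside the loop is the whole point of the layout: if both hosts ran VMs from the same LUN, a reset would also drop the reservations of the surviving host.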

Having the VMs registered on both hosts shouldn't be a problem as long as you don't try to start them on the failed host once the host is "up" again - so no autostart of VMs.

mreferre
Champion

Mh... I don't really want to get there. This would be the denial of the benefits of virtualization (RDM's being the ultimate denial ... but this would be good for another topic :) ).

Quite frankly, I don't think VMware HA is rocket science, and if HA has no limitations in terms of LUN setup for potential locking issues etc., I am pretty sure one could manage that manually as well. Certainly the problem is "how" to manage that manually (as I am pretty sure VMware will not be keen to discuss how HA handles this so that we can mimic it manually), but it could still be done.

Perhaps we are just anticipating very niche locking issues that would NOT even appear 99% of the time? That by the way would also explain why HA works more or less 99% of the time (just a joke, I have no data to back that).

Massimo.

oreeh
Immortal

Since HA utilizes Legato AAM there should be some information available.

Some of the scripts below /opt/LGTOaam512 are Perl scripts and there even is a demo app in /opt/LGTOaam512/sdk ;)

mbrkic
Hot Shot

The SCSI reservations are used only in case of whole LUN locking (e.g., for RAW devices with clustering).

My understanding is that VMFS3 uses file-level locking on the vmdk's, but I am not sure how those locks can be manipulated manually.

mcwill
Expert

> Having the VMs registered on both hosts shouldn't be a problem as long as you don't try to start them on the failed host once the host is "up" again - so no autostart of VMs.

This doesn't matter. We have two hosts connected to a shared iSCSI SAN. All VMs are in the inventory of each host to make it easy to run each VM on either host.

If you attempt to start a VM on Host2 when it is already running on Host1, it will fail to start with an error indicating it is unable to lock the required files (from memory, I think it's the swap file that causes the error). So it's perfectly safe to leave the VMs registered on each host. The only problem that may occur is during backups, if your backup regime is based around iterating over all VMs on a host, but that can be accommodated by selecting that only running VMs be backed up.
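To make the behaviour described above concrete, here is a toy Python model of it. This is not how VMFS implements locking; it only mimics the observable effect reported in this thread: a VM registered on both hosts can be powered on by one host at a time, and a backup that iterates over a host's inventory should pick only the VMs actually running there. All class, host, and VM names are hypothetical.

```python
class LockError(Exception):
    """Raised when a VM's files are already locked by another host."""


class SharedStorage:
    """Toy model of per-VM file locks on a shared datastore."""

    def __init__(self):
        self.lock_owner = {}  # VM name -> host currently holding its lock

    def power_on(self, host, vm):
        owner = self.lock_owner.get(vm)
        if owner is not None and owner != host:
            raise LockError("%s: unable to lock files of %s (held by %s)" % (host, vm, owner))
        self.lock_owner[vm] = host

    def running_on(self, host, inventory):
        """Filter a host's inventory down to the VMs it is actually running."""
        return [vm for vm in inventory if self.lock_owner.get(vm) == host]


if __name__ == "__main__":
    storage = SharedStorage()
    inventory = ["web01", "db01"]  # registered on BOTH hosts
    storage.power_on("Host1", "web01")
    try:
        storage.power_on("Host2", "web01")
    except LockError as err:
        print(err)
    print("Back up on Host1:", storage.running_on("Host1", inventory))
    print("Back up on Host2:", storage.running_on("Host2", inventory))
```

In this model the double registration is harmless because the lock, not the inventory entry, decides where a VM can run. The running_on filter corresponds to the "only running VMs be backed up" setting.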

Regards,

Iain

mreferre
Champion

> we have two hosts connected to a shared iscsi SAN. All VMs are in the inventory of each host to make it easy to run each VM on either host

Have you done this from within Virtual Center or are these two separate/standalone hosts (i.e. not managed by VC) ?

Massimo.
