VMware Cloud Community
JDMils_Interact
Enthusiast

When do VMs migrate?

We have a cluster of sensitive Citrix VDA servers, and it is very important to keep no more than 4 VMs on each host. The cluster has therefore been set to DRS Manual, and DRS rules are set to pin specific VMs to specific hosts.

The VDA VMs are rebooted every night at midnight, according to the company that supports the Citrix environment.

I am currently upgrading the hosts and need to move VMs around to evacuate one host at a time. I am performing the host upgrades between 8pm and midnight. From what I know, when DRS is set to Manual, VMs will NOT migrate between hosts, even on power-up.

However, today I noticed 10 VMs on one host and only one VM on other hosts. That is a real problem come Monday morning when the system starts to load up: the hosts will hit 100% CPU and memory, and we are going to get a LOT of complaints from users.
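
An imbalance like this can be spotted by counting VMs per host and flagging any host over the limit. The following is only a minimal sketch in plain Python: the `placement` mapping and the VM and host names are hypothetical, and in practice the data would come from a vCenter inventory export.

```python
from collections import Counter

MAX_VMS_PER_HOST = 4  # limit stated above for the VDA cluster


def overloaded_hosts(placement, limit=MAX_VMS_PER_HOST):
    """Return {host: vm_count} for hosts running more than `limit` VMs."""
    counts = Counter(placement.values())
    return {host: n for host, n in counts.items() if n > limit}


# Hypothetical snapshot: 10 VDAs on esx01, one each on esx02 and esx03.
placement = {f"vda{i:02d}": "esx01" for i in range(1, 11)}
placement["vda11"] = "esx02"
placement["vda12"] = "esx03"

print(overloaded_hosts(placement))  # → {'esx01': 10}
```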

Can someone explain why the VMs are moving around the hosts?

vCenter 7.0u3h

Hosts are a mix of ESXi 6.7 & 7.0u3g

9 Replies
Giodomi
Hot Shot

Hello,

This seems to be some kind of bug in the DRS engine when it is set to Manual.
In my experience, I normally handle this kind of pinning requirement with DRS and Must rules.

Did you try running the cluster with DRS enabled in Automatic mode and creating Must rules between a group of VMs and a host group containing a single host? That will certainly prevent the machines from moving to other hosts.
If you need to do maintenance on the host, you just need to remember to disable the Must rule, which will allow the VMs to move to the other hosts.

Kudos if it was helpful

GioDomi
JDMils_Interact
Enthusiast

At this point in time, with DRS Manual and the pinning rules disabled, it seems that the VMs are popping up on random hosts each night as they are being rebooted. I did not think this was possible!

Today (Sunday) is my last day of upgrades so as soon as I'm finished, I'm going to pin the VMs as I don't want to see 10 Citrix VDAs on one host because if that happens then I'm in the hot seat with the customer.

redsmurf23
Contributor

Getting the same issue with Citrix VDAs rebooting on different hosts with DRS set to Manual and no placement rules.

Did you get anywhere with this, JDMils_Interact?

Cheers

redsmurf23
Contributor

Using vCenter 7.0.3p and ESXi 7.0.3o, by the way.

Tibmeister
Expert

When a VM is powered on it will be placed on a random host, but not when it does a soft reboot.  I wonder if they are doing a shutdown and then a power-on of the VMs; that would be about the only thing that makes sense.  Check the VM tasks and events to see what is really happening.  Don't just take the product owner's word for it, because I have had cases where they say "reboot" but what they mean is a full power cycle (power off, then power on), which does cause random host assignments.
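
The check described above can be sketched as a small classifier over one VM's event history. This is a rough sketch, not a vCenter integration: the input is a hypothetical, already-exported list of event type names in chronological order, and the names used here (`VmGuestRebootEvent`, `VmPoweredOffEvent` and so on) are given as I recall them from the vSphere event model, so treat them as assumptions and compare against what your vCenter actually logs.

```python
# Event type names are assumptions; verify against your own vCenter events.
IN_PLACE_RESTART = {"VmGuestRebootEvent", "VmResettingEvent"}
POWER_OFF = {"VmPoweredOffEvent"}
POWER_ON = {"VmPoweredOnEvent"}


def classify_restart(event_types):
    """Classify a chronological list of event type names for one VM."""
    saw_off = False
    for name in event_types:
        if name in POWER_OFF:
            saw_off = True
        elif name in POWER_ON and saw_off:
            return "power cycle"  # power off followed by power on
    if any(name in IN_PLACE_RESTART for name in event_types):
        return "in-place reboot"  # VM never left the powered-on state
    return "unknown"


# Hypothetical nightly history for one VDA.
nightly = ["VmGuestShutdownEvent", "VmPoweredOffEvent", "VmPoweredOnEvent"]
print(classify_restart(nightly))  # → power cycle
```

A result of "power cycle" would match the behavior described above, where the VM goes through a fresh power-on and can land on a different host.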

Keeping DRS in Manual makes things tricky to manage.  What you should do is create "Should" rules for VM-to-host affinity, then set DRS to Fully Automated.  If you set the rules to "Must" mode, then the VM will not power on if its "assigned" host is not available.  With "Should" mode, the VM will run on the "assigned" host if at all possible, but if that's not possible, it will power on on the next available host.  In my experience that's really the best way to run this without going crazy trying to chase VM affinity.
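
The difference between the two rule modes can be illustrated with a toy model. This is a deliberately simplified sketch and not how DRS is implemented: the function name, its parameters and the host names are hypothetical, and "next available host" is modeled as simply the first host in the list, whereas real DRS also weighs load.

```python
def choose_host(preferred, available, rule="should"):
    """Pick a power-on host under a simplified VM-to-host affinity rule.

    Returns the chosen host name, or None if the VM cannot be powered on.
    """
    if rule not in ("should", "must"):
        raise ValueError(f"unknown rule type: {rule}")
    if preferred in available:
        return preferred  # assigned host is up, so both modes use it
    if rule == "must":
        return None  # assigned host is down, so the VM stays off
    return available[0] if available else None  # "should": fall back


# Hypothetical cluster where the assigned host esx01 is down.
up = ["esx02", "esx03"]
print(choose_host("esx01", up, rule="should"))  # → esx02
print(choose_host("esx01", up, rule="must"))    # → None
```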

JDMils_Interact
Enthusiast

You are correct. The VDA VMs are being shut down via the guest OS and then powered on, so it looks like the Citrix management application is shutting them down from vCenter. I guess this is the trigger that causes the VMs to auto-migrate if no DRS rules are defined.

The 3rd party company looking after our customer's Citrix environment has set cluster DRS to Manual and is manually load-balancing the hosts by placing 4 Citrix VDA VMs on each host. From what I've seen, CPU usage on the Citrix VDA VMs drops to about 5% at night but ramps up to around 90% during the day, and it seems the VDAs will start causing issues for users if more than 4 VMs are on the same host before DRS has a chance to start moving them around.

So the 3rd party company wants to avoid both the initial issues when too many VDA VMs are on the same host and the issues caused to users when DRS eventually starts migrating the VDA VMs to other hosts. In the current setup, each host can handle 4 VDA VMs.

Now, if a host were to fail then yes, we would have issues for the users on the VDA VMs on the failed host, but the number of users affected would be far smaller.

Tibmeister
Expert

DRS rules mean nothing if DRS is set to Manual.  If you set the host/VM affinity in the rules and set DRS to Fully Automated, then the VMs will be powered up on their assigned host (if available), or on another available host if required, provided the rules are set as "Should" rather than "Must".  Users may find things a little slow when there's a host failure, but that's better than having nothing at all.  Also, I'm guessing HA is set to manual, or these VMs will start on an available host when there's a host failure since DRS is set to Manual.  Also, using the rules with DRS Fully Automated means that if someone goofs and moves things around, the rules will be adhered to at all costs.

It seems there's a general lack of understanding on the management company's part of how DRS works, which I've come across many times, especially when dealing with Citrix admins, because things don't exist that way in their world.

Another question: why the heck do they have to bounce their VDAs nightly?  That sounds like covering for a larger problem.  Either way, the "solution" to your original issue boils down to setting up the DRS rules properly and then enabling DRS to run Fully Automated so the rules are followed.  That, and some education on the management company's part about DRS.

Setting this up is really easy.  You need both a host group and a VM group for this.  Create a host group, call it HostGroup1, and add the first host to it.  Then create a VM group and call it something like VDAGroup1.  Then create a DRS rule of type "Virtual Machines to Hosts".  Select your host group and VM group and change the rule type to "Should Run on Hosts in Group" so the VMs run on other hosts ONLY when the assigned host (the host in HostGroup1) has failed and otherwise run exclusively on the assigned host.  You can set this to "Must Run on Hosts in Group", but you lose any failover capability in the event of a host failure.  Remember, any assigned DRS rules are honored when a VM is powered on; the VM doesn't go to a random host (exceptions already noted) and then get migrated.
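
Repeating those steps for every host is easier to keep straight if the groups and rules are written out as a plan first. The sketch below only generates that plan as plain data and does not talk to vCenter. The group names follow the HostGroup1/VDAGroup1 pattern above; the function, the host and VM names, and the dictionary layout are hypothetical.

```python
def build_affinity_plan(hosts, vms, per_host=4, rule_type="should"):
    """Split VMs into chunks of `per_host` and pair each chunk with one host."""
    if len(vms) > per_host * len(hosts):
        raise ValueError("not enough hosts for the requested VMs per host")
    plan = []
    for i, host in enumerate(hosts):
        members = vms[i * per_host:(i + 1) * per_host]
        if not members:
            break  # no VMs left to assign
        n = i + 1
        plan.append({
            "host_group": f"HostGroup{n}",
            "hosts": [host],  # one host per host group
            "vm_group": f"VDAGroup{n}",
            "vms": members,
            "rule": f"{rule_type} run on hosts in group",
        })
    return plan


# Hypothetical inventory: 2 hosts, 6 VDA VMs.
hosts = ["esx01", "esx02"]
vms = [f"vda{i:02d}" for i in range(1, 7)]
for entry in build_affinity_plan(hosts, vms):
    print(entry["vm_group"], "->", entry["host_group"], entry["vms"])
```

Each entry in the plan corresponds to one host group, one VM group and one "Virtual Machines to Hosts" rule to create in the cluster settings.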

JDMils_Interact
Enthusiast

From my testing and from speaking to the senior VMware expert here at work, DRS rules still work when DRS is set to Manual, but they are only applied when a VM is powered on.

I think for this vCenter, DRS could be set to Automatic as long as a DRS rule exists for every VM. I'll check with the 3rd party Citrix management company to see if they are OK with this.

Tibmeister
Expert

Well, here's what the docs have to say:

DRS has three levels of automation:
Fully Automated – DRS applies both initial placement and load balancing recommendations automatically.
Partially Automated – DRS applies recommendations only for initial placement.
Manual – You must apply both initial placement and load balancing recommendations.
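
As a quick reference, the three levels quoted above can be written as a small lookup table. This is just a restatement of the quoted text in Python; the dictionary and helper function are hypothetical and are not part of any VMware API.

```python
# True means DRS applies that kind of recommendation automatically.
AUTOMATION_LEVELS = {
    "fully automated": {"initial placement": True, "load balancing": True},
    "partially automated": {"initial placement": True, "load balancing": False},
    "manual": {"initial placement": False, "load balancing": False},
}


def needs_manual_approval(level):
    """List the recommendation types an admin must apply by hand."""
    applied = AUTOMATION_LEVELS[level.lower()]
    return [kind for kind, auto in applied.items() if not auto]


print(needs_manual_approval("Manual"))
# → ['initial placement', 'load balancing']
```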

This jibes not only with your own experience as you've posted but also with my very long experience with vSphere and DRS.  In Manual mode the rules are not used beyond generating the recommendations after power-on.

The DRS rules only apply at the cluster level, not the vCenter level, so as long as your VDA workloads are isolated in their own cluster, you only need rules to govern the VDA placement, not all VMs.  In my past experience with Citrix, it is very possible to configure things so that the VDA appliances are in a separate cluster from the VDI desktops.  Citrix has a lot of documentation on how to do this; in a nutshell, you place the VDAs in one cluster and put the Golden Image VM in the cluster that will run the VDI desktops.  I believe there may be a toggle switch in MCS that is needed, but your Citrix and VMware experts should know this already since it's all fundamental knowledge of each product.

Also, under vSphere 7 DRS runs every minute instead of every 5 minutes, so the migration of a misplaced VDA will happen relatively quickly.
