Solved: ESXi 4 DRS best practice

xbradshr · 11-07-2011

DRS is letting hosts reach very high CPU usage before it moves VMs. The issue is that a customer's IT director has read-only access to vCenter. He can see some ESX hosts running at up to 98% CPU consistently, with a warning on the host, and he is questioning this.

I am not aware of any adverse effects on the production environment, but it is noticeable when viewing the cluster on the Hosts tab. During production hours, four of the sixteen hosts sometimes show a warning. The attached picture shows a typical morning, with some hosts at 98% and some at 50%.

The cluster reports a failover capacity of 11 hosts out of 16, yet we have very busy periods.
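For context on that failover figure: under the HA "host failures cluster tolerates" policy, the reported capacity is, as far as I understand it, computed from slot sizes derived from reservations, not from measured usage, so a busy cluster can still report a high number. The sketch below assumes a CPU slot equal to the largest CPU reservation with a 256 MHz floor (taken here as the vSphere 4.x default), a memory slot equal to the largest memory reservation plus overhead, and that HA fails the hosts with the most slots first. All host and VM figures are hypothetical.

```python
"""Sketch: HA "host failures cluster tolerates" slot arithmetic.

Assumptions for illustration only: the CPU slot is the largest CPU reservation
among powered-on VMs, floored at 256 MHz (assumed vSphere 4.x default); the
memory slot is the largest memory reservation plus overhead. All host and VM
figures are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_mhz: int
    mem_mb: int


@dataclass
class VM:
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int


def slot_size(vms, min_cpu_mhz=256):
    """Return (cpu_slot_mhz, mem_slot_mb), driven by the largest reservations."""
    cpu = max([min_cpu_mhz] + [vm.cpu_reservation_mhz for vm in vms])
    mem = max(vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms)
    return cpu, mem


def slots_per_host(host, cpu_slot, mem_slot):
    """A host holds as many slots as its scarcer resource allows."""
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def failover_capacity(hosts, vms):
    """Count host failures tolerated, failing the hosts with most slots first.

    Measured CPU or memory usage never enters this calculation.
    """
    cpu_slot, mem_slot = slot_size(vms)
    slots = sorted((slots_per_host(h, cpu_slot, mem_slot) for h in hosts),
                   reverse=True)
    tolerated = 0
    for failed in range(1, len(slots)):
        # Surviving hosts must still offer one slot per powered-on VM.
        if sum(slots[failed:]) >= len(vms):
            tolerated = failed
        else:
            break
    return tolerated


if __name__ == "__main__":
    hosts = [Host(f"esx{i:02d}", cpu_mhz=16000, mem_mb=65536) for i in range(16)]
    vms = [VM(0, 0, 150) for _ in range(200)]  # no reservations at all
    print(slot_size(vms), failover_capacity(hosts, vms))
```

With these made-up figures the sketch reports 12 tolerated host failures no matter how busy the hosts are; adding a single VM with a 1000 MHz / 2048 MB reservation enlarges every slot and the figure drops to 3.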

Is there anything that can be done to configure things differently in DRS? Could we use affinity rules to keep busy VMs on separate hosts?

shishir08 · 11-10-2011

Some DRS best practices:

1) When deciding which hosts to group into a DRS cluster, try to choose hosts that are as homogeneous as possible in CPU and memory. This ensures higher performance predictability and stability. VMotion is not supported across hosts with incompatible CPUs. Hence, with heterogeneous systems that have incompatible CPUs, DRS has fewer opportunities to improve the load balance across the cluster.

2) When more ESX hosts in a DRS cluster are VMotion compatible, DRS has more choices for balancing workloads across the cluster.

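3) Do not specify affinity rules unless you have a specific need to do so. In some cases, however, affinity rules can improve performance, for example by keeping VMs that communicate heavily with each other on the same host, or by keeping VMs with high resource demands on separate hosts.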

4) Assign resource allocations to virtual machines and resource pools carefully. Be mindful of the impact of limits, reservations and virtual machine memory overhead.

5) Virtual machines with smaller memory sizes or fewer virtual CPUs give DRS more opportunities to migrate them to improve balance across the cluster. Virtual machines with more memory or more virtual CPUs add more constraints on migration. Hence, configure each virtual machine with only as many virtual CPUs and as much memory as it actually needs.
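Points 1, 2 and 5 can be illustrated with a toy model of how many destinations a VM has. This is a hypothetical sketch, not how DRS evaluates hosts: it assumes a host qualifies only if it shares the source host's CPU-compatibility group (a made-up stand-in for VMotion/EVC compatibility) and has enough free memory for the VM's configured memory plus overhead.

```python
"""Sketch: why big VMs and mixed CPUs narrow DRS's choices.

Hypothetical model: a host is a vMotion destination only if it shares the
source host's CPU-compatibility group (a stand-in for CPU/EVC compatibility)
and has enough free memory for the VM's configured memory plus overhead.
Real DRS checks far more than this.
"""
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    compat_group: str  # hypothetical label, e.g. CPU family or EVC baseline
    free_mem_mb: int


def candidate_targets(vm_mem_mb, vm_overhead_mb, source, hosts):
    """Names of hosts that could receive the VM under this simplified model."""
    needed = vm_mem_mb + vm_overhead_mb
    return [
        h.name
        for h in hosts
        if h is not source
        and h.compat_group == source.compat_group
        and h.free_mem_mb >= needed
    ]


if __name__ == "__main__":
    a = Host("esx-a", "cpu-gen1", 4000)
    hosts = [a, Host("esx-b", "cpu-gen1", 12000),
             Host("esx-c", "cpu-gen1", 20000), Host("esx-d", "cpu-gen2", 30000)]
    print(candidate_targets(2048, 100, a, hosts))   # small VM
    print(candidate_targets(16384, 300, a, hosts))  # large VM
```

Here the small VM has two possible destinations and the large VM only one, and esx-d never qualifies despite having the most free memory, because its CPUs are in a different compatibility group.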


farkasharry · 11-07-2011

That looks more like a cluster that is not load balanced. Given that usage, I doubt the desired failover level can actually be reached. How about setting DRS to the 5-star (aggressive) migration threshold and letting it redistribute the VMs until the load is balanced? Then think about affinity rules, or even splitting up the cluster so that the highly utilized VMs affect the "normal" ones less.
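One way to put a number on "not load balanced": the vSphere 4 cluster Summary tab shows a current and a target host load standard deviation, and DRS tries to bring the current value under the target. The sketch below is a simplification: it uses CPU demand divided by capacity as each host's load (DRS itself works from normalized VM entitlements), the 16-host figures loosely follow the "some at 98%, some at 50%" description but are hypothetical, and the target value is a placeholder rather than what DRS derives from its threshold.

```python
"""Sketch: measuring cluster imbalance as the spread of per-host load.

Simplification: load = CPU demand / CPU capacity per host. DRS itself works
from normalized VM entitlements, and the target value used here is a
hypothetical placeholder, not the value DRS derives from its threshold.
"""
from statistics import pstdev


def normalized_loads(demand_mhz, capacity_mhz):
    """Per-host load as a fraction of that host's capacity."""
    return [d / c for d, c in zip(demand_mhz, capacity_mhz)]


def imbalance(loads, target_stdev):
    """Return (current_stdev, needs_balancing)."""
    current = pstdev(loads)
    return current, current > target_stdev


if __name__ == "__main__":
    # Hypothetical 16-host cluster: four hosts at 98%, twelve at 50%.
    capacity = [16000] * 16
    demand = [15680] * 4 + [8000] * 12
    loads = normalized_loads(demand, capacity)
    current, needs = imbalance(loads, target_stdev=0.1)
    print(f"current stdev {current:.3f}, needs balancing: {needs}")
```

With these figures the spread is about 0.208; the same total load spread evenly (every host at 62%) would give a spread of 0.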

vExpert 2019, VCAP-DCA, VCP, MCSE, MCITS and some more...

satya1 · 11-08-2011

Hi, does this happen at specific times or at random?

Please check whether, at those times, application owners are submitting a lot of jobs, and also check for viruses.

Yours,

Satya

SimonStrutt · 11-08-2011

xbradshr wrote:
The issue is, a customer's IT director has read-only access to vCenter.

Ouch - that doesn't sound like a nice place.

You should only really make configuration changes if there is an actual, measurable problem with VM performance. If there is none, there is nothing to improve. However, keeping a customer's IT director happy is important, and if you can't explain to him why it's not a problem, then you have to be seen to be doing something...

As suggested, you could set DRS to 5-star/Aggressive mode and check that it's set to Fully Automated. However, frequent vMotions will affect VM performance while they happen. Generally the effect is negligible, but the more you vMotion, the greater the chance of someone noticing. ESX hosts are expected to run hot, and they handle high-load VMs better if those VMs stay in one place rather than moving around continuously.
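To see why the aggressive setting means more vMotions, here is a sketch of how the migration threshold filters recommendations. It assumes the vSphere 4.x convention as I understand it: priority 1 is the most important recommendation, and threshold level N applies priorities 1 through N, so level 5 (Aggressive) applies everything. The VM and host names are hypothetical.

```python
"""Sketch: which DRS recommendations a migration threshold lets through.

Assumed convention (vSphere 4.x as I understand it): priority 1 is the most
important recommendation, and threshold level N applies priorities 1..N, so
level 5 (Aggressive) applies everything. VM and host names are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Recommendation:
    vm: str
    target_host: str
    priority: int  # 1 = most important, 5 = smallest expected gain


def applied(recs, threshold):
    """Recommendations acted on at a threshold from 1 (Conservative) to 5 (Aggressive)."""
    if not 1 <= threshold <= 5:
        raise ValueError("threshold must be between 1 and 5")
    return [r for r in recs if r.priority <= threshold]


if __name__ == "__main__":
    recs = [Recommendation(f"vm{p}", "esx-b", p) for p in range(1, 6)]
    for level in (1, 3, 5):
        print(level, len(applied(recs, level)))
```

Each step toward Aggressive admits moves with a smaller expected gain, which is exactly the trade-off above: better balance on paper, paid for with more vMotions.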

"The greatest challenge to any thinker is stating the problem in a way that will allow a solution." - Bertrand Russell