DRS is allowing hosts to reach very high processor usage before moving VMs. The issue is that a customer's IT director has read-only access to vCenter. He is seeing that some ESX hosts consistently run at up to 98% CPU, with a warning on the host, and he is questioning this.
I am not aware of any adverse effects on the production environment, but it does look noticeable when viewing the cluster on the hosts tab. Sometimes, four out of sixteen hosts have a warning during production hours. The attached picture shows a typical morning, with some hosts at 98% and some at 50%.
The cluster states a failover capacity of 11 hosts out of the 16, yet we have very busy periods.
Is there anything that can be done to configure things differently in DRS? Could we use affinity rules to keep busy VMs on separate hosts?
Some DRS best practices:
1) When deciding which hosts to group into a DRS cluster, try to choose hosts that are as homogeneous as possible in CPU and memory. This ensures higher performance predictability and stability. VMotion is not supported across hosts with incompatible CPUs. Hence, with heterogeneous systems that have incompatible CPUs, DRS has fewer opportunities to improve the load balance across the cluster.
2) When more ESX hosts in a DRS cluster are VMotion compatible, DRS has more choices to better balance workloads across the cluster.
3) Do not specify affinity rules unless you have a specific need to do so. In some cases, however, specifying affinity rules can improve performance.
4) Assign resource allocations to virtual machines and resource pools carefully. Be mindful of the impact of limits, reservations and virtual machine memory overhead.
5) Virtual machines with smaller memory sizes or fewer virtual CPUs give DRS more opportunities to migrate them in order to improve balance across the cluster. Virtual machines with larger memory sizes or more virtual CPUs add more constraints on migration. Hence you should configure only as many virtual CPUs and as much memory for a virtual machine as it needs.
That rather looks like a cluster that is not load balanced. Seeing the usage, I doubt that the stated failover level can actually be reached. What about setting DRS to the 5-star mode and letting it sort the VMs until the load is balanced, and then thinking about affinity rules, or even splitting up the cluster so that the highly utilized VMs have less effect on the "normal" ones?
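To put rough numbers on that doubt, here is a minimal Python sketch. It assumes the 16 hosts are identical and that the morning in question has four hosts at 98% and twelve at 50% (the exact split is my assumption, based on the description of the picture). The function names are made up for illustration. The spread it prints is only a crude stand-in for the host load standard deviation that DRS reports, which is computed from VM entitlements rather than raw host CPU percentages.

```python
import math
from statistics import mean, pstdev

# Assumed snapshot: 4 hosts at 98% CPU, 12 hosts at 50% CPU (illustrative only).
hosts = [98] * 4 + [50] * 12


def load_imbalance(cpu_pct):
    """Population standard deviation of per-host CPU usage, in percentage points."""
    return pstdev(cpu_pct)


def survives_host_failures(cpu_pct, failures):
    """True if the surviving hosts could absorb the current total CPU load.

    Assumes identical hosts and a perfectly even redistribution of load.
    """
    remaining = len(cpu_pct) - failures
    if remaining <= 0:
        return False
    return sum(cpu_pct) <= remaining * 100


def max_tolerable_failures(cpu_pct):
    """Largest number of host failures the current load could survive."""
    hosts_needed = math.ceil(sum(cpu_pct) / 100)
    return max(0, len(cpu_pct) - hosts_needed)


print(f"mean CPU usage: {mean(hosts):.1f}%")                # 62.0%
print(f"spread (std dev): {load_imbalance(hosts):.1f}")     # 20.8
print("survives 11 failures:", survives_host_failures(hosts, 11))  # False
print("max tolerable failures:", max_tolerable_failures(hosts))    # 6
```

Under these assumptions, the current load would need about ten fully loaded hosts, so losing 11 of 16 could not be absorbed at this usage level. As far as I know, the failover capacity shown for the cluster is derived from VM reservations rather than from actual usage, which would explain why the two figures disagree.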
Hi, does this happen at specific times, or randomly?
Please check whether an application owner submitted a lot of jobs at those times, and also check for a virus.
Yours,
Satya
xbradshr wrote:
The issue is, a customer's IT director has read-only access to vCenter.
Ouch - that doesn't sound like a nice place.
You should only really be making configuration changes if there is an actual, measurable problem with VM performance. If there is none, then there is nothing to improve. However, keeping a customer's IT director happy is important, and if you can't educate him as to why it's not a problem, then you have to be seen to be doing things...
As suggested, you could turn DRS to 5-star/Aggressive mode, and check that it's set to Fully Automated. However, causing VMs to vMotion around frequently will impact their performance while this happens. Generally the effect is negligible, but the more you vMotion, the greater the chance of someone noticing. ESX hosts are expected to run hot, and can deal with higher-load VMs better if they stay in one place rather than move around continuously.