VMware Cloud Community
hmaapje
Contributor

High CPU load caused by Hardware Interrupts

Hi all,

It seems to be a common problem: you have a virtual Windows 2003 server running Active Directory on an ESX server, and suddenly performance slows to a crawl.

Well, I'm having this problem now too at my company. After some checking we've found that it's being caused by the System process owned by SYSTEM. Process Explorer tells us that hardware interrupts are using almost 60% of the CPU, which slows down the domain controller. That in turn slows down our whole Citrix farm (because it uses shares etc. on the domain controller).

What have we done?

- We've updated the VMware Tools to the latest version supported by our ESX servers

- Killed the virus scanner on that server

- Checked whether we're using the LSI Logic SCSI controller, which we are

- Searched the 'whole' internet and the VMware forum

All with no result :(

After a reboot yesterday all seemed normal again, but this morning at about 07:30 it all started again (see attached screenshot).

Our ESX servers are running VMware ESX Server, 3.0.2, 61618.

Does anyone have an idea about what might be going on here?

Cheers!

16 Replies
depping
Leadership

Welcome to the forums.

Was it a newly built machine or a P2V? I've personally never witnessed this...

Duncan

VMware Communities User Moderator

hmaapje
Contributor

This is a newly installed server, built directly as a VM, so the HAL and so on should be set correctly.

There are numerous threads about this, but none has a solution.

Could this problem be caused by an outdated ESX server (as in the version of ESX)? I ask because the server has been running since December 2007 without problems (with, of course, a reboot now and then).

hmaapje
Contributor

We are going to move the VM to another ESX host to test if it's caused by the ESX machine or by the VM.

We are also going to update our ESX servers, I'll post the result here.

If there are any recommendations, please post them :)

hmaapje
Contributor

Migrating the server to another ESX host has resolved this issue. These ESX servers are getting an update from 3.0.2 to 3.5U4 to resolve this and other ESX issues. :D

curriertech
Enthusiast

I have this issue despite having the latest version and patches on my ESX servers. It's happening on two of our file servers.

-Josh.

Sanjana
Hot Shot

curriertech,

I'm curious as to what:

- the workload is

- the storage attached to the ESX server that's being used by the VMs is

- the storage controller on the box is

--sanjana

curriertech
Enthusiast

Hi, it's on a file server with 100 or so users. It's running DFS replication to two other sites on some subfolders. The VM in question is on an EqualLogic iSCSI SAN connected to the hosts via QLogic4062c HBAs, and there are 4 hosts doing HA/DRS with a total of 25 guests. I have other servers (mail, for example) that are doing much higher i/o without the system process using so much CPU.

-Josh.

Sanjana
Hot Shot

> I have other servers (mail, for example) that are doing much higher i/o without the system process using so much CPU.

Ah. And what are the storage controllers on the ESX hosts that these VMs are running on? Are they also QL 4062s? This is just a hunch, but are the BIOS and firmware on the card at the latest recommended version? You might also want to verify your HBA settings against the attached document.

--sanjana

Sanjana
Hot Shot

As an afterthought, do you see the high interrupt load only on that VM, or on all VMs on that box? If it's all VMs, then the HBA settings make sense as a suspect.

curriertech
Enthusiast

This is the only VM that has this problem, and the problem existed before we got the HBAs, so I don't think it's the iSCSI settings either.

-Josh.

Sanjana
Hot Shot

Josh,

What are the specs of the VM?

- OS ? / {32|64} bit ?

- # vCPUs?

- memory

--sanjana

curriertech
Enthusiast

Server 2003 R2 32-bit, 2 vCPUs, 2G RAM, one 30G drive and two 800G drives. Memory usage (via perfmon or taskmgr) is well under 50%, but CPU can run at 50% with only the System process causing it. Boosting the memory to 4G had no impact.

-Josh.

Sanjana
Hot Shot

(Sorry Josh, I'm just gonna bug you with more questions :) )

- What is the virtual scsi adapter in the VM? (buslogic or lsilogic). (It might help to check the max queue-depth setting for these adapters)

- Does reducing the number of vcpus have any impact?

- This isn't a P2V'ed VM is it?
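
The first two questions can also be answered from the VM's .vmx file rather than the client UI. A minimal sketch, assuming the usual `key = "value"` layout of a .vmx file; the helper names and the sample contents are made up for illustration, and on a real host you would read the VM's actual .vmx file instead:

```python
import re

def parse_vmx(text):
    """Parse 'key = "value"' lines from .vmx text into a dict (keys lower-cased)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings

def scsi_adapters(settings):
    """Return {controller: adapter type} for every scsiN.virtualDev entry."""
    pattern = re.compile(r"^(scsi\d+)\.virtualdev$")
    adapters = {}
    for key, value in settings.items():
        match = pattern.match(key)
        if match:
            adapters[match.group(1)] = value
    return adapters

# Made-up sample; replace with the text of the real .vmx file.
SAMPLE_VMX = '''
numvcpus = "2"
memsize = "2048"
scsi0.present = "true"
scsi0.virtualDev = "lsilogic"
'''

settings = parse_vmx(SAMPLE_VMX)
print("SCSI adapters:", scsi_adapters(settings))
print("vCPUs:", settings.get("numvcpus", "not set"))
print("Memory (MB):", settings.get("memsize", "not set"))
```

This only reports what is configured; it says nothing about queue depth, which is not covered by this sketch.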

--sanjana

curriertech
Enthusiast

Hey the more questions I get about this, the more likely you, I, or someone else is going to notice a problem. : )

SCSI adapters are lsilogic. I've tried 1, 2, and 4 vCPUs and it still acts up. This server is not a P2V'd box.

-Josh.

Sanjana
Hot Shot

Well, there are three things I can suggest you take a look at:

1. Install tools. (This is actually a question in disguise: "Do you have tools installed?" 😛)

2. You could check the LSILogic 1038 maximum queue depth setting.

3. Does perfmon in Windows tell you what the source of the interrupts is? (Disk interrupts vs. network interrupts, perhaps?)
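
One rough way to approach point 3 is to log the interrupt rate next to a disk counter and a network counter with `typeperf`, then see which one the interrupt rate tracks. The sketch below is only an illustration under assumptions: the host name `HOST`, the NIC instance name `nic0`, the sample numbers, and the helper names are all made up, and a correlation is a hint about the source, not proof.

```python
import csv
import io
from statistics import mean

# A log like this could be collected in the guest with something like:
#   typeperf "\Processor(_Total)\Interrupts/sec"
#            "\PhysicalDisk(_Total)\Disk Transfers/sec"
#            "\Network Interface(*)\Packets/sec" -si 5 -sc 60 -o interrupts.csv
# (typed on one line). The values below are invented for the example.
SAMPLE = r'''
"(PDH-CSV 4.0)","\\HOST\Processor(_Total)\Interrupts/sec","\\HOST\PhysicalDisk(_Total)\Disk Transfers/sec","\\HOST\Network Interface(nic0)\Packets/sec"
"01/01/2009 07:30:00.000","1000","100","500"
"01/01/2009 07:30:05.000","2000","200","480"
"01/01/2009 07:30:10.000","3000","300","510"
"01/01/2009 07:30:15.000","4000","400","490"
'''.strip()

def load_counters(csv_text):
    """Return {counter path: [float samples]} from typeperf-style CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    names = rows[0][1:]  # column 0 is the timestamp
    columns = {name: [] for name in names}
    for row in rows[1:]:
        try:
            values = [float(cell) for cell in row[1:]]
        except ValueError:
            continue  # skip rows with blank or non-numeric cells
        if len(values) != len(names):
            continue
        for name, value in zip(names, values):
            columns[name].append(value)
    return columns

def column(columns, suffix):
    """Find a counter by the end of its path, ignoring the host prefix."""
    for name, values in columns.items():
        if name.endswith(suffix):
            return values
    raise KeyError(suffix)

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

counters = load_counters(SAMPLE)
interrupts = column(counters, r"\Interrupts/sec")
for label, suffix in [("disk", r"\Disk Transfers/sec"), ("network", r"\Packets/sec")]:
    print(label, round(pearson(interrupts, column(counters, suffix)), 3))
```

If the interrupt rate rises and falls with disk transfers but not with packets (or the other way around), that at least narrows down where to look next.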

--sanjana

curriertech
Enthusiast

1. yes

2. where do I check that, in the .vmx?

3. I'll check that... EDIT: I can't find a way to tell that in Process Explorer. I haven't used it much in the past, so it's probably just me.

:)

-Josh.
