Re: irq 177, boot Linux virtual machine with irqpoll?

namilak · ‎07-16-2009

Hi,

We have a virtual machine running RHEL 5.1 (64-bit) that crashed this morning. The virtual machine serves as a MySQL server and has 4 vCPUs and 4096 MB of RAM assigned; the host is ESX 3.5.0, 113339.

These are the messages in the log:

Jul 15 15:50:19 hobbes kernel: ip_conntrack version 2.4 (8192 buckets, 65536 max) - 304 bytes per conntrack

Jul 16 01:37:55 hobbes kernel: irq 177: nobody cared (try booting with the "irqpoll" option)

Jul 16 01:37:55 hobbes kernel:

Jul 16 01:37:55 hobbes kernel: Call Trace:

Jul 16 01:37:55 hobbes kernel: ] (e1000_intr+0x0/0x113 )

Jul 16 01:37:55 hobbes kernel: Disabling IRQ #177

Jul 16 01:45:15 hobbes kernel: NETDEV WATCHDOG: eth0: transmit timed out

Jul 16 03:41:58 hobbes kernel: klogd 1.4.1, log source = /proc/kmsg started.

Should I reboot the machine with irqpoll? Interrupt 177 is eth0.

Has anyone seen those errors on a vm running Linux? Are there any recommendations for the configuration of my virtual machine?

I appreciate your comments.

Thank you!
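
For anyone triaging the same symptoms, here is a minimal sketch (not an official tool) that scans kernel log text for the three messages quoted above: the "nobody cared" report, the "Disabling IRQ" line, and the NETDEV WATCHDOG timeout. The function name `summarize_irq_events` and the regular expressions are illustrative assumptions based only on the log lines in this post.

```python
import re

# Patterns modeled on the kernel messages quoted in this thread (assumed wording).
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")


def summarize_irq_events(log_text):
    """Return (unhandled IRQs, disabled IRQs, interfaces that timed out)."""
    unhandled = sorted({int(m.group(1)) for m in NOBODY_CARED.finditer(log_text)})
    disabled = sorted({int(m.group(1)) for m in DISABLING.finditer(log_text)})
    timed_out = sorted({m.group(1) for m in WATCHDOG.finditer(log_text)})
    return unhandled, disabled, timed_out


# Sample lines copied from the log above.
sample = """\
Jul 16 01:37:55 hobbes kernel: irq 177: nobody cared (try booting with the "irqpoll" option)
Jul 16 01:37:55 hobbes kernel: Disabling IRQ #177
Jul 16 01:45:15 hobbes kernel: NETDEV WATCHDOG: eth0: transmit timed out
"""

print(summarize_irq_events(sample))
```

On a real system the text would come from the syslog file (for example /var/log/messages on RHEL) rather than from the inline sample.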

Tearstone · ‎10-16-2009

We just had the same issue with RHEL 5.4. I'm working with Red Hat and (HP VAR) from VMware on determining a root cause.

calle77 · ‎11-20-2009

Did you have any luck in finding the root cause of this error? I'm having the same problem on my RHEL 5.4 VM.

irq 177: nobody cared (try booting with the "irqpoll" option)

Call Trace:

<IRQ> __report_bad_irq+0x30/0x7d

note_interrupt+0x1e6/0x227

__do_IRQ+0xbd/0x103

:e1000:e1000_watchdog_task+0x0/0x65c

do_IRQ+0xe7/0xf5

ret_from_intr+0x0/0xa

__do_softirq+0x51/0x133

call_softirq+0x1c/0x28

do_softirq+0x2c/0x85

apic_timer_interrupt+0x66/0x6c

<EOI> :e1000:e1000_watchdog_task+0x5fd/0x65c

run_workqueue+0x94/0xe4

worker_thread+0x0/0x122

worker_thread+0xf0/0x122

default_wake_function+0x0/0xe

kthread+0xfe/0x132

child_rip+0xa/0x11

kthread+0x0/0x132

child_rip+0x0/0x11

handlers:

(e1000_intr+0x0/0x113 )

Disabling IRQ #177

acidix · ‎11-23-2009

Hi everyone,

I noticed the same issue on SuSE Linux Enterprise 10 SP2 (unpatched), kernel 2.6.16.60-0.21-smp x86_64:

Nov 23 08:11:17 mymachine kernel: irq 185: nobody cared (try booting with the "irqpoll" option)

Nov 23 08:11:17 mymachine kernel:

Nov 23 08:11:17 mymachine kernel: Call Trace: <IRQ> <ffffffff8015e118>{__report_bad_irq+48}

Nov 23 08:11:17 mymachine kernel: <ffffffff8015e321>{note_interrupt+444} <ffffffff8015dbf4>{__do_IRQ+191}

Nov 23 08:11:17 mymachine kernel: <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0} <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0}

Nov 23 08:11:17 mymachine kernel: <ffffffff8010d569>{do_IRQ+59} <ffffffff8010b35e>{ret_from_intr+0}

Nov 23 08:11:17 mymachine kernel: <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0} <ffffffff80139769>{__do_softirq+74}

Nov 23 08:11:17 mymachine kernel: <ffffffff8010c222>{call_softirq+30} <ffffffff8010d1a4>{do_softirq+44}

Nov 23 08:11:17 mymachine kernel: <ffffffff8010bb7c>{apic_timer_interrupt+132} <EOI> <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0}

Nov 23 08:11:17 mymachine kernel: <ffffffff881f20a8>{:e1000:e1000_watchdog_task+1991}

Nov 23 08:11:17 mymachine kernel: <ffffffff881f2014>{:e1000:e1000_watchdog_task+1843}

Nov 23 08:11:17 mymachine kernel: <ffffffff8012d6bc>{__wake_up+56} <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0}

Nov 23 08:11:17 mymachine kernel: <ffffffff80144386>{run_workqueue+139} <ffffffff80144a94>{worker_thread+0}

Nov 23 08:11:17 mymachine kernel: <ffffffff80144b88>{worker_thread+244} <ffffffff8012c668>{default_wake_function+0}

Nov 23 08:11:17 mymachine kernel: <ffffffff80147dfc>{kthread+236} <ffffffff8010bed2>{child_rip+8}

Nov 23 08:11:17 mymachine kernel: <ffffffff80147d10>{kthread+0} <ffffffff8010beca>{child_rip+0}

Nov 23 08:11:17 mymachine kernel: handlers:

Nov 23 08:11:17 mymachine kernel: (e1000_intr+0x0/0x24c )

Nov 23 08:11:17 mymachine kernel: Disabling IRQ #185

However, it seems the cause of the problem is inside the e1000 driver, or VMware's e1000 emulation:

mymachine:~ # cat /proc/interrupts

           CPU0       CPU1
  0: 1217676150          0  IO-APIC-edge   timer
  1:        398         29  IO-APIC-edge   i8042
  8:          0          0  IO-APIC-edge   rtc
  9:          0          0  IO-APIC-level  acpi
 12:        169          0  IO-APIC-edge   i8042
 14:   38822670       7838  IO-APIC-edge   ide0
169:    3241937       1804  IO-APIC-level  ioc0
177:    8867372       2624  IO-APIC-level  eth0
185:    3002413         84  IO-APIC-level  eth1
NMI:          0          0
LOC: 1217676878 1217676733
ERR:          0
MIS:          0
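
To tie the IRQ number in the "nobody cared" message back to a device, the /proc/interrupts output above can be parsed mechanically. The following is a rough sketch under the assumption that the file has the layout shown here (a CPU header row, then `label: counts... controller device`); `parse_interrupts` is a hypothetical helper name, not part of any existing tool.

```python
def parse_interrupts(text):
    """Map each IRQ label to (per-CPU counts, device name or None)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row, e.g. "CPU0 CPU1"
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Whatever follows the counts is the controller type, then the device(s).
        extra = fields[len(counts):]
        device = " ".join(extra[1:]) if len(extra) > 1 else None
        table[label.strip()] = (counts, device)
    return table


# Excerpt of the output posted above.
sample = """\
CPU0 CPU1
0: 1217676150 0 IO-APIC-edge timer
177: 8867372 2624 IO-APIC-level eth0
185: 3002413 84 IO-APIC-level eth1
NMI: 0 0
ERR: 0
"""

table = parse_interrupts(sample)
print(table["185"])  # the IRQ that was disabled in the log above
```

In this output IRQ 185 belongs to eth1, and almost all of its interrupts were delivered to CPU0.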

Regards,

Thomas Uhde

ccklam · ‎07-22-2010

I have just encountered that same issue today with a virtual machine running:

- CentOS 5.4 (64-bit)

- linux kernel 2.6.18-164.el5 (SMP)

- on ESX Server 4 (virtual machine version 4).

- vmware-tools installed

- 1 cpu + 1GB RAM

I was able to log in through the console and determined that only network connectivity was affected.

Ran: % /sbin/service network restart

... and network connectivity was re-established. Would really like to know why it got into the state that it did.

JonRoderick · ‎08-12-2010

Got the same problem on a SLES 10 VMware guest...

ccklam · ‎08-12-2010

Is that the 64-bit or the 32-bit version of SLES 10?

JonRoderick · ‎08-12-2010

Should've said: 64-bit, so it's using the E1000 driver.

Jon

swalddoerfer · ‎08-18-2010

Got the same error on SLES 10 SP3 (64-bit).

I opened a call with Novell support and got the recommendation to add the boot option aerdriver.off=1
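
After adding a boot option such as irqpoll or aerdriver.off=1 and rebooting, it is worth confirming that the option actually reached the kernel. A minimal sketch, assuming the usual space-separated format of /proc/cmdline; the helper name `boot_options` and the sample command line (including the root device) are made up for illustration, and the check says nothing about whether the kernel recognizes the option.

```python
def boot_options(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to True."""
    opts = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        opts[name] = value if sep else True
    return opts


# Hypothetical command line; on a real guest use open("/proc/cmdline").read().
sample = "ro root=/dev/VolGroup00/LogVol00 irqpoll aerdriver.off=1"

opts = boot_options(sample)
print("irqpoll" in opts, opts.get("aerdriver.off"))
```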
