Hi,
We have a virtual machine running RHEL 5.1 (64-bit) that crashed this morning. The virtual machine serves as a MySQL server and has 4 vCPUs and 4096 MB of memory assigned; the host is ESX 3.5.0, build 113339.
These are the messages in the log:
Jul 15 15:50:19 hobbes kernel: ip_conntrack version 2.4 (8192 buckets, 65536 max) - 304 bytes per conntrack
Jul 16 01:37:55 hobbes kernel: irq 177: nobody cared (try booting with the "irqpoll" option)
Jul 16 01:37:55 hobbes kernel:
Jul 16 01:37:55 hobbes kernel: Call Trace:
Jul 16 01:37:55 hobbes kernel: ] (e1000_intr+0x0/0x113 )
Jul 16 01:37:55 hobbes kernel: Disabling IRQ #177
Jul 16 01:45:15 hobbes kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 16 03:41:58 hobbes kernel: klogd 1.4.1, log source = /proc/kmsg started.
Should I reboot the machine with irqpoll? Interrupt 177 is eth0.
Has anyone seen these errors on a VM running Linux? Are there any recommendations for the configuration of my virtual machine?
I appreciate your comments.
Thank you!
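For anyone who wants to check older logs for the same pattern, here is a minimal Python sketch that scans syslog lines for the three kernel messages quoted above. The function name `scan_kernel_log` and the result keys are made up for illustration, and the regular expressions are assumptions based only on the message wording in this post; other kernel versions may word these messages differently.

```python
import re

# Patterns copied from the message wording quoted in this thread (assumption:
# other kernels may format these lines differently).
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\S+): transmit timed out")


def scan_kernel_log(lines):
    """Collect unhandled IRQs, disabled IRQs and interfaces with TX timeouts."""
    found = {"unhandled": [], "disabled": [], "tx_timeout": []}
    for line in lines:
        m = NOBODY_CARED.search(line)
        if m:
            found["unhandled"].append(int(m.group(1)))
        m = DISABLING.search(line)
        if m:
            found["disabled"].append(int(m.group(1)))
        m = WATCHDOG.search(line)
        if m:
            found["tx_timeout"].append(m.group(1))
    return found


# Sample lines taken from the log excerpt in this post.
SAMPLE = [
    'Jul 16 01:37:55 hobbes kernel: irq 177: nobody cared (try booting with the "irqpoll" option)',
    "Jul 16 01:37:55 hobbes kernel: Disabling IRQ #177",
    "Jul 16 01:45:15 hobbes kernel: NETDEV WATCHDOG: eth0: transmit timed out",
]

if __name__ == "__main__":
    print(scan_kernel_log(SAMPLE))
```

On a real system you would pass the lines of the syslog file (for example an open file object) instead of `SAMPLE`.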
We just had the same issue with RHEL 5.4. I'm working with Red Hat and with VMware (through our HP VAR) on determining a root cause.
Did you have any luck in finding the root cause of this error? I'm having the same problem on my RHEL 5.4 VM.
irq 177: nobody cared (try booting with the "irqpoll" option)
Call Trace:
<IRQ> __report_bad_irq+0x30/0x7d
:e1000:e1000_watchdog_task+0x0/0x65c
apic_timer_interrupt+0x66/0x6c
<EOI> :e1000:e1000_watchdog_task+0x5fd/0x65c
handlers:
Disabling IRQ #177
Hi everyone,
I noticed the same issue on SUSE Linux Enterprise 10 SP2 (unpatched), kernel 2.6.16.60-0.21-smp x86_64:
Nov 23 08:11:17 mymachine kernel: irq 185: nobody cared (try booting with the "irqpoll" option)
Nov 23 08:11:17 mymachine kernel:
Nov 23 08:11:17 mymachine kernel: Call Trace: <IRQ> <ffffffff8015e118>{__report_bad_irq+48}
Nov 23 08:11:17 mymachine kernel: <ffffffff8015e321>{note_interrupt+444} <ffffffff8015dbf4>{__do_IRQ+191}
Nov 23 08:11:17 mymachine kernel: <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0} <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0}
Nov 23 08:11:17 mymachine kernel: <ffffffff8010d569>{do_IRQ+59} <ffffffff8010b35e>{ret_from_intr+0}
Nov 23 08:11:17 mymachine kernel: <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0} <ffffffff80139769>{__do_softirq+74}
Nov 23 08:11:17 mymachine kernel: <ffffffff8010c222>{call_softirq+30} <ffffffff8010d1a4>{do_softirq+44}
Nov 23 08:11:17 mymachine kernel: <ffffffff8010bb7c>{apic_timer_interrupt+132} <EOI> <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0}
Nov 23 08:11:17 mymachine kernel: <ffffffff881f20a8>{:e1000:e1000_watchdog_task+1991}
Nov 23 08:11:17 mymachine kernel: <ffffffff881f2014>{:e1000:e1000_watchdog_task+1843}
Nov 23 08:11:17 mymachine kernel: <ffffffff8012d6bc>{__wake_up+56} <ffffffff881f18e1>{:e1000:e1000_watchdog_task+0}
Nov 23 08:11:17 mymachine kernel: <ffffffff80144386>{run_workqueue+139} <ffffffff80144a94>{worker_thread+0}
Nov 23 08:11:17 mymachine kernel: <ffffffff80144b88>{worker_thread+244} <ffffffff8012c668>{default_wake_function+0}
Nov 23 08:11:17 mymachine kernel: <ffffffff80147dfc>{kthread+236} <ffffffff8010bed2>{child_rip+8}
Nov 23 08:11:17 mymachine kernel: <ffffffff80147d10>{kthread+0} <ffffffff8010beca>{child_rip+0}
Nov 23 08:11:17 mymachine kernel: handlers:
Nov 23 08:11:17 mymachine kernel: (e1000_intr+0x0/0x24c )
Nov 23 08:11:17 mymachine kernel: Disabling IRQ #185
However, it seems the cause of the problem is inside the e1000 driver, or VMware's e1000 emulation:
mymachine:~ # cat /proc/interrupts
            CPU0        CPU1
  0:  1217676150           0   IO-APIC-edge   timer
  1:         398          29   IO-APIC-edge   i8042
  8:           0           0   IO-APIC-edge   rtc
  9:           0           0   IO-APIC-level  acpi
 12:         169           0   IO-APIC-edge   i8042
 14:    38822670        7838   IO-APIC-edge   ide0
169:     3241937        1804   IO-APIC-level  ioc0
177:     8867372        2624   IO-APIC-level  eth0
185:     3002413          84   IO-APIC-level  eth1
NMI:           0           0
LOC:  1217676878  1217676733
ERR:           0
MIS:           0
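
To match an IRQ number from a "nobody cared" message to a device, the /proc/interrupts output above can also be parsed with a short script. The following Python is only a sketch: `parse_interrupts` and `irq_for_device` are hypothetical helper names, and the parsing assumes the column layout shown above (a CPU header row, then per-CPU counts, controller type, device names).

```python
def parse_interrupts(text):
    """Map each IRQ label to (per-CPU counts, controller type, device names)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        tail = fields[len(counts):]
        controller = tail[0] if tail else None
        devices = " ".join(tail[1:]) if len(tail) > 1 else None
        table[label.strip()] = (counts, controller, devices)
    return table


def irq_for_device(table, device):
    """Return the IRQ label whose device list contains `device`, or None."""
    for label, (_counts, _controller, devices) in table.items():
        # Shared IRQs list several devices separated by commas.
        if devices and device in [d.strip() for d in devices.split(",")]:
            return label
    return None


# Sample copied from the output in this post.
SAMPLE = """\
            CPU0        CPU1
  0:  1217676150           0   IO-APIC-edge   timer
169:     3241937        1804   IO-APIC-level  ioc0
177:     8867372        2624   IO-APIC-level  eth0
185:     3002413          84   IO-APIC-level  eth1
NMI:           0           0
ERR:           0
"""

if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print(irq_for_device(table, "eth1"), table["185"])
```

On a live guest, the text would come from reading /proc/interrupts instead of `SAMPLE`.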
Regards,
Thomas Uhde
I have just encountered that same issue today with a virtual machine running:
- CentOS 5.4 (64-bit)
- linux kernel 2.6.18-164.el5 (SMP)
- on ESX Server 4 (virtual machine version 4).
- vmware-tools installed
- 1 cpu + 1GB RAM
I was able to log in through the console and determine that only network connectivity was affected.
Ran: % /sbin/service network restart
... and network connectivity was re-established. Would really like to know why it got into the state that it did.
Got the same problem on a SLES 10 VMware guest...
Is that the 64-bit or the 32-bit version of SLES 10?
Should've said: 64-bit, so it's using the e1000 driver.
Jon
Got the same error on SLES 10 SP3 (64-bit).
I opened a call with Novell support and got a recommendation to add the boot option aerdriver.off=1