[Xen-users] Fatal Trap 18 (convincing hardware engineer)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

I have 2 servers with identical hardware (lspci at the bottom of this email). An extra Intel PRO/1000 MT Dual Port Server Adapter[1] has been connected to the second slot on a PCI-X capable riser (the first slot is taken by the SAS RAID controller).

When this NIC *is* connected *and* the boxes boot a Xen kernel (Debian 4.0, 2.6.18-5-xen, using Xen hypervisor (PAE) 3.0.3-0-4), after about 2 days I get this error on the console:

(XEN) ----[ Xen-3.0.3-1 x86_32p debug=n Not tainted ]----
(XEN) ----[ Xen-3.0.3-1 x86_32p debug=n Not tainted ]----
(XEN) CPU: 1
(XEN) EIP: e008:[<ff1193be>]CPU: 3
(XEN) EIP: e008:[<ff1193be>] idle_loop+0x4e/0x60 idle_loop+0x4e/0x60
(XEN) EFLAGS: 00000246 CONTEXT: hypervisor
(XEN) eax: 00000000 ebx: ffbeffb4 ecx: 00000001 edx: 00000000
(XEN) esi: ffbeffb4 edi: ffbf6080 ebp: 000090dc esp: ffbeffa8
(XEN) cr0: 8005003b cr4: 000006f0 cr3: a3363000 cr2: b7f2c260
(XEN)
(XEN) EFLAGS: 00000246 CONTEXT: hypervisor
(XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) eax: 00000000 ebx: ffbe3fb4 ecx: 096a03ba edx: ff18c080
(XEN) Xen stack trace from esp=ffbeffa8:
(XEN) esi: ffbf0080 edi: 07a0403a ebp: 000090dc esp: ffbe3fa8
(XEN) 00000001cr0: 8005003b cr4: 000006f0 cr3: a1b80000 cr2: b7edd260
(XEN) 00000001 00001000 00000001 00000000 00000000 00000001 00000001ds: e010 8
(XEN)
(XEN) 00000000Xen stack trace from esp=ffbe3fa8:
(XEN) 00000000 00000001 00f90000 00000003 c01013a7 ffbf0080 00000061 00000001
(XEN) 0000007b 0000007b 00000000 00000000 00000001 ffbf6080 00000003
(XEN) Xen call trace:
(XEN) [<ff1193be>]
(XEN) idle_loop+0x4e/0x60
(XEN) 00000000
(XEN) ************************************
(XEN) 00000000CPU1 FATAL TRAP 18 (machine check), ERROR_CODE 0000.
(XEN) System shutting down -- need manual reset.
(XEN) ************************************

The machine obviously hangs. If I remove the PCI NIC the machine stays up. If I boot into a vanilla kernel with the NIC in the box it stays up.
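Because the console output from the two CPUs is interleaved, the trap line itself is easy to miss. Below is a minimal sketch of pulling the fatal-trap lines out of captured console text. It assumes the console output has been saved as plain text (for example from a serial console); the function name find_fatal_traps and the regular expression are illustrative only and are not part of Xen.

```python
import re

# Matches lines such as:
#   (XEN) 00000000CPU1 FATAL TRAP 18 (machine check), ERROR_CODE 0000.
# Output from another CPU may be glued in front of "CPU<n>", so the
# pattern is searched for anywhere in the line rather than anchored.
TRAP_RE = re.compile(
    r"CPU(?P<cpu>\d+) FATAL TRAP (?P<trap>\d+) \((?P<name>[^)]*)\), "
    r"ERROR_CODE (?P<code>[0-9a-fA-F]+)"
)


def find_fatal_traps(log_text):
    """Return one dict per fatal-trap line found in the console text."""
    traps = []
    for line in log_text.splitlines():
        match = TRAP_RE.search(line)
        if match:
            traps.append({
                "cpu": int(match.group("cpu")),
                "trap": int(match.group("trap")),
                "name": match.group("name"),
                "error_code": int(match.group("code"), 16),
            })
    return traps


if __name__ == "__main__":
    # Sample line copied from the console output above.
    sample = (
        "(XEN) 00000000CPU1 FATAL TRAP 18 (machine check), "
        "ERROR_CODE 0000.\n"
    )
    for trap in find_fatal_traps(sample):
        print(trap)
```

Running this over the logs from each crash would show whether the trap is always reported on the same CPU, which is one small data point to take to the hardware engineer.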
I have NICs like these, bought in a batch, running in other machines that are also running Xen. The machines aren't really used a great deal (at the moment, although they need to be soon) and as far as I can tell there's no other issue with the system that is failing (i.e. the obvious stuff like disk space running out or resource-exhausting cron jobs). There are no logs, other than the one to the console, suggesting a failure elsewhere.

Our hardware engineer is convinced it's either a Xen or driver issue. I've seen the thread at http://lists.xensource.com/archives/html/xen-users/2006-08/msg00792.html and have directed the engineer to it.

My questions to the list are:

1. Can this be caused by anything else (other than hardware)?
2. Is there anything I can do to debug this further to confirm what part of the system is failing (e.g. either CPU/RAM or PCI/BUS timeout)?

Any help on this would be greatly appreciated.

Many thanks,

Matt

- --
Matthew Baker, UNIX Systems Administrator
----------------------------------------------------
Institute for Learning and Research Technology (ILRT)
A: University of Bristol, 8-10 Berkeley Square, Bristol.
   BS8 1HH
W: http://www.ilrt.bristol.ac.uk
E: matt.baker@xxxxxxxxxx
T: +44 (0)117 928 7121

- -- lspci
00:00.0 Host bridge: Intel Corporation E7320 Memory Controller Hub (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev)
00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (re)
00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller)
00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller)
00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Cont)
00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a)
00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller (rev 0)
00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev )
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev )
02:02.0 PCI bridge: Intel Corporation 80331 [Lindsay] I/O processor (PCI-X Brid)
03:0e.0 RAID bus controller: Adaptec AAC-RAID (rev 0a)
06:01.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Cont)
06:02.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Cont)
07:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHWVH6Lvm7pB/aicMRAioDAJ0Vw2dVALMkYylyR6Pjlw71y8ZZpQCfV+KU
Ia7+fPLZQsMXtjmFk5KSNyA=
=6fWn
-----END PGP SIGNATURE-----

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users