
Re: [Xen-devel] IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash



Hello,
 
The BladeCenter web frontend shows the following:
 
27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic interrupt
28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
29 I Blade_09 09/08/13 13:09:14 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
30 I Blade_09 09/08/13 13:09:03 0x806f0013 Chassis, (NMI State) diagnostic interrupt
31 E Blade_09 09/08/13 13:08:58 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
32 I Blade_09 09/08/13 12:46:26 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
33 I Blade_09 09/08/13 12:46:15 0x806f0013 Chassis, (NMI State) diagnostic interrupt
34 E Blade_09 09/08/13 12:46:11 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
35 I Blade_09 09/08/13 12:34:13 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
36 I Blade_09 09/08/13 12:34:03 0x806f0013 Chassis, (NMI State) diagnostic interrupt
37 E Blade_09 09/08/13 12:33:58 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
38 I Blade_09 09/08/13 12:27:25 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
39 I Blade_09 09/08/13 12:27:14 0x806f0013 Chassis, (NMI State) diagnostic interrupt
40 E Blade_09 09/08/13 12:27:10 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
41 I Blade_09 09/08/13 12:20:45 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
42 I Blade_09 09/08/13 12:20:34 0x806f0013 Chassis, (NMI State) diagnostic interrupt
43 E Blade_09 09/08/13 12:20:30 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
44 I Blade_09 09/08/13 12:18:20 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
45 I Blade_09 09/08/13 12:18:10 0x806f0013 Chassis, (NMI State) diagnostic interrupt
46 E Blade_09 09/08/13 12:18:05 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
47 I Blade_09 09/08/13 12:15:47 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
48 I Blade_09 09/08/13 12:15:37 0x806f0013 Chassis, (NMI State) diagnostic interrupt
49 E Blade_09 09/08/13 12:15:32 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
Thanks

 

2013/9/23 Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
On Thu, Sep 12, 2013 at 02:47:39PM +0200, Trenta sis wrote:
> Hello,
>
> We need this server, so we have downgraded it to Debian Squeeze.
> I hope to have another HS20 in a few days for additional tests. I'll
> try to collect all the information you asked for and send it.
> Sorry, one question: what is PCI SERR, and where do I find it?

If you log in to the BladeCenter web frontend you should see logs for
each blade. Some entries are 'User XYZ logged in', but others are
more serious, such as an NMI or PCI SERR. If you could copy-n-paste
them, it could help in figuring out which PCI device is responsible for
causing the NMI.
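
For anyone sifting through a long copy-n-pasted event log, here is a minimal Python sketch that pulls out the error entries and measures how long after each one the NMI is raised. It assumes lines of the form seen in this thread (index, severity, blade, date, time, event ID, message) and assumes the date is MM/DD/YY. The function names and the regular expression are hypothetical helpers, not part of any BladeCenter or Xen tool.

```python
import re
from datetime import datetime

# Assumed field layout: index, severity, source, date, time, event id, message.
LINE_RE = re.compile(
    r"^(?P<idx>\d+)\s+(?P<sev>[IEW])\s+(?P<src>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event>0x[0-9a-fA-F]+)\s+(?P<msg>.*)$"
)

def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Assumption: dates are MM/DD/YY.
    rec["when"] = datetime.strptime(
        rec["date"] + " " + rec["time"], "%m/%d/%y %H:%M:%S"
    )
    return rec

def fatal_errors(lines):
    """Entries logged with severity 'E'."""
    return [r for r in map(parse_line, lines) if r and r["sev"] == "E"]

def seconds_to_next_nmi(lines):
    """For each 'E' entry, seconds until the next non-recovery NMI entry."""
    recs = sorted((r for r in map(parse_line, lines) if r),
                  key=lambda r: r["when"])
    gaps, pending = [], None
    for r in recs:
        if r["sev"] == "E":
            pending = r
        elif (pending is not None and "NMI State" in r["msg"]
              and not r["msg"].startswith("Recovery")):
            gaps.append(int((r["when"] - pending["when"]).total_seconds()))
            pending = None
    return gaps

SAMPLE = """\
27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic interrupt
28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
29 I Blade_09 09/08/13 13:09:14 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
30 I Blade_09 09/08/13 13:09:03 0x806f0013 Chassis, (NMI State) diagnostic interrupt
31 E Blade_09 09/08/13 13:08:58 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
""".splitlines()

if __name__ == "__main__":
    for rec in fatal_errors(SAMPLE):
        print(rec["when"], rec["event"], rec["msg"])
    print(seconds_to_next_nmi(SAMPLE))
```

This only groups what the log already contains. It does not decode the HI_FERR/NERR value or identify a PCI device.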

>
> Thanks for all
>
> 2013/9/9 Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>
> > On Sun, Sep 08, 2013 at 04:41:02PM +0200, Trenta sis wrote:
> > >  Hello,
> > >
> > > I have the same error: the server reboots automatically during every
> > > boot with the Xen kernel. An HS20 with Debian Wheezy and Xen hangs, and
> > > AMM management shows the same errors described in previous mails. Debian
> > > Wheezy with a non-Xen kernel boots correctly, so the problem seems to be
> > > with the Xen kernel.
> > > The same HS20 server with Debian Lenny + Xen 3.2 or Debian Squeeze + Xen
> > > 4.0 works perfectly.
> > >
> > > I upgraded to Debian testing and unstable, with the same results on
> > > Xen 4.1 and 4.2.
> > >
> > > If you need more information, just ask.
> > > How can this bug be solved?
> >
> > Did the workaround help?
> >
> > And in regards to finding out exactly what causes it - well, there are
> > logs in the BMC that can point to the PCI device. Did you check those?
> > Do they say if there is any device that has PCI SERR on it?
> >
> > Thanks.
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

