
Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Wednesday, May 24, 2017 2:25 PM
> To: Hao, Xudong <xudong.hao@xxxxxxxxx>
> Cc: Julien Grall <julien.grall@xxxxxxx>; George Dunlap
> <George.Dunlap@xxxxxxxxxx>; Lars Kurth <lars.kurth@xxxxxxxxxx>; Zhang,
> Haozhong <haozhong.zhang@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx
> Subject: RE: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash
> 
> >>> On 24.05.17 at 07:32, <xudong.hao@xxxxxxxxx> wrote:
> >>  -----Original Message-----
> >> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> >> Hao, Xudong
> >> Sent: Tuesday, May 23, 2017 5:34 PM
> >> To: Jan Beulich <JBeulich@xxxxxxxx>
> >> Cc: Lars Kurth <lars.kurth@xxxxxxxxxx>; Julien Grall
> >> <julien.grall@xxxxxxx>; George Dunlap <George.Dunlap@xxxxxxxxxx>;
> >> Zhang, Haozhong <haozhong.zhang@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx
> >> Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0
> >> crash
> >>
> >> > -----Original Message-----
> >> > From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf
> >> > Of Jan Beulich
> >> > Sent: Tuesday, May 23, 2017 12:06 AM
> >> > To: Hao, Xudong <xudong.hao@xxxxxxxxx>
> >> > Cc: Lars Kurth <lars.kurth@xxxxxxxxxx>; Julien Grall
> >> > <julien.grall@xxxxxxx>; xen-devel@xxxxxxxxxxxxx; George Dunlap
> >> > <George.Dunlap@xxxxxxxxxx>; Zhang, Haozhong
> >> > <haozhong.zhang@xxxxxxxxx>
> >> > Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0
> >> > crash
> >> >
> >> > >>> On 22.05.17 at 10:39, <xudong.hao@xxxxxxxxx> wrote:
> >> > > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> >> >
> >> > Not this - Xen is unavoidably going to go down in such a case, yet
> >> > your log has no hint at all what kind of problem Dom0 experienced
> >> > (e.g. whether one of the injected #MC-s caused this).
> >> >
> >>
> >> Jan,
> >> The first mail attached the complete log from Xen booting, hope there is
> >> some hint from the full log.
> >>
> >> > > (XEN) ----[ Xen-4.9-rc  x86_64  debug=y   Tainted: MCE  ]----
> >> > > (XEN) CPU:    0
> >> > > (XEN) RIP:    e008:[<0000000065eb1e13>] 0000000065eb1e13
> >> > > ...
> >> > > (XEN) Pagetable walk from 00000000682ab009:
> >> > > (XEN)  L4[0x000] = 000000102c961063 ffffffffffffffff
> >> > > (XEN)  L3[0x001] = 000000005f812063 ffffffffffffffff
> >> > > (XEN)  L2[0x141] = 0000000000000000 ffffffffffffffff
> >> >
> >> > Here you're apparently hitting a firmware bug: While RIP points
> >> > into runtime services memory, CR2 doesn't:
> >> >
> >> > (XEN)  0000065eb8000-00000682acfff type=0 attr=000000000000000f
> >> >
> >> > You may try working around this via one of "reboot=acpi" or
> >> > "efi=no-rs" on the hypervisor command line.
> >> >
> >>
> >> Will try them.
> >>
> >
> > Neither "reboot=acpi" nor "efi=no-rs" can work around this issue.
> 
> Apparently I didn't express myself clearly enough: These workarounds were
> supposed to help with the Xen crash, not the Dom0 one. And as your logs prove
> they did fulfill that purpose. Yet still there are no Dom0 log messages at
> all near the crash, which leaves open whether there is a completely silent
> path in its MCE handling, or whether some messages simply don't make it
> through. Right now I can't see any Xen side of the issue here though, so
> from a 4.9 perspective I think we're fine.
> 

We figured out the problem: some corner-case scripts triggered the error
injection at the same page (pfn 0x180020) twice, i.e. "./xen-mceinj -t 0" was
run more than once, which resulted in the Dom0 crash.
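
For anyone scripting similar tests, here is a rough sketch of a guard that
refuses to inject at the same pfn twice in one session. It is only an
illustration: the class name InjectionGuard, the run callable and the
per-pfn bookkeeping are assumptions made for this example and are not part
of xen-mceinj or of our test scripts.

```python
from typing import Callable, Sequence, Set


class InjectionGuard:
    """Hypothetical helper: run an injection command at most once per pfn."""

    def __init__(self, run: Callable[[Sequence[str]], int]) -> None:
        self._run = run                # callable that actually executes the command
        self._seen: Set[int] = set()   # pfns already injected in this session

    def inject(self, pfn: int, argv: Sequence[str]) -> bool:
        """Run argv for pfn; return False (and run nothing) on a repeat pfn."""
        if pfn in self._seen:
            return False
        self._seen.add(pfn)
        self._run(argv)
        return True


# Possible use on a test host (not executed here):
#   import subprocess
#   guard = InjectionGuard(lambda argv: subprocess.call(list(argv)))
#   guard.inject(0x180020, ["./xen-mceinj", "-t", "0"])
```

Keeping the command runner as a parameter means the guard can be exercised
without a Xen host by passing in a stub.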

Let's close this bug thread. Sorry for the invalid report, and thanks to Jan
for the analysis.


Thanks,
-Xudong



 

