
Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash



>>> On 24.05.17 at 07:32, <xudong.hao@xxxxxxxxx> wrote:
>>  -----Original Message-----
>> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Hao,
>> Xudong
>> Sent: Tuesday, May 23, 2017 5:34 PM
>> To: Jan Beulich <JBeulich@xxxxxxxx>
>> Cc: Lars Kurth <lars.kurth@xxxxxxxxxx>; Julien Grall <julien.grall@xxxxxxx>;
>> George Dunlap <George.Dunlap@xxxxxxxxxx>; Zhang, Haozhong
>> <haozhong.zhang@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx 
>> Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash
>> 
>> > -----Original Message-----
>> > From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
>> > Jan Beulich
>> > Sent: Tuesday, May 23, 2017 12:06 AM
>> > To: Hao, Xudong <xudong.hao@xxxxxxxxx>
>> > Cc: Lars Kurth <lars.kurth@xxxxxxxxxx>; Julien Grall
>> > <julien.grall@xxxxxxx>; xen-devel@xxxxxxxxxxxxx; George Dunlap
>> > <George.Dunlap@xxxxxxxxxx>; Zhang, Haozhong <haozhong.zhang@xxxxxxxxx>
>> > Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0
>> > crash
>> >
>> > >>> On 22.05.17 at 10:39, <xudong.hao@xxxxxxxxx> wrote:
>> > > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
>> >
>> > Not this - Xen is unavoidably going to go down in such a case, yet
>> > your log has no hint at all what kind of problem Dom0 experienced
>> > (e.g. whether one of the injected #MC-s caused this).
>> >
>> 
>> Jan,
>> The first mail attached the complete log from Xen booting, hope there is some
>> hint from the full log.
>> 
>> > > (XEN) ----[ Xen-4.9-rc  x86_64  debug=y   Tainted: MCE  ]----
>> > > (XEN) CPU:    0
>> > > (XEN) RIP:    e008:[<0000000065eb1e13>] 0000000065eb1e13
>> > > ...
>> > > (XEN) Pagetable walk from 00000000682ab009:
>> > > (XEN)  L4[0x000] = 000000102c961063 ffffffffffffffff
>> > > (XEN)  L3[0x001] = 000000005f812063 ffffffffffffffff
>> > > (XEN)  L2[0x141] = 0000000000000000 ffffffffffffffff
>> >
>> > Here you're apparently hitting a firmware bug: While RIP points into
>> > runtime services memory, CR2 doesn't:
>> >
>> > (XEN)  0000065eb8000-00000682acfff type=0 attr=000000000000000f
>> >
>> > You may try working around this via one of "reboot=acpi" or
>> > "efi=no-rs" on the hypervisor command line.
>> >
>> 
>> Will try them.
>> 
> 
> Neither "reboot=acpi" nor "efi=no-rs" can work around this issue.

Apparently I didn't express myself clearly enough: These
workarounds were supposed to help with the Xen crash, not
the Dom0 one. And as your logs prove they did fulfill that
purpose. Yet still there are no Dom0 log messages at all near
the crash, which leaves open whether there is a completely
silent path in its MCE handling, or whether some messages
simply don't make it through. Right now I can't see any Xen
side of the issue here though, so from a 4.9 perspective I
think we're fine.

Jan
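
The address comparison behind the firmware-bug diagnosis quoted above can be checked by hand. The following is a minimal sketch, not part of the original exchange: the constants are copied from the quoted Xen log, while the helper name `in_range` and the variable names are hypothetical. It only tests whether the faulting address and RIP fall inside the single memory map range cited in the thread (the one printed with type=0), and makes no claim about how Xen itself performs this check.

```python
# Values copied from the quoted Xen log; all names here are illustrative.
FAULT_ADDR = 0x682AB009    # "Pagetable walk from 00000000682ab009"
RIP = 0x65EB1E13           # "RIP:    e008:[<0000000065eb1e13>]"
RANGE_START = 0x65EB8000   # "0000065eb8000-00000682acfff type=0"
RANGE_END = 0x682ACFFF     # end address as printed, treated as inclusive


def in_range(addr: int, start: int, end: int) -> bool:
    """Return True if addr lies in the inclusive range [start, end]."""
    return start <= addr <= end


# The faulting address lies inside the cited type=0 range; RIP does not.
fault_in_type0 = in_range(FAULT_ADDR, RANGE_START, RANGE_END)
rip_in_type0 = in_range(RIP, RANGE_START, RANGE_END)

print(f"faulting address in cited type=0 range: {fault_in_type0}")
print(f"RIP in cited type=0 range: {rip_in_type0}")
```

Under these assumptions the faulting address sits in the cited range and RIP sits outside it, which matches the observation that the two addresses belong to different kinds of memory.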


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

