Re: [Xen-devel] Xen crash: map_domain_page() on an NMI path



At 19:37 +0000 on 18 Dec (1387391848), Andrew Cooper wrote:
> Hello,
> 
> This is a stack trace caught by automated testing.  The server BMC has
> indicated that it has genuinely injected an IOCK NMI (which is believed
> to be caused by a system erratum we are aware of and trying to work around).
> 
> However, the interesting point is the nested crash.  This is a failed
> assertion while attempting to execute the kexec crash path.  Xen is
> 4.3.1 based, and built with debug, so the stack trace below is generated
> with frame pointers, and is correct.
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e
> (XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
> (XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
> (XEN)    [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4
> (XEN)    [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d
> (XEN)    [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141
> (XEN)    [<ffff82c4c0151891>] flush_context_qi+0x55/0x66
> (XEN)    [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f
> (XEN)    [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64
> (XEN)    [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f
> (XEN)    [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb
> (XEN)    [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70
> (XEN)    [<ffff82c4c01442f2>] panic+0x12c/0x15b
> (XEN)    [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6
> (XEN)    [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180
> (XEN)    [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6
> (XEN)    [<ffff82c4c0167558>] write_cr3+0x6a/0x83
> (XEN)    [<ffff82c4c0176b08>] write_ptbase+0x10/0x12
> (XEN)    [<ffff82c4c016374b>] __context_switch+0x34f/0x41e
> (XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
> (XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
> (XEN)    [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb
> (XEN)    [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92
> (XEN)    [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0
> (XEN)    [<ffff82c4c012b4e0>] do_softirq+0x13/0x15
> (XEN)    [<ffff82c4c01628bc>] idle_loop+0x66/0x6c
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at domain.c:1321
> (XEN) ****************************************
> (XEN)
> 
> Here, we have managed to re-enter the __context_switch() path because of
> an NMI interrupting it.  The sync_local_execstate() in map_domain_page()
> is by way of mapcache_current_vcpu().
> 
> I am struggling to work out how best to fix this.  Would it be best for
> the crash path to unconditionally change to the idle_pagetables and use
> mapcache_override_current(NULL)?

I think it would be best for the iommu_crash_shutdown() path to be
made crash-safe -- after all, that code takes spinlocks too.
Presumably we can do something a bit ruder in crash code, like just
turn the IOMMUs off entirely?
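
To make the "ruder" option concrete, here is a minimal sketch of the idea. It is a toy model and not Xen's VT-d code: every name in it (fake_iommu, FAKE_CTRL_TRANSLATION_ENABLE, fake_iommu_flush, fake_iommu_crash_disable) is hypothetical, and it assumes the control register was mapped once at boot so the crash path never has to call map_domain_page() or take a lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit meaning "DMA translation is enabled" in this toy model. */
#define FAKE_CTRL_TRANSLATION_ENABLE (1u << 31)

struct fake_iommu {
    volatile uint32_t ctrl; /* stands in for a register mapped once at boot */
    bool lock_held;         /* stands in for a spinlock the interrupted
                             * context may still be holding */
};

/*
 * Normal-path flush: needs the lock. If the NMI arrived while the lock was
 * held, a real spinlock would spin forever; the model reports failure.
 */
static bool fake_iommu_flush(struct fake_iommu *iommu)
{
    if (iommu->lock_held)
        return false;
    iommu->lock_held = true;
    /* ... invalidation work that may need locks and temporary mappings ... */
    iommu->lock_held = false;
    return true;
}

/*
 * Crash-path shutdown: no locks, no temporary mappings, no flush. It only
 * clears the enable bit through the already-mapped register, so it does not
 * depend on whatever state the interrupted context left behind.
 */
static void fake_iommu_crash_disable(struct fake_iommu *iommu)
{
    iommu->ctrl &= ~FAKE_CTRL_TRANSLATION_ENABLE;
}
```

In this model the crash path calls only fake_iommu_crash_disable(), which still works when the lock is stuck, whereas fake_iommu_flush() cannot make progress. Whether real hardware can safely be switched off without a flush first is a separate question that this sketch does not answer.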

Or are there other map_domain_page() ops on the crash path?  Does
kexec need it?

Tim.
