
[Xen-devel] Xen crash: map_domain_page() on an NMI path



Hello,

This is a stack trace caught by automated testing.  The server BMC has
indicated that it genuinely injected an IOCK NMI (which we believe is
caused by a system erratum we are aware of and are trying to work around).

However, the interesting point is the nested crash.  This is a failed
assertion while attempting to execute the kexec crash path.  Xen is
based on 4.3.1 and is a debug build, so the stack trace below is
generated with frame pointers and is accurate.

(XEN) Xen call trace:
(XEN)    [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e
(XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4
(XEN)    [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d
(XEN)    [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141
(XEN)    [<ffff82c4c0151891>] flush_context_qi+0x55/0x66
(XEN)    [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f
(XEN)    [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64
(XEN)    [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f
(XEN)    [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb
(XEN)    [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70
(XEN)    [<ffff82c4c01442f2>] panic+0x12c/0x15b
(XEN)    [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6
(XEN)    [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180
(XEN)    [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6
(XEN)    [<ffff82c4c0167558>] write_cr3+0x6a/0x83
(XEN)    [<ffff82c4c0176b08>] write_ptbase+0x10/0x12
(XEN)    [<ffff82c4c016374b>] __context_switch+0x34f/0x41e
(XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb
(XEN)    [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92
(XEN)    [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0
(XEN)    [<ffff82c4c012b4e0>] do_softirq+0x13/0x15
(XEN)    [<ffff82c4c01628bc>] idle_loop+0x66/0x6c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at domain.c:1321
(XEN) ****************************************
(XEN)

Here, we have managed to re-enter the __context_switch() path because
an NMI interrupted it.  The sync_local_execstate() call under
map_domain_page() comes from mapcache_current_vcpu().

I am struggling to work out how best to fix this.  Would it be best for
the crash path to unconditionally change to the idle_pagetables and use
mapcache_override_current(NULL)?

~Andrew

