[Xen-devel] Xen crash: map_domain_page() on an NMI path
Hello,

This is a stack trace caught by automated testing. The server BMC has
indicated that it has genuinely injected an IOCK NMI (which is believed
to be caused by a system erratum we are aware of and trying to work around).

However, the interesting point is the nested crash. This is a failed
assertion while attempting to execute the kexec crash path.

Xen is 4.3.1 based, and built with debug, so the stack trace below is
generated with frame pointers, and is correct.

(XEN) Xen call trace:
(XEN)    [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e
(XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4
(XEN)    [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d
(XEN)    [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141
(XEN)    [<ffff82c4c0151891>] flush_context_qi+0x55/0x66
(XEN)    [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f
(XEN)    [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64
(XEN)    [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f
(XEN)    [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb
(XEN)    [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70
(XEN)    [<ffff82c4c01442f2>] panic+0x12c/0x15b
(XEN)    [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6
(XEN)    [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180
(XEN)    [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6
(XEN)    [<ffff82c4c0167558>] write_cr3+0x6a/0x83
(XEN)    [<ffff82c4c0176b08>] write_ptbase+0x10/0x12
(XEN)    [<ffff82c4c016374b>] __context_switch+0x34f/0x41e
(XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb
(XEN)    [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92
(XEN)    [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0
(XEN)    [<ffff82c4c012b4e0>] do_softirq+0x13/0x15
(XEN)    [<ffff82c4c01628bc>] idle_loop+0x66/0x6c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at domain.c:1321
(XEN) ****************************************
(XEN)

Here, we have managed to re-enter the __context_switch() path because of
an NMI interrupting it. The sync_local_execstate() in map_domain_page()
is by way of mapcache_current_vcpu().

I am struggling to work out how best to fix this. Would it be best for
the crash path to unconditionally change to the idle_pagetables and use
mapcache_override_current(NULL)?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
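
The re-entry described above can be sketched with a small stand-alone model. This is not Xen code: every name prefixed with toy_ is hypothetical, and the model only mirrors the shape of the trace (a non-reentrant switch routine, an NMI arriving in the middle of it, and a crash path whose page-mapping helper calls back into the switch). The override flag models the idea raised in the question, under the assumption that such an override would let the mapping path skip the lazy state sync; whether that holds in Xen itself is the open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names here are hypothetical stand-ins, not Xen APIs. */
struct toy_cpu {
    bool switch_in_progress;  /* set while the modelled context switch runs */
    bool reentered;           /* records that the switch was re-entered */
    bool mapcache_overridden; /* assumed effect of the proposed override */
};

/* Stand-in for __context_switch(): not safe to re-enter. */
static void toy_context_switch(struct toy_cpu *cpu,
                               void (*nmi)(struct toy_cpu *))
{
    if (cpu->switch_in_progress) {
        /* Corresponds to the point where the assertion fired in the trace. */
        cpu->reentered = true;
        return;
    }
    cpu->switch_in_progress = true;
    if (nmi != NULL)
        nmi(cpu);  /* model an NMI landing mid-switch, as at write_cr3() */
    cpu->switch_in_progress = false;
}

/* Stand-in for map_domain_page(): syncs lazy state unless overridden. */
static void toy_map_domain_page(struct toy_cpu *cpu)
{
    if (!cpu->mapcache_overridden)
        toy_context_switch(cpu, NULL);
}

/* Stand-in for the crash path as it runs in the trace. */
static void toy_crash_path(struct toy_cpu *cpu)
{
    toy_map_domain_page(cpu);
}

/* Crash path variant that sets the (assumed) override before mapping. */
static void toy_crash_path_with_override(struct toy_cpu *cpu)
{
    cpu->mapcache_overridden = true;
    toy_map_domain_page(cpu);
}

/* Returns true if the modelled context switch was re-entered. */
bool toy_simulate(bool use_override)
{
    struct toy_cpu cpu = { false, false, false };
    toy_context_switch(&cpu, use_override ? toy_crash_path_with_override
                                          : toy_crash_path);
    return cpu.reentered;
}
```

In this model, toy_simulate(false) reports a re-entry and toy_simulate(true) does not. That only shows why keeping the crash path away from the lazy sync would avoid the nested failure in principle; it says nothing about what else the real crash path would need.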