Re: [Xen-devel] Xen crash: map_domain_page() on an NMI path
At 19:37 +0000 on 18 Dec (1387391848), Andrew Cooper wrote:
> Hello,
>
> This is a stack trace caught by automated testing.  The server BMC has
> indicated that it has genuinely injected an IOCK NMI (which is believed
> to be caused by a system erratum we are aware of and trying to work around)
>
> However, the interesting point is the nested crash.  This is a failed
> assertion while attempting to execute the kexec crash path.  Xen is
> 4.3.1 based, and built with debug, so the stack trace below is generated
> with frame pointers, and is correct.
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e
> (XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
> (XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
> (XEN)    [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4
> (XEN)    [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d
> (XEN)    [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141
> (XEN)    [<ffff82c4c0151891>] flush_context_qi+0x55/0x66
> (XEN)    [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f
> (XEN)    [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64
> (XEN)    [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f
> (XEN)    [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb
> (XEN)    [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70
> (XEN)    [<ffff82c4c01442f2>] panic+0x12c/0x15b
> (XEN)    [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6
> (XEN)    [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180
> (XEN)    [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6
> (XEN)    [<ffff82c4c0167558>] write_cr3+0x6a/0x83
> (XEN)    [<ffff82c4c0176b08>] write_ptbase+0x10/0x12
> (XEN)    [<ffff82c4c016374b>] __context_switch+0x34f/0x41e
> (XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
> (XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
> (XEN)    [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb
> (XEN)    [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92
> (XEN)    [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0
> (XEN)    [<ffff82c4c012b4e0>] do_softirq+0x13/0x15
> (XEN)    [<ffff82c4c01628bc>] idle_loop+0x66/0x6c
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at
> domain.c:1321
> (XEN) ****************************************
> (XEN)
>
> Here, we have managed to re-enter the __context_switch() path because of
> an NMI interrupting it.  The sync_local_execstate() in map_domain_page()
> is by way of mapcache_current_vcpu().
>
> I am struggling to work out how best to fix this.  Would it be best for
> the crash path to unconditionally change to the idle_pagetables and use
> mapcache_override_current(NULL)?

I think it would be best for the iommu_crash_shutdown() path to be
made crash-safe -- after all, that code takes spinlocks too.
Presumably we can do something a bit ruder in crash code, like just
turn the IOMMUs off entirely?

Or are there other map_domain_page() ops on the crash path?  Does
kexec need it?

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel