[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen crash: map_domain_page() on an NMI path
On 19/12/13 11:00, Tim Deegan wrote: > At 19:37 +0000 on 18 Dec (1387391848), Andrew Cooper wrote: >> Hello, >> >> This is a stack trace caught by automated testing. The server BMC has >> indicated that it has genuinely injected an IOCK NMI (which is believed >> to be caused by a system erratum we are aware of and trying to work around) >> >> However, the interesting point is the nested crash. This is a failed >> assertion while attempting to execute the kexec crash path. Xen is >> 4.3.1 based, and built with debug, so the stack trace below is generated >> with frame pointers, and is correct. >> >> (XEN) Xen call trace: >> (XEN) [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e >> (XEN) [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83 >> (XEN) [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb >> (XEN) [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4 >> (XEN) [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d >> (XEN) [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141 >> (XEN) [<ffff82c4c0151891>] flush_context_qi+0x55/0x66 >> (XEN) [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f >> (XEN) [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64 >> (XEN) [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f >> (XEN) [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb >> (XEN) [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70 >> (XEN) [<ffff82c4c01442f2>] panic+0x12c/0x15b >> (XEN) [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6 >> (XEN) [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180 >> (XEN) [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6 >> (XEN) [<ffff82c4c0167558>] write_cr3+0x6a/0x83 >> (XEN) [<ffff82c4c0176b08>] write_ptbase+0x10/0x12 >> (XEN) [<ffff82c4c016374b>] __context_switch+0x34f/0x41e >> (XEN) [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83 >> (XEN) [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb >> (XEN) [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb >> (XEN) [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92 >> (XEN) [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0 >> (XEN) [<ffff82c4c012b4e0>] do_softirq+0x13/0x15 >> (XEN) [<ffff82c4c01628bc>] idle_loop+0x66/0x6c >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 0: >> (XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at >> domain.c:1321 >> (XEN) **************************************** >> (XEN) >> >> Here, we have managed to re-enter the __context_switch() path because of >> an NMI interrupting it. The sync_local_execstate() in map_domain_page() >> is by way of mapcache_current_vcpu(). >> >> I am struggling to work out how best to fix this. Would it be best for >> the crash path to unconditionally change to the idle_pagetables and use >> mapcache_override_current(NULL)? > I think it would be best for the iommu_crash_shutdown() path to be > made crash-safe -- after all, that code takes spinlocks too. > Presumably we can do something a bit ruder in crash code, like just > turn the IOMMUs off entirely? > > Or are there other map_domain_page() ops on the crash path? Does > kexec need it? > > Tim. I don't believe we can safely just disable the IOMMU without tearing it down in a sensible fashion. Having said that, we certainly should try and make the crash path as "crash safe" as possible. I don't think it is reasonable to prevent the use of map_domain_page() on codepaths in the crash path (as being too invasive), but the mapcache_override_current(NULL) is an override which prevents any playing with the pagetables, with the caveat that mfn_to_virt(mfn) needs to work for all mfn's in the current set of pagetables. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |