Re: [RFC PATCH] iommu: make no-quarantine mean no-quarantine
On 4/28/21, 3:20 AM, Paul Durrant wrote:
>> Following the extension to the command line option I'm putting in place
>> in "IOMMU: make DMA containment of quarantined devices optional" (which
>> I still need to get around to address review feedback for and resubmit),
>> I'd be inclined to suggest "iommu=quarantine=always" or
>> "iommu=quarantine=on-assign". Unless of course we'd prefer to have the
>> caller of the assignment operation have full control over the behavior
>> here anyway (in which case a command line option control simply is not
>> necessary).
>
> I'm still not entirely sure why not quarantining on is a problem, other
> than it triggering an as-yet undiagnosed issue in QEMU, but I agree that
> the expectation of 'no-quarantine' meaning just that (i.e. the old
> dom0->domU and domU->dom0 transitions are re-instated) is reasonable. Do
> we really want yet more command line options?

Regarding the problem in QEMU, I traced the crash trigger down to a
write to the IQ tail register during the mapping operation into dom_io
(backtrace below). Along the way I noticed that, since a non-present
entry was being flushed, flush_context_qi only performs this
invalidation on an IOMMU with caching mode enabled (i.e. a software
IOMMU). Therefore this issue can probably only be hit when nesting.
Disabling caching mode on the QEMU vIOMMU was enough to prevent the
crash and give me a working system.

(gdb) si
0xffff82d04025b68b      72      in qinval.c
   0xffff82d04025b687 <qinval_update_qtail+43>: ... shl    $0x4,%r12
=> 0xffff82d04025b68b <qinval_update_qtail+47>: ... mov    %r12,0x88(%rax)
(gdb) bt
#0  0xffff82d04025b68b in qinval_update_qtail (...) at qinval.c:72
#1  0xffff82d04025baa7 in queue_invalidate_context_sync (...) at qinval.c:101
#2  flush_context_qi (...) at qinval.c:341
#3  0xffff82d040259125 in iommu_flush_context_device (...) at iommu.c:400
#4  domain_context_mapping_one (...) at iommu.c:1436
#5  0xffff82d040259351 in domain_context_mapping (...) at iommu.c:1510
#6  0xffff82d040259d20 in reassign_device_ownership (...) at iommu.c:2412
#7  0xffff82d040259f19 in intel_iommu_assign_device (...) at iommu.c:2476
#8  0xffff82d040267154 in assign_device (...) at pci.c:1545
#9  iommu_do_pci_domctl (...) at pci.c:1732
#10 0xffff82d040264de3 in iommu_do_domctl (...) at iommu.c:539
#11 0xffff82d040322ca5 in arch_do_domctl (...) at domctl.c:1496
#12 0xffff82d040205a19 in do_domctl (...) at domctl.c:956
#13 0xffff82d040319476 in pv_hypercall (...) at hypercall.c:155
#14 0xffff82d040390432 in lstar_enter () at entry.S:271
#15 0x0000000000000000 in ?? ()

As a result of the above, I no longer need to patch Xen to work around
the problem, though I do want to test against newer versions of QEMU
(I'm currently on 4.2.1) to see whether the issue still exists. So
unless there's interest among Xen developers in this patch, I will
probably shelve it for now, especially since it looks like Jan has some
ongoing work in this area that I had not previously discovered.

If there is interest, I just need a resolution on whether
iommu=quarantine should be left as a boolean or expanded to support
always, never, deassign-only, and (why not) assign-only.

Thanks,
Scott
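
For anyone following the trace, the caching-mode condition described above can be sketched roughly as follows. This is a simplified, standalone model and not Xen's actual code: every model_* name and the struct layout are hypothetical, the register page is a plain byte array rather than MMIO, and the real flush_context_qi takes more parameters. The shift of 4 and the offset 0x88 are taken from the disassembly above. The position of the caching mode bit (bit 7 of the capability register) is my reading of the VT-d specification and should be checked against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for this model only. */
#define MODEL_CAP_CACHING_MODE (1ULL << 7) /* assumed CM bit position */
#define MODEL_IQT_OFFSET       0x88u       /* from "mov %r12,0x88(%rax)" */
#define MODEL_QINVAL_SHIFT     4           /* from "shl $0x4,%r12" */

struct model_iommu {
    uint64_t cap;             /* capability register value */
    uint8_t regs[0x100];      /* stand-in for the MMIO register page */
    unsigned int tail_writes; /* number of writes to the IQ tail register */
};

/* Model of the tail update: shift the queue index and store it at 0x88. */
static void model_update_qtail(struct model_iommu *iommu, uint64_t index)
{
    uint64_t val = index << MODEL_QINVAL_SHIFT;

    memcpy(&iommu->regs[MODEL_IQT_OFFSET], &val, sizeof(val));
    iommu->tail_writes++;
}

/*
 * Model of the context flush. When flushing a non-present entry, the
 * invalidation is skipped unless the IOMMU advertises caching mode, so
 * the tail register is only written in the caching-mode (emulated) case.
 * Returns true if an invalidation was queued.
 */
static bool model_flush_context(struct model_iommu *iommu,
                                uint64_t next_index, bool non_present)
{
    assert(iommu != NULL);

    if (non_present && !(iommu->cap & MODEL_CAP_CACHING_MODE))
        return false; /* nothing cached for non-present entries */

    model_update_qtail(iommu, next_index);
    return true;
}
```

With cap left at 0 (no caching mode), a non-present flush returns without touching the tail register, which would be consistent with the crash disappearing once caching mode is disabled on the QEMU vIOMMU.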