Re: [PATCH v4 06/21] IOMMU/x86: perform PV Dom0 mappings in batches
On 03.05.2022 16:49, Roger Pau Monné wrote:
> On Mon, Apr 25, 2022 at 10:34:59AM +0200, Jan Beulich wrote:
>> For large page mappings to be easily usable (i.e. in particular
>> without un-shattering of smaller page mappings) and for mapping
>> operations to then also be more efficient, pass batches of Dom0
>> memory to iommu_map(). In dom0_construct_pv() and its helpers
>> (covering strict mode) this additionally requires establishing the
>> type of those pages (albeit with zero type references).
>
> I think it's possible I've already asked this. Would it make sense to
> add the IOMMU mappings in alloc_domheap_pages(), maybe by passing a
> specific flag?

I don't think you did ask, but now that you do: This would look like a
layering violation to me. I don't think allocation should ever have
mapping (into the IOMMU or elsewhere) as a "side effect", no matter
that ...

> It would seem to me that doing it that way would also allow the
> mappings to get established in blocks for domUs.

... then this would perhaps be possible.

>> The installing of zero-ref writable types has in fact shown (observed
>> while putting together the change) that despite the intention by the
>> XSA-288 changes (affecting DomU-s only) for Dom0 a number of
>> sufficiently ordinary pages (at the very least initrd and P2M ones as
>> well as pages that are part of the initial allocation but not part of
>> the initial mapping) still have been starting out as PGT_none,
>> meaning that they would have gained IOMMU mappings only the first
>> time these pages would get mapped writably. Consequently an open
>> question is whether iommu_memory_setup() should set the pages to
>> PGT_writable_page independent of need_iommu_pt_sync().
>
> I think I'm confused: doesn't the setting of PGT_writable_page happen
> as a result of need_iommu_pt_sync() and having those pages added to
> the IOMMU page tables? (So they can be properly tracked, and IOMMU
> mappings are removed if the page is also removed.)

In principle yes - in guest_physmap_add_page(). But this function isn't
called for the pages I enumerated in the remark. XSA-288 really only
cared about getting this right for DomU-s.

> If the pages are not added here (because dom0 is not running in
> strict mode) then setting PGT_writable_page is not required?

Correct - in that case we skip fiddling with IOMMU mappings on
transitions to/from PGT_writable_page, and hence putting this type in
place would be benign (but improve consistency).

>> Note also that strictly speaking the iommu_iotlb_flush_all() here (as
>> well as the pre-existing one in arch_iommu_hwdom_init()) shouldn't be
>> needed: Actual hooking up (AMD) or enabling of translation (VT-d)
>> occurs only afterwards anyway, so nothing can have made it into TLBs
>> just yet.
>
> Hm, indeed. I think the one in arch_iommu_hwdom_init() can surely go
> away, as we must strictly do the hwdom init before enabling the IOMMU
> itself.

Why would that be? That's imo as much of an implementation detail as ...

> The one in dom0 build I'm less convinced about, just to be on the
> safe side if we ever change the order of IOMMU init and memory setup.

... this. Just like we populate tables with the IOMMU already enabled
for DomU-s, I think the same would be valid to do for Dom0.

> I would expect flushing an empty TLB to not be very expensive?

I wouldn't "expect" this - it might be this way, but it surely depends
on whether an implementation can easily tell whether the TLB is empty,
and whether its emptiness actually makes a difference for the latency
of a flush operation.
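As an aside, the range coalescing in the hunk below boils down to the
pattern sketched here. This is purely an illustration - idx_to_pfn(),
get_perms(), and do_map() are made-up stand-ins for pdx_to_pfn(),
hwdom_iommu_map(), and iommu_map(), and error handling is left out:

/*
 * Illustration only: fold a linear index scan into maximal runs of
 * contiguous PFNs carrying identical permissions, calling the made-up
 * do_map() once per run rather than once per page.
 */
unsigned long idx_to_pfn(unsigned long idx);
unsigned int get_perms(unsigned long pfn);
void do_map(unsigned long start_pfn, unsigned long count,
            unsigned int perms);

static void map_in_batches(unsigned long top)
{
    unsigned long i, start = 0, count = 0;
    unsigned int run_perms = 0;

    for ( i = 0; i < top; ++i )
    {
        unsigned long pfn = idx_to_pfn(i);
        unsigned int perms = get_perms(pfn);

        /*
         * Commit the pending run on a PFN discontinuity or a change of
         * permissions; runs of unmapped pages (perms == 0) never start.
         */
        if ( count && (pfn != start + count || perms != run_perms) )
        {
            do_map(start, count, run_perms);
            count = 0;
        }
        if ( !count )
        {
            start = pfn;
            run_perms = perms;
        }
        if ( perms )
            ++count;
    }
    if ( count )
        do_map(start, count, run_perms); /* commit the final run */
}

The hunk below additionally folds error reporting into the commit path
and re-uses the loop-local pfn via SWAP(), but the principle is the
same.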
I wouldn't "expect" this - it might be this way, but it surely depends on whether an implementation can easily tell whether the TLB is empty, and whether its emptiness actually makes a difference for the latency of a flush operation. >> --- a/xen/drivers/passthrough/x86/iommu.c >> +++ b/xen/drivers/passthrough/x86/iommu.c >> @@ -347,8 +347,8 @@ static unsigned int __hwdom_init hwdom_i >> >> void __hwdom_init arch_iommu_hwdom_init(struct domain *d) >> { >> - unsigned long i, top, max_pfn; >> - unsigned int flush_flags = 0; >> + unsigned long i, top, max_pfn, start, count; >> + unsigned int flush_flags = 0, start_perms = 0; >> >> BUG_ON(!is_hardware_domain(d)); >> >> @@ -379,9 +379,9 @@ void __hwdom_init arch_iommu_hwdom_init( >> * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid >> * setting up potentially conflicting mappings here. >> */ >> - i = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0; >> + start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0; >> >> - for ( ; i < top; i++ ) >> + for ( i = start, count = 0; i < top; ) >> { >> unsigned long pfn = pdx_to_pfn(i); >> unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn); >> @@ -390,20 +390,41 @@ void __hwdom_init arch_iommu_hwdom_init( >> if ( !perms ) >> rc = 0; >> else if ( paging_mode_translate(d) ) >> + { >> rc = p2m_add_identity_entry(d, pfn, >> perms & IOMMUF_writable ? >> p2m_access_rw >> : >> p2m_access_r, >> 0); >> + if ( rc ) >> + printk(XENLOG_WARNING >> + "%pd: identity mapping of %lx failed: %d\n", >> + d, pfn, rc); >> + } >> + else if ( pfn != start + count || perms != start_perms ) >> + { >> + commit: >> + rc = iommu_map(d, _dfn(start), _mfn(start), count, start_perms, >> + &flush_flags); >> + if ( rc ) >> + printk(XENLOG_WARNING >> + "%pd: IOMMU identity mapping of [%lx,%lx) failed: >> %d\n", >> + d, pfn, pfn + count, rc); >> + SWAP(start, pfn); >> + start_perms = perms; >> + count = 1; >> + } >> else >> - rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K, >> - perms, &flush_flags); >> + { >> + ++count; >> + rc = 0; > > Seeing as we want to process this in blocks now, I wonder whether it > would make sense to take a different approach, and use a rangeset to > track which regions need to be mapped. What gets added would be based > on the host e820 plus the options > iommu_hwdom_{strict,inclusive,reserved}. We would then punch holes > based on the logic in hwdom_iommu_map() and finally we could iterate > over the regions afterwards using rangeset_consume_ranges(). > > Not that you strictly need to do it here, just think the end result > would be clearer. The end result might indeed be, but it would be more of a change right here. Hence I'd prefer to leave that out of the series for now. Jan