Re: [PATCH v4 06/21] IOMMU/x86: perform PV Dom0 mappings in batches
On Wed, May 04, 2022 at 11:46:37AM +0200, Jan Beulich wrote:
> On 03.05.2022 16:49, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:34:59AM +0200, Jan Beulich wrote:
> >> For large page mappings to be easily usable (i.e. in particular without
> >> un-shattering of smaller page mappings) and for mapping operations to
> >> then also be more efficient, pass batches of Dom0 memory to iommu_map().
> >> In dom0_construct_pv() and its helpers (covering strict mode) this
> >> additionally requires establishing the type of those pages (albeit with
> >> zero type references).
> >
> > I think it's possible I've already asked this. Would it make sense to
> > add the IOMMU mappings in alloc_domheap_pages(), maybe by passing a
> > specific flag?
> 
> I don't think you did ask, but now that you do: This would look like a
> layering violation to me. I don't think allocation should ever have
> mapping (into the IOMMU or elsewhere) as a "side effect", no matter
> that ...

Hm, I'm certainly not familiar enough with PV itself to be able to make
a proper argument here. I fully agree with you for translated guests
using a p2m. For PV we currently establish/teardown IOMMU mappings in
_get_page_type(), which already looks like a layering violation to me,
hence also doing so in alloc_domheap_pages() wouldn't seem that bad if
it allows simplifying the resulting code overall.

> > It would seem to me that doing it that way would also allow the
> > mappings to get established in blocks for domUs.
> 
> ... then this would perhaps be possible.
> >> The installing of zero-ref writable types has in fact shown (observed
> >> while putting together the change) that despite the intention by the
> >> XSA-288 changes (affecting DomU-s only) for Dom0 a number of
> >> sufficiently ordinary pages (at the very least initrd and P2M ones as
> >> well as pages that are part of the initial allocation but not part of
> >> the initial mapping) still have been starting out as PGT_none, meaning
> >> that they would have gained IOMMU mappings only the first time these
> >> pages would get mapped writably. Consequently an open question is
> >> whether iommu_memory_setup() should set the pages to PGT_writable_page
> >> independent of need_iommu_pt_sync().
> >
> > I think I'm confused, doesn't the setting of PGT_writable_page happen
> > as a result of need_iommu_pt_sync() and having those pages added to
> > the IOMMU page tables? (so they can be properly tracked and IOMMU
> > mappings are removed if the page is also removed)
> 
> In principle yes - in guest_physmap_add_page(). But this function isn't
> called for the pages I did enumerate in the remark. XSA-288 really only
> cared about getting this right for DomU-s.

Would it make sense to change guest_physmap_add_page() to be able to
pass the page_order parameter down to iommu_map(), and then use it for
dom0 build instead of introducing iommu_memory_setup()?

I think guest_physmap_add_page() will need to be adjusted at some point
for domUs, and hence it could be unified with dom0 usage also?

> > If the pages are not added here (because dom0 is not running in strict
> > mode) then setting PGT_writable_page is not required?
> 
> Correct - in that case we skip fiddling with IOMMU mappings on
> transitions to/from PGT_writable_page, and hence putting this type in
> place would be benign (but improve consistency).
> >> Note also that strictly speaking the iommu_iotlb_flush_all() here (as
> >> well as the pre-existing one in arch_iommu_hwdom_init()) shouldn't be
> >> needed: Actual hooking up (AMD) or enabling of translation (VT-d)
> >> occurs only afterwards anyway, so nothing can have made it into TLBs
> >> just yet.
> >
> > Hm, indeed. I think the one in arch_iommu_hwdom_init can surely go
> > away, as we must strictly do the hwdom init before enabling the iommu
> > itself.
> 
> Why would that be? That's imo as much of an implementation detail as
> ...

Well, you want to have the reserved/inclusive options applied (and
mappings created) before enabling the IOMMU, because at that point
devices have already been assigned. So it depends more on a combination
of devices assigned & IOMMU enabled rather than just the IOMMU being
enabled.

> > The one in dom0 build I'm less convinced, just to be on the safe side
> > if we ever change the order of IOMMU init and memory setup.
> 
> ... this. Just like we populate tables with the IOMMU already enabled
> for DomU-s, I think the same would be valid to do for Dom0.
> 
> > I would expect flushing an empty TLB to not be very expensive?
> 
> I wouldn't "expect" this - it might be this way, but it surely depends
> on whether an implementation can easily tell whether the TLB is empty,
> and whether its emptiness actually makes a difference for the latency
> of a flush operation.
> >> --- a/xen/drivers/passthrough/x86/iommu.c
> >> +++ b/xen/drivers/passthrough/x86/iommu.c
> >> @@ -347,8 +347,8 @@ static unsigned int __hwdom_init hwdom_i
> >>
> >>  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >>  {
> >> -    unsigned long i, top, max_pfn;
> >> -    unsigned int flush_flags = 0;
> >> +    unsigned long i, top, max_pfn, start, count;
> >> +    unsigned int flush_flags = 0, start_perms = 0;
> >>
> >>      BUG_ON(!is_hardware_domain(d));
> >>
> >> @@ -379,9 +379,9 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>       * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid
> >>       * setting up potentially conflicting mappings here.
> >>       */
> >> -    i = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >> +    start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >>
> >> -    for ( ; i < top; i++ )
> >> +    for ( i = start, count = 0; i < top; )
> >>      {
> >>          unsigned long pfn = pdx_to_pfn(i);
> >>          unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
> >>
> >> @@ -390,20 +390,41 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>          if ( !perms )
> >>              rc = 0;
> >>          else if ( paging_mode_translate(d) )
> >> +        {
> >>              rc = p2m_add_identity_entry(d, pfn,
> >>                                          perms & IOMMUF_writable
> >>                                          ? p2m_access_rw : p2m_access_r,
> >>                                          0);
> >> +            if ( rc )
> >> +                printk(XENLOG_WARNING
> >> +                       "%pd: identity mapping of %lx failed: %d\n",
> >> +                       d, pfn, rc);
> >> +        }
> >> +        else if ( pfn != start + count || perms != start_perms )
> >> +        {
> >> +        commit:
> >> +            rc = iommu_map(d, _dfn(start), _mfn(start), count,
> >> +                           start_perms, &flush_flags);
> >> +            if ( rc )
> >> +                printk(XENLOG_WARNING
> >> +                       "%pd: IOMMU identity mapping of [%lx,%lx) failed: %d\n",
> >> +                       d, pfn, pfn + count, rc);
> >> +            SWAP(start, pfn);
> >> +            start_perms = perms;
> >> +            count = 1;
> >> +        }
> >>          else
> >> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
> >> -                           perms, &flush_flags);
> >> +        {
> >> +            ++count;
> >> +            rc = 0;
> >
> > Seeing as we want to process this in blocks now, I wonder whether it
> > would make sense to take a different approach, and use a rangeset to
> > track which regions need to be mapped. What gets added would be based
> > on the host e820 plus the options
> > iommu_hwdom_{strict,inclusive,reserved}. We would then punch holes
> > based on the logic in hwdom_iommu_map() and finally we could iterate
> > over the regions afterwards using rangeset_consume_ranges().
> >
> > Not that you strictly need to do it here, just think the end result
> > would be clearer.
> 
> The end result might indeed be, but it would be more of a change right
> here. Hence I'd prefer to leave that out of the series for now.

OK. I think it might be nice to add a comment in that regard, mostly
because I tend to forget myself.

Thanks, Roger.