Re: [PATCH] x86/ept: simplify detection of special pages for EMT calculation
On Mon, Sep 26, 2022 at 10:38:40AM +0200, Jan Beulich wrote:
> On 23.09.2022 12:56, Roger Pau Monne wrote:
> > The current way to detect whether a page handed to
> > epte_get_entry_emt() is special and needs a forced write-back cache
> > attribute involves iterating over all the smaller 4K pages for
> > superpages.
> >
> > Such a loop consumes a high amount of CPU time for 1GiB pages (order
> > 18): on a Xeon® Silver 4216 (Cascade Lake) at 2GHz this takes an
> > average of 1.5ms.  Note that this figure just accounts for the
> > is_special_page() loop, and not the whole code of
> > epte_get_entry_emt().  Also the resolve_misconfig() operation that
> > calls into epte_get_entry_emt() is done while holding the p2m lock in
> > write (exclusive) mode, which blocks concurrent EPT_MISCONFIG faults
> > and prevents most guest hypercalls from progressing due to the need
> > to take the p2m lock in read mode to access any guest provided
> > hypercall buffers.
> >
> > Simplify the checking in epte_get_entry_emt() and remove the loop,
> > assuming that there won't be superpages that are only partially
> > special.
> >
> > So far we have no special superpages added to the guest p2m,
>
> We may not be adding them as superpages, but what a guest makes of
> the pages it is given access to for e.g. grant handling, or what Dom0
> makes of e.g. the (per-CPU) trace buffers is unknown.  And I guess
> Dom0 ending up with a non-WB mapping of the trace buffers might
> impact tracing quite a bit.  I don't think we can build on guests not
> making any such pages the subject of a large-range mapping attempt,
> which might end up suitable for a superpage mapping (recall that
> rather sooner than later we ought to finally re-combine suitable
> ranges of contiguous 4k mappings into 2M ones, just like we [now] do
> in IOMMU code).

Hm, don't pages used for grant handling (XENMAPSPACE_grant_table) get
mapped as 4K entries in the p2m page tables?  The code in
xenmem_add_to_physmap_one() seems to remove and re-add them with order
0.  Same with the trace buffers: they are added as order 0 to the p2m.

Note that when coalescing we would then need to be careful not to
coalesce special pages.  That might not be the best model, because I'm
not sure why we require XENMAPSPACE_grant_table entries to be forced
out of superpage mappings in the guest p2m.

> Since for data structures like the ones named above 2M mappings
> might be enough (i.e. there might be little "risk" of even needing to
> go to 1G ones), could we maybe take a "middle" approach and check all
> pages when order == 9, but use your approach for higher orders?  The
> to-be-added re-coalescing would then need to be taught to refuse
> re-coalescing of such ranges to larger than 2M mappings, while still
> at least allowing for 2M ones.  (Special casing at that boundary is
> going to be necessary also for shadow code, at the very least.)  But
> see also below as to caveats.

I guess a rangeset would be more future-proof than anything else.
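Something along the lines of the below (completely untested) sketch is
what I have in mind.  Note that the d->special_pages field is made up
here for illustration, as are the spots where it would get populated:

    /* At domain creation (hypothetical new per-domain rangeset): */
    d->special_pages = rangeset_new(d, "special pages", 0);

    /* Wherever a special page gets handed to the domain: */
    rangeset_add_range(d->special_pages, mfn_x(mfn), mfn_x(mfn));

    /* In epte_get_entry_emt(), instead of the is_special_page() loop: */
    if ( rangeset_contains_range(d->special_pages, mfn_x(mfn),
                                 mfn_x(mfn) + (1ul << order) - 1) )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }
    if ( rangeset_overlaps_range(d->special_pages, mfn_x(mfn),
                                 mfn_x(mfn) + (1ul << order) - 1) )
        /* Partially special range: force a split into 4K entries. */
        return -1;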
> > and in
> > any case the forcing of the write-back cache attribute is a courtesy
> > to the guest to avoid such ranges being accessed as uncached when not
> > really needed.  It's not acceptable for such assistance to tax the
> > system so badly.
>
> I agree we would better improve the situation, but I don't think we
> can do so by ...
>
> > @@ -518,26 +517,19 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
> >          return MTRR_TYPE_UNCACHABLE;
> >      }
> >
> > -    if ( type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> > -         !cache_flush_permitted(d) )
> > +    if ( (type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> > +          !cache_flush_permitted(d)) ||
> > +         /*
> > +          * Assume the whole page to be special if the first 4K chunk
> > +          * is: iterating over all possible 4K sub-pages for higher
> > +          * order pages is too expensive.
> > +          */
> > +         is_special_page(mfn_to_page(mfn)) )
>
> ... building in assumptions like this one.  The more that here you may
> also produce too weak a memory type (think of a later page in the range
> requiring a stronger-ordered memory type).
>
> While it may not help much, ...
>
> >      {
> >          *ipat = true;
> >          return MTRR_TYPE_WRBACK;
> >      }
> >
> > -    for ( special_pgs = i = 0; i < (1ul << order); i++ )
> > -        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
> > -            special_pgs++;
> > -
> > -    if ( special_pgs )
> > -    {
> > -        if ( special_pgs != (1ul << order) )
> > -            return -1;
> > -
> > -        *ipat = true;
> > -        return MTRR_TYPE_WRBACK;
> > -    }
>
> ... this logic could be improved to at least bail from the loop once
> it's clear that the "-1" return path will be taken.  Improvements
> beyond that would likely involve adding some data structure
> (rangeset?) to track special pages.

For the guest I was running, the loop didn't find any special pages in
order 18 mappings, which are the most troublesome to handle in the
loop.  I'm not sure bailing early would make that much of a difference
in practice, TBH.

I did also consider using a rangeset, but that would use more
per-domain memory and also require coordination to add special pages
to it.

Thanks, Roger.
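P.S.: for reference, the early-bail variant of the loop that I
understand you're suggesting would be something like the below
(untested).  It still scans the whole range in the all-special and
no-special cases, which is why I don't expect much of a win for the
order 18 scenario above:

    /* Drop-in for the removed hunk: compare each page against the first. */
    bool special = is_special_page(mfn_to_page(mfn));
    unsigned long i;

    for ( i = 1; i < (1ul << order); i++ )
        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) != special )
            /* Mixed range: the "-1" path is already known to be taken. */
            return -1;

    if ( special )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }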