Re: [PATCH] x86/ept: simplify detection of special pages for EMT calculation
On 26.09.2022 16:27, Roger Pau Monné wrote:
> On Mon, Sep 26, 2022 at 10:38:40AM +0200, Jan Beulich wrote:
>> On 23.09.2022 12:56, Roger Pau Monne wrote:
>>> The current way to detect whether a page handed to
>>> epte_get_entry_emt() is special and needs a forced write-back cache
>>> attribute involves iterating over all the smaller 4K pages for
>>> superpages.
>>>
>>> Such a loop consumes a high amount of CPU time for 1GiB pages
>>> (order 18): on a Xeon® Silver 4216 (Cascade Lake) at 2GHz this
>>> takes an average of 1.5ms. Note that this figure just accounts for
>>> the is_special_page() loop, and not the whole code of
>>> epte_get_entry_emt(). Also the resolve_misconfig() operation that
>>> calls into epte_get_entry_emt() is done while holding the p2m lock
>>> in write (exclusive) mode, which blocks concurrent EPT_MISCONFIG
>>> faults and prevents most guest hypercalls from progressing due to
>>> the need to take the p2m lock in read mode to access any
>>> guest-provided hypercall buffers.
>>>
>>> Simplify the checking in epte_get_entry_emt() and remove the loop,
>>> assuming that there won't be superpages that are only partially
>>> special.
>>>
>>> So far we have no special superpages added to the guest p2m,
>>
>> We may not be adding them as superpages, but what a guest makes of
>> the pages it is given access to for e.g. grant handling, or what
>> Dom0 makes of e.g. the (per-CPU) trace buffers, is unknown. And I
>> guess Dom0 ending up with a non-WB mapping of the trace buffers
>> might impact tracing quite a bit. I don't think we can build on
>> guests not making any such pages the subject of a large-range
>> mapping attempt, which might end up suitable for a superpage
>> mapping (recall that sooner rather than later we ought to finally
>> re-combine suitable ranges of contiguous 4k mappings into 2M ones,
>> just like we [now] do in IOMMU code).
>
> Hm, don't pages used for grant handling (XENMAPSPACE_grant_table) get
> mapped as 4K entries in the p2m page tables? The code in
> xenmem_add_to_physmap_one() seems to remove and re-add them with
> order 0. Same with the trace buffers, they are added as order 0 to
> the p2m.

Indeed. I was halfway through writing the earlier response when
recalling that aspect; I may not have succeeded in adjusting the text
to properly convey that the concern is applicable only to future code,
not to what we have right now.

> Note that when coalescing we would then need to be careful not to
> coalesce special pages.

Well, no, ...

> That might not be the best model, because I'm not sure why we require
> XENMAPSPACE_grant_table entries to not be mapped as part of a
> superpage in the guest p2m.

... as you say here, there may actually be benefits from allowing such
(re-)coalescing.

>> Since for data structures like the ones named above 2M mappings
>> might be enough (i.e. there might be little "risk" of even needing
>> to go to 1G ones), could we maybe take a "middle" approach and check
>> all pages when order == 9, but use your approach for higher orders?
>> The to-be-added re-coalescing would then need to be taught to refuse
>> re-coalescing of such ranges to larger than 2M mappings, while still
>> at least allowing for 2M ones. (Special casing at that boundary is
>> going to be necessary also for shadow code, at the very least.) But
>> see also below as to caveats.
>
> I guess a rangeset would be more future proof than anything else.
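For concreteness, a minimal sketch of that "middle" approach, reusing
the variable names and surrounding context from the patch quoted below
together with Xen's PAGE_ORDER_2M (order 9) constant; this illustrates
the idea only and is not the posted patch (the rangeset alternative is
sketched at the end of the thread):

    if ( order <= PAGE_ORDER_2M )
    {
        unsigned long special_pgs, i;

        /* Up to 2M mappings: keep the exact per-page scan. */
        for ( special_pgs = i = 0; i < (1ul << order); i++ )
            if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
                special_pgs++;

        if ( special_pgs )
        {
            if ( special_pgs != (1ul << order) )
                return -1; /* partially special: force a split */

            *ipat = true;
            return MTRR_TYPE_WRBACK;
        }
    }
    else if ( is_special_page(mfn_to_page(mfn)) )
    {
        /* 1G mappings: assume the first 4K chunk is representative. */
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }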
>
>>> and in any case the forcing of the write-back cache attribute is a
>>> courtesy to the guest to avoid such ranges being accessed as
>>> uncached when not really needed. It's not acceptable for such
>>> assistance to tax the system so badly.
>>
>> I agree we'd better improve the situation, but I don't think we can
>> do so by ...
>>
>>> @@ -518,26 +517,19 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
>>>          return MTRR_TYPE_UNCACHABLE;
>>>      }
>>>  
>>> -    if ( type != p2m_mmio_direct && !is_iommu_enabled(d) &&
>>> -         !cache_flush_permitted(d) )
>>> +    if ( (type != p2m_mmio_direct && !is_iommu_enabled(d) &&
>>> +          !cache_flush_permitted(d)) ||
>>> +         /*
>>> +          * Assume the whole page to be special if the first 4K
>>> +          * chunk is: iterating over all possible 4K sub-pages for
>>> +          * higher order pages is too expensive.
>>> +          */
>>> +         is_special_page(mfn_to_page(mfn)) )
>>
>> ... building in assumptions like this one. All the more so since
>> here you may also produce too weak a memory type (think of a later
>> page in the range requiring a stronger-ordered memory type).
>>
>> While it may not help much, ...
>>
>>>      {
>>>          *ipat = true;
>>>          return MTRR_TYPE_WRBACK;
>>>      }
>>>  
>>> -    for ( special_pgs = i = 0; i < (1ul << order); i++ )
>>> -        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
>>> -            special_pgs++;
>>> -
>>> -    if ( special_pgs )
>>> -    {
>>> -        if ( special_pgs != (1ul << order) )
>>> -            return -1;
>>> -
>>> -        *ipat = true;
>>> -        return MTRR_TYPE_WRBACK;
>>> -    }
>>
>> ... this logic could be improved to at least bail from the loop once
>> it's clear that the "-1" return path will be taken. Improvements
>> beyond that would likely involve adding some data structure
>> (rangeset?) to track special pages.
>
> For the guest I was running, the loop didn't find any special pages
> in order 18 mappings, which are the most troublesome to handle in the
> loop. I'm not sure bailing early would make that much of a difference
> in practice TBH.

As said, it may not help much.

> I did also consider using a rangeset, but that would use more
> per-domain memory and also require coordination to add special pages
> to it.

"Special" is a global property, so I don't think this would need to be
a per-domain data structure.

Jan
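For reference, a rough sketch of the rangeset idea discussed above.
rangeset_new(), rangeset_add_range(), rangeset_contains_range() and
rangeset_overlaps_range() are Xen's existing rangeset API; the
special_mfns variable and the register_special_mfn() hook are
hypothetical names introduced purely for illustration:

    /*
     * Global registry of special MFNs; "special" is a global
     * property, so no per-domain structure (or memory) is needed.
     */
    static struct rangeset *special_mfns;

    /* Boot-time setup, e.g. alongside mmio_ro_ranges. */
    void __init special_mfns_init(void)
    {
        special_mfns = rangeset_new(NULL, "special MFNs", 0);
    }

    /* Hypothetical hook for the code paths that make a page special. */
    void register_special_mfn(mfn_t mfn)
    {
        rangeset_add_range(special_mfns, mfn_x(mfn), mfn_x(mfn));
    }

    /* In epte_get_entry_emt(), replacing the per-page loop: */
    unsigned long start = mfn_x(mfn), end = start + (1ul << order) - 1;

    if ( rangeset_contains_range(special_mfns, start, end) )
    {
        /* The whole range is special: force write-back, as before. */
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }
    if ( rangeset_overlaps_range(special_mfns, start, end) )
        return -1; /* partially special: force a split */

This keeps the epte_get_entry_emt() check cheap regardless of order,
at the cost of coordinating registration wherever pages become
special.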