Re: [PATCH] x86/ept: simplify detection of special pages for EMT calculation
On Mon, Sep 26, 2022 at 10:38:40AM +0200, Jan Beulich wrote:
> On 23.09.2022 12:56, Roger Pau Monne wrote:
> > The current way to detect whether a page handed to
> > epte_get_entry_emt() is special and needs a forced write-back cache
> > attribute involves iterating over all the smaller 4K pages for
> > superpages.
> >
> > Such a loop consumes a high amount of CPU time for 1GiB pages (order
> > 18): on a Xeon® Silver 4216 (Cascade Lake) at 2GHz this takes an
> > average of 1.5ms.  Note that this figure just accounts for the
> > is_special_page() loop, and not the whole code of
> > epte_get_entry_emt().  Also the resolve_misconfig() operation that
> > calls into epte_get_entry_emt() is done while holding the p2m lock in
> > write (exclusive) mode, which blocks concurrent EPT_MISCONFIG faults
> > and prevents most guest hypercalls from progressing due to the need
> > to take the p2m lock in read mode to access any guest provided
> > hypercall buffers.
> >
> > Simplify the checking in epte_get_entry_emt() and remove the loop,
> > assuming that there won't be superpages that are only partially
> > special.
> >
> > So far we have no special superpages added to the guest p2m,
>
> We may not be adding them as superpages, but what a guest makes of
> the pages it is given access to for e.g. grant handling, or what Dom0
> makes of e.g. the (per-CPU) trace buffers is unknown.  And I guess
> Dom0 ending up with a non-WB mapping of the trace buffers might
> impact tracing quite a bit.  I don't think we can build on guests not
> making any such pages the subject of a large-range mapping attempt,
> which might end up suitable for a superpage mapping (recall that
> rather sooner than later we ought to finally re-combine suitable
> ranges of contiguous 4k mappings into 2M ones, just like we [now] do
> in IOMMU code).

Hm, don't pages used for grant handling (XENMAPSPACE_grant_table) get
mapped as 4K entries in the p2m page tables?  The code in
xenmem_add_to_physmap_one() seems to remove and re-add them with order
0.  Same with the trace buffers: they are added as order 0 to the p2m.

Note that when coalescing we would then need to be careful not to
coalesce special pages.  That might not be the best model, because I'm
not sure why we require XENMAPSPACE_grant_table entries to be forced
out of superpage mappings in the guest p2m.

> Since for data structures like the ones named above 2M mappings
> might be enough (i.e. there might be little "risk" of even needing to
> go to 1G ones), could we maybe take a "middle" approach and check all
> pages when order == 9, but use your approach for higher orders?  The
> to-be-added re-coalescing would then need to be taught to refuse
> re-coalescing of such ranges to larger than 2M mappings, while still
> at least allowing for 2M ones.  (Special casing at that boundary is
> going to be necessary also for shadow code, at the very least.)  But
> see also below as to caveats.

I guess a rangeset would be more future-proof than anything else.
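Something along the lines of the below (completely untested) sketch is
what I have in mind.  Note that the d->special_pages field is made up
here for illustration, as are the spots where it would get populated:

    /* At domain creation (hypothetical new per-domain rangeset): */
    d->special_pages = rangeset_new(d, "special pages", 0);

    /* Wherever a special page gets handed to the domain: */
    rangeset_add_range(d->special_pages, mfn_x(mfn), mfn_x(mfn));

    /* In epte_get_entry_emt(), instead of the is_special_page() loop: */
    if ( rangeset_contains_range(d->special_pages, mfn_x(mfn),
                                 mfn_x(mfn) + (1ul << order) - 1) )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }
    if ( rangeset_overlaps_range(d->special_pages, mfn_x(mfn),
                                 mfn_x(mfn) + (1ul << order) - 1) )
        /* Partially special range: force a split into 4K entries. */
        return -1;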
> > and in
> > any case the forcing of the write-back cache attribute is a courtesy
> > to the guest to avoid such ranges being accessed as uncached when not
> > really needed.  It's not acceptable for such assistance to tax the
> > system so badly.
>
> I agree we would better improve the situation, but I don't think we
> can do so by ...
>
> > @@ -518,26 +517,19 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
> >          return MTRR_TYPE_UNCACHABLE;
> >      }
> >
> > -    if ( type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> > -         !cache_flush_permitted(d) )
> > +    if ( (type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> > +          !cache_flush_permitted(d)) ||
> > +         /*
> > +          * Assume the whole page to be special if the first 4K chunk
> > +          * is: iterating over all possible 4K sub-pages for higher
> > +          * order pages is too expensive.
> > +          */
> > +         is_special_page(mfn_to_page(mfn)) )
>
> ... building in assumptions like this one.  The more that here you may
> also produce too weak a memory type (think of a later page in the range
> requiring a stronger-ordered memory type).
>
> While it may not help much, ...
>
> >      {
> >          *ipat = true;
> >          return MTRR_TYPE_WRBACK;
> >      }
> >
> > -    for ( special_pgs = i = 0; i < (1ul << order); i++ )
> > -        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
> > -            special_pgs++;
> > -
> > -    if ( special_pgs )
> > -    {
> > -        if ( special_pgs != (1ul << order) )
> > -            return -1;
> > -
> > -        *ipat = true;
> > -        return MTRR_TYPE_WRBACK;
> > -    }
>
> ... this logic could be improved to at least bail from the loop once
> it's clear that the "-1" return path will be taken.  Improvements
> beyond that would likely involve adding some data structure
> (rangeset?) to track special pages.

For the guest I was running, the loop didn't find any special pages in
order 18 mappings, which are the most troublesome to handle in the
loop.  I'm not sure bailing early would make that much of a difference
in practice, TBH.

I did also consider using a rangeset, but that would use more
per-domain memory and also require coordination to add special pages
to it.

Thanks, Roger.
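P.S.: for reference, the early-bail variant of the loop that I
understand you're suggesting would be something like the below
(untested).  It still scans the whole range in the all-special and
no-special cases, which is why I don't expect much of a win for the
order 18 scenario above:

    /* Drop-in for the removed hunk: compare each page against the first. */
    bool special = is_special_page(mfn_to_page(mfn));
    unsigned long i;

    for ( i = 1; i < (1ul << order); i++ )
        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) != special )
            /* Mixed range: the "-1" path is already known to be taken. */
            return -1;

    if ( special )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }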