Re: [PATCH] Optimise restore memory allocation
On 27/08/2025 6:23 pm, Andrew Cooper wrote:
> Subject wants to be at least tools/libxc, and probably "Use superpages
> where possible on migrate/resume"
>
>
> On 27/08/2025 1:33 pm, Frediano Ziglio wrote:
>> Try to allocate larger order pages.
>> With some test memory program stressing the TLB (many small random
>> memory accesses) you can get a 15% performance improvement.
>> On the first memory iteration the sender is currently sending
>> memory in 4MB aligned chunks, which allows the receiver to
>> allocate most pages as 2MB superpages instead of single 4kB pages.
> It's critical to say somewhere that this is applicable to HVM guests.
>
> You've alluded to it, but it's important to state clearly that, for HVM
> guests, PAGE_DATA records contain metadata about GFNs in aligned 4M
> blocks.  This is why we don't even need to buffer a second record.
>
> It's also worth stating that 1G superpages are left for later.
>
>
> CC-ing Oleksii as release manager.  This is a fix for a (mis)feature
> which has been known for a decade (since Xen 4.6), and for which two
> series have been posted but have not managed to get in.  Unlike those
> series, this is a very surgical fix that gets the majority of the perf
> win back, without the complexity of trying to guess at 1G pages.
>
> Therefore I'd like to request that it be considered for 4.21 at this
> juncture.
>
>> Signed-off-by: Frediano Ziglio <frediano.ziglio@xxxxxxxxx>
>> ---
>>  tools/libs/guest/xg_sr_restore.c | 39 ++++++++++++++++++++++++++++----
>>  1 file changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
>> index 06231ca826..8dcb1b19c5 100644
>> --- a/tools/libs/guest/xg_sr_restore.c
>> +++ b/tools/libs/guest/xg_sr_restore.c
>> @@ -129,6 +129,8 @@ static int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
>>      return 0;
>>  }
>>
>> +#define IS_POWER_OF_2(n) (((n) & ((n) - 1)) == 0)
>> +
>>  /*
>>   * Given a set of pfns, obtain memory from Xen to fill the physmap for the
>>   * unpopulated subset. If types is NULL, no page type checking is performed
>> @@ -141,6 +143,7 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
>>      xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
>>          *pfns = malloc(count * sizeof(*pfns));
>>      unsigned int i, nr_pfns = 0;
>> +    bool contiguous = true;
>>      int rc = -1;
>>
>>      if ( !mfns || !pfns )
>> @@ -159,18 +162,46 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
>>              if ( rc )
>>                  goto err;
>>              pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
>> +            if ( pfns[nr_pfns] != pfns[0] + nr_pfns )
>> +                contiguous = false;
>>              ++nr_pfns;
>>          }
>>      }
>>
>>      if ( nr_pfns )
>>      {
>> -        rc = xc_domain_populate_physmap_exact(
>> -            xch, ctx->domid, nr_pfns, 0, 0, mfns);
>> +        /* try optimizing using larger order */
>> +        rc = -1;
>> +        /*
>> +         * The "nr_pfns <= (1 << 18)" check is mainly for paranoia, it should
>> +         * never happen, the sender would have to send a really large packet.
>> +         */
>> +        if ( contiguous && nr_pfns <= (1 << 18) &&
> This is an arbitrary limit, and to contradict the prior feedback given
> in private, the domain's MAX_ORDER isn't relevant here.  It's the
> toolstack's choice how to lay out the guest in memory.
>
>> +             IS_POWER_OF_2(nr_pfns) && (pfns[0] & (nr_pfns - 1)) == 0 )
>> +        {
>> +            const unsigned int extent_order = __builtin_ffs(nr_pfns) - 1;
> This (non-)loop isn't great.  You'll e.g. use 4k pages for the second 2M
> page of an HVM guest simply because the VGA hole exists in the first.
>
> I think you probably want something like:
>
>     int populate_physmap_4k(ctx, nr, mfns);
>     int populate_physmap_2M(ctx, nr, mfns);
>
> as simple wrappers around the raw hypercalls, including transforming back
> the mfns[] array, and:
>
>     int populate_physmap(...);
>
> with logic of the form:
>
>     while ( nr )
>     {
>         if ( nr < 512 ) /* Can't be a superpage */
>         {
>             populate_physmap_4k(ctx, nr, mfns);
>             mfns += nr;
>             nr = 0;
>             continue;
>         }
>
>         if ( !ALIGNED_2M(mfn) ) /* Populate up until a point that could be a superpage */
>         {
>             i = 0;
>             while ( !ALIGNED_2M(mfn + i) )
>                 i++;
>             populate_physmap_4k(ctx, i, mfns);
>             mfns += i;
>             nr -= i;
>         }
>
>         if ( nr >= 512 )
>         {
>             for ( i = 1; i < 512; ++i )
>                 if ( mfns[i] != mfns[0] + i )
>                     break;
>             if ( i == 512 )
>                 populate_physmap_2M(ctx, i, mfns);
>             else
>                 populate_physmap_4k(...);
>
>             mfns += i;
>             nr -= i;
>         }
>     }
>
> Obviously with error handling, and whatever boundary conditions I've got
> wrong.
>
> 2M is the only size that matters (ignoring 1G which we've excluded for
> now), and this form will reconstruct more superpages for the guest than
> trying to do 4M allocations will.
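Fleshed out, that loop might look as follows.  The wrapper signatures, the
xch/domid plumbing and the SP_2M_* constants here are invented for
illustration (xg_private.h has equivalent SUPERPAGE_2MB_* constants);
xc_domain_populate_physmap_exact() is the only real call, and the boundary
conditions still want checking:

    #include <stdint.h>
    #include <xenctrl.h>      /* xc_domain_populate_physmap_exact(), xen_pfn_t */

    /* Illustrative constants; xg_private.h has SUPERPAGE_2MB_* equivalents. */
    #define SP_2M_NR_PFNS   512                  /* 4k frames per 2M superpage */
    #define ORDER_2M        9
    #define ALIGNED_2M(pfn) (((pfn) & (SP_2M_NR_PFNS - 1)) == 0)

    /* Order-0 allocation of nr frames; mfns[] holds gfns in, frames out. */
    static int populate_physmap_4k(xc_interface *xch, uint32_t domid,
                                   unsigned int nr, xen_pfn_t *mfns)
    {
        return xc_domain_populate_physmap_exact(xch, domid, nr, 0, 0, mfns);
    }

    /* One order-9 extent, expanded back into 512 per-frame entries. */
    static int populate_physmap_2M(xc_interface *xch, uint32_t domid,
                                   xen_pfn_t *mfns)
    {
        xen_pfn_t base = mfns[0];
        int rc = xc_domain_populate_physmap_exact(xch, domid, 1, ORDER_2M, 0,
                                                  &base);
        unsigned int i;

        if ( rc )
            return rc;

        for ( i = 0; i < SP_2M_NR_PFNS; ++i )
            mfns[i] = base + i;

        return 0;
    }

    /*
     * Populate nr frames, using a 2M superpage for every aligned, fully
     * contiguous run of 512 gfns, and 4k pages everywhere else.
     */
    static int populate_physmap(xc_interface *xch, uint32_t domid,
                                unsigned int nr, xen_pfn_t *mfns)
    {
        while ( nr )
        {
            unsigned int i;
            int rc;

            if ( nr < SP_2M_NR_PFNS )         /* Tail can't be a superpage */
                return populate_physmap_4k(xch, domid, nr, mfns);

            if ( !ALIGNED_2M(mfns[0]) )
            {
                /* 4k pages up to the next 2M boundary. */
                for ( i = 0; !ALIGNED_2M(mfns[0] + i); ++i )
                    ;
                rc = populate_physmap_4k(xch, domid, i, mfns);
            }
            else
            {
                /* Aligned run: superpage only if all 512 gfns are contiguous. */
                for ( i = 1; i < SP_2M_NR_PFNS && mfns[i] == mfns[0] + i; ++i )
                    ;
                rc = (i == SP_2M_NR_PFNS)
                    ? populate_physmap_2M(xch, domid, mfns)
                    : populate_physmap_4k(xch, domid, i, mfns);
            }

            if ( rc )
                return rc;

            mfns += i;
            nr   -= i;
        }

        return 0;
    }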
Actually, I think a simpler option is to alter the first loop, where
you've got contiguous= right now.  Something like:

@@ -158,6 +165,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
             rc = pfn_set_populated(ctx, original_pfns[i]);
             if ( rc )
                 goto err;
+
+            if ( ALIGNED_2M(original_pfns[i]) &&
+                 can_populate_2M(&original_pfns[i], count - i) )
+            {
+                xen_pfn_t mfn = original_pfns[i];
+
+                rc = xc_domain_populate_physmap_exact(xch, ctx->domid, 1,
+                                                      ORDER_2M, 0, &mfn);
+
+                ...
+
+                i += 511;
+                continue;
+            }
+
             pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
             ++nr_pfns;
         }

with error handling of course.

This way we don't potentially fall back to many separate hypercalls for
4k pages.

~Andrew
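P.S. To make can_populate_2M() concrete: it has to confirm that a full,
contiguous, not-yet-populated run of 512 gfns follows.  A sketch, with the
helper's name and shape invented here, as it might sit in xg_sr_restore.c
(so it can use pfn_is_populated() and xg_private.h's SUPERPAGE_2MB_NR_PFNS;
it also takes ctx, which the snippet above omits):

    /*
     * Whether original_pfns[0..511] (relative to the current index) can be
     * populated as a single 2M superpage.  The caller has just marked
     * entry 0 as populated, so only entries 1..511 are checked here.
     */
    static bool can_populate_2M(struct xc_sr_context *ctx,
                                const xen_pfn_t *pfns, unsigned int remaining)
    {
        unsigned int j;

        if ( remaining < SUPERPAGE_2MB_NR_PFNS )
            return false;

        for ( j = 1; j < SUPERPAGE_2MB_NR_PFNS; ++j )
            if ( pfns[j] != pfns[0] + j ||       /* gfns contiguous? */
                 pfn_is_populated(ctx, pfns[j])  /* and not yet populated? */ )
                return false;

        return true;
    }

A full version would also need to respect the types[] filter applied by the
surrounding loop, and the elided "..." above would have to mark the
remaining 511 pfns as populated before skipping ahead, so the tracking
bitmap stays in sync with what Xen has actually allocated.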