Re: [PATCH] Optimise restore memory allocation
On 27/08/2025 6:23 pm, Andrew Cooper wrote:
> Subject wants to be at least tools/libxc, and probably "Use superpages
> where possible on migrate/resume"
>
>
> On 27/08/2025 1:33 pm, Frediano Ziglio wrote:
>> Try to allocate larger-order pages.
>> With a test program stressing the TLB (many small random memory
>> accesses) this gives a 15% performance improvement.
>> On the first memory iteration the sender currently sends memory in
>> 4MB-aligned chunks, which allows the receiver to allocate most pages
>> as 2MB superpages instead of individual 4kB pages.
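[Editor's illustration, not the author's actual test program: a TLB-stressing workload of the kind the commit message describes does random 4k-granularity accesses across a working set far larger than the TLB can cover, so superpage backing directly reduces misses. The 1GiB buffer size and access count below are assumptions.]

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (1UL << 30)   /* 1GiB working set, assumption */
#define ACCESSES (1UL << 26)

int main(void)
{
    uint8_t *buf = malloc(BUF_SIZE);
    volatile uint64_t sum = 0;
    uint64_t seed = 88172645463325252ULL;

    if ( !buf )
        return 1;

    for ( uint64_t i = 0; i < ACCESSES; i++ )
    {
        /* xorshift64 PRNG: cheap enough that the TLB miss dominates. */
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        sum += buf[seed % BUF_SIZE];
    }

    printf("%llu\n", (unsigned long long)sum);
    free(buf);
    return 0;
}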
> It's critical to say somewhere that this is applicable to HVM guests.
>
> You've alluded to it, but it's important to state clearly that, for HVM
> guests, PAGE_DATA records contain metadata about GFNs in aligned 4M
> blocks. This is why we don't even need to buffer a second record.
>
> It's also worth stating that 1G superpages are left for later.
>
>
> CC-ing Oleksii as release manager. This is a fix for a (mis)feature
> which has been known for a decade (since Xen 4.6), and for which two
> series have been posted but never managed to get in. Unlike those series,
> this is a very surgical fix that gets the majority of the perf win back,
> without the complexity of trying to guess at 1G pages.
>
> Therefore I'd like to request that it be considered for 4.21 at this
> juncture.
>
>> Signed-off-by: Frediano Ziglio <frediano.ziglio@xxxxxxxxx>
>> ---
>> tools/libs/guest/xg_sr_restore.c | 39 ++++++++++++++++++++++++++++----
>> 1 file changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
>> index 06231ca826..8dcb1b19c5 100644
>> --- a/tools/libs/guest/xg_sr_restore.c
>> +++ b/tools/libs/guest/xg_sr_restore.c
>> @@ -129,6 +129,8 @@ static int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
>>      return 0;
>>  }
>>  
>> +#define IS_POWER_OF_2(n) (((n) & ((n) - 1)) == 0)
>> +
>>  /*
>>   * Given a set of pfns, obtain memory from Xen to fill the physmap for the
>>   * unpopulated subset. If types is NULL, no page type checking is performed
>> @@ -141,6 +143,7 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
>>      xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
>>          *pfns = malloc(count * sizeof(*pfns));
>>      unsigned int i, nr_pfns = 0;
>> +    bool contiguous = true;
>>      int rc = -1;
>>  
>>      if ( !mfns || !pfns )
>> @@ -159,18 +162,46 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
>>              if ( rc )
>>                  goto err;
>>              pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
>> +            if ( pfns[nr_pfns] != pfns[0] + nr_pfns )
>> +                contiguous = false;
>>              ++nr_pfns;
>>          }
>>      }
>>  
>>      if ( nr_pfns )
>>      {
>> -        rc = xc_domain_populate_physmap_exact(
>> -            xch, ctx->domid, nr_pfns, 0, 0, mfns);
>> +        /* try optimizing using larger order */
>> +        rc = -1;
>> +        /*
>> +         * The "nr_pfns <= (1 << 18)" check is mainly for paranoia, it should
>> +         * never happen, the sender would have to send a really large packet.
>> +         */
>> +        if ( contiguous && nr_pfns <= (1 << 18) &&
> This is an arbitrary limit, and to contradict the prior feedback given
> in private, the domain's MAX_ORDER isn't relevant here. It's the
> toolstack's choice how to lay out the guest in memory.
>
>> +             IS_POWER_OF_2(nr_pfns) && (pfns[0] & (nr_pfns - 1)) == 0 )
>> +        {
>> +            const unsigned int extent_order = __builtin_ffs(nr_pfns) - 1;
> This (non-)loop isn't great. You'll e.g. use 4k pages for the second 2M
> page of an HVM guest simply because the VGA hole exists in the first.
>
> I think you probably want something like:
>
> int populate_physmap_4k(ctx, nr, mfns);
> int populate_physmap_2M(ctx, nr, mfns);
>
> as simple wrappers around the raw hypercalls including transforming back
> the mfns[] array, and:
>
> int populate_physmap(...);
>
> with logic of the form:
>
> while ( nr )
> {
>     if ( nr < 512 ) /* Can't be a superpage */
>     {
>         populate_physmap_4k(ctx, nr, mfns);
>         break;
>     }
>
>     if ( !ALIGNED_2M(mfns[0]) ) /* Populate up until a point that could be a superpage */
>     {
>         i = 0;
>         while ( !ALIGNED_2M(mfns[0] + i) )
>             i++;
>         populate_physmap_4k(ctx, i, mfns);
>         mfns += i;
>         nr -= i;
>         continue;
>     }
>
>     if ( nr >= 512 )
>     {
>         for ( i = 1; i < 512; ++i )
>             if ( mfns[i] != mfns[0] + i )
>                 break;
>         if ( i == 512 )
>             populate_physmap_2M(ctx, i, mfns);
>         else
>             populate_physmap_4k(ctx, i, mfns);
>
>         mfns += i;
>         nr -= i;
>     }
> }
>
>
>
> Obviously with error handling, and whatever boundary conditions I've got
> wrong.
>
> 2M is the only size that matters (ignoring 1G which we've excluded for
> now), and this form will reconstruct more superpages for the guest than
> trying to do 4M allocations will.
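[Editor's illustration: the populate_physmap_2M() wrapper suggested above is hypothetical; a minimal sketch might look like the following, assuming it lives in xg_sr_restore.c and that nr is a multiple of 512. Only xc_domain_populate_physmap_exact() is the real libxc call; ORDER_2M and the fan-out of results back over mfns[] are assumptions.]

#define ORDER_2M 9

static int populate_physmap_2M(struct xc_sr_context *ctx,
                               unsigned int nr, xen_pfn_t *mfns)
{
    xc_interface *xch = ctx->xch;
    unsigned int i, nr_extents = nr >> ORDER_2M;
    xen_pfn_t *extents = malloc(nr_extents * sizeof(*extents));
    int rc;

    if ( !extents )
        return -1;

    /* One gfn per 2M extent: the base frame of each 512-frame run. */
    for ( i = 0; i < nr_extents; i++ )
        extents[i] = mfns[i << ORDER_2M];

    rc = xc_domain_populate_physmap_exact(xch, ctx->domid, nr_extents,
                                          ORDER_2M, 0, extents);

    /* Transform the per-extent results back over the flat mfns[] array. */
    if ( !rc )
        for ( i = 0; i < nr; i++ )
            mfns[i] = extents[i >> ORDER_2M] + (i & ((1u << ORDER_2M) - 1));

    free(extents);
    return rc;
}

populate_physmap_4k() would be the same call with extent order 0 and the mfns[] array passed through unchanged.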
Actually, I think a simpler option is to alter the first loop, where
you've got contiguous= right now.
Something like:
@@ -158,6 +165,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
             rc = pfn_set_populated(ctx, original_pfns[i]);
             if ( rc )
                 goto err;
+
+            if ( ALIGNED_2M(original_pfns[i]) &&
+                 can_populate_2M(&original_pfns[i], count - i) )
+            {
+                xen_pfn_t mfn = original_pfns[i];
+
+                rc = xc_domain_populate_physmap_exact(xch, ctx->domid, 1,
+                                                      ORDER_2M, 0, &mfn);
+
+                ...
+
+                i += 511;
+                continue;
+            }
+
             pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
             ++nr_pfns;
         }
with error handling of course. This way we don't potentially fall back
to many separate hypercalls for 4k pages.
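[Editor's illustration: the ALIGNED_2M() and can_populate_2M() helpers referenced above are hypothetical; one possible shape is sketched below, checking only that the next 512 pfns form one contiguous run. A real version would also need to refuse runs containing pfns that are already populated (cf. pfn_is_populated()).]

#define ALIGNED_2M(pfn) (((pfn) & 511) == 0)

/* True if pfns[0..511] exist and form one contiguous run. */
static bool can_populate_2M(const xen_pfn_t *pfns, unsigned int count)
{
    unsigned int i;

    if ( count < 512 )
        return false;

    for ( i = 1; i < 512; i++ )
        if ( pfns[i] != pfns[0] + i )
            return false;

    return true;
}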
~Andrew
|