|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 4/4] xen/mm: Recall claims when offlining pages if needed
On 20.04.2026 15:19, Bernhard Kaindl wrote:
> Fix a bug where offlining pages could cause an unsigned underflow
> in total_avail_pages - outstanding_claims, leading to incorrect
> claim behavior.
>
> This issue arises when outstanding claims are close to the total
> available pages. It occurse when domain_set_outstanding_claims()
> and domain_install_claim_set effectively do this:
>
> unsigned long avail_pages = total_avail_pages - outstanding_claims;
>
> When this unsigned subtraction underflows, staking claims can succeed
> even when there is insufficient unclaimed memory for the new claim.
> This leads to a state where claims always succeed, regardless of
> actual memory availability.
>
> To prevent this, recall claims when offlining pages if needed to maintain
> equilibrium between `total_avail_pages` and outstanding claims for global
> and for per-NUMA-node claims.
>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxxx>
First: This patch being a bug fix (which, btw, lacks a Fixes: tag), it wants
to move ahead of the NUMA claims series, to facilitate backporting.
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1575,6 +1575,48 @@ static int reserve_offlined_page(struct page_info
> *head)
> count++;
> }
>
> + if ( count )
Why did you decide to put this in reserve_offlined_page()? The function has
two callers, yet in one case the extra checking is entirely unneeded afaict.
While in the other case we further know that it's exactly one page which is
being offlined.
> + {
> + long recall_pages;
> + struct domain *d;
> +
> + /* Ensure that claims on the node are in line with its free memory.
> */
> + recall_pages = node_outstanding_claims[node] -
> node_avail_pages[node];
> + if ( recall_pages > 0 )
> + /*
> + * node_avail_pages slipped below node_outstanding_claims.
> + * We need to recall claimed pages until the amount of claimed
> + * memory is in line with the amount of available memory again.
> + */
> + for_each_domain ( d )
Such loops need serializing against domain list modifications. Also we really
don't want to loop over all domains twice.
> + {
> + if ( d->claims[node] )
> + {
> + recall_pages -= deduct_node_claims(d, node,
> recall_pages);
> + if ( recall_pages <= 0 )
> + break;
> + }
> + }
So domains early on the list are penalized over ones later on the list? The
only truly fair approach I can think of right away looks to be to discard all
claims, requiring the toolstack to re-establish them. With the observation
above the next best approach might be to cycle through domains, removing one
page from their claim and recording where we left off (the per-node part
would be slightly more involved). However, any reduction of a claim is likely
to render that claim useless altogether. Hence requiring admin action (to
re-establish claims) may still be the least bad option.
Jan
> + /* Ensure that outstanding claims are in line with available memory.
> */
> + recall_pages = outstanding_claims - total_avail_pages;
> + if ( recall_pages > 0 )
> + /*
> + * total_avail_pages slipped below outstanding_claims.
> + * We need to recall claimed pages until the amount of claimed
> + * memory is in line with the amount of available memory again.
> + */
> + for_each_domain ( d )
> + {
> + if ( d->global_claims )
> + {
> + recall_pages -= deduct_global_claims(d, recall_pages);
> + if ( recall_pages <= 0 )
> + break;
> + }
> + }
> + }
> +
> return count;
> }
>
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |