[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims


  • To: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 5 Mar 2026 14:36:10 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Andrew Cooper <andrew.cooper@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Marcus Granado <Marcus.Granado@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Alejandro Vallejo <Alejandro.GarciaVallejo@xxxxxxx>
  • Delivery-date: Thu, 05 Mar 2026 13:36:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 05.03.2026 14:12, Bernhard Kaindl wrote:
> Jan Beulich wrote:
>>> +    d->claim_node = NUMA_NO_NODE;
>>
>> If, as the cover letter says, the new domctl is going to allow claiming from
>> multiple nodes in one go, why would this new field still be necessary?
> 
> Roger requested the domctl API to allow claiming from multiple nodes in one go
> and he specified that we should focus on getting the implementation for one
> node-specific claim done first before we dive into multi-node claims code.
> 
> - Instead of adding/linking an array of claims to struct domain, we can keep
>   using d->outstanding_pages for the single-node claim.
> 
> - There are numerous comments and questions for this minimal implementation.
>   If we'd add multi-node claims to it, this review may become even more 
> complex.
> 
> - The single-node claims backend contains the infrastructure and multi-node
>   claims would be an extension on top of that infrastructure.

What is at the very least needed is an outline of how multi-node claims are
intended to work. This is because what you do here needs to fit that scheme.
Which in turn I think is going to be difficult when for a domain more memory
is needed than any single node can supply. Hence why I think that you may
not be able to get away with just single-node claims, no matter that this
of course complicates things.

It's also not quite clear to me how multiple successive claims against
distinct nodes would work (which isn't all that different from a multi-node
claim).

Thinking of it, interaction with the existing mem-op also wants clarifying.
Imo only one of the two ought to be usable on a single domain.

>>> +static unsigned long node_outstanding_claims[MAX_NUMNODES];
>>
>> How come this is being added, rather than it replacing outstanding_claims?
> 
> The global outstanding_claims variable counts the host-level claimed pages.
> 
> It has the sum of all host-level claims that are not specific to a NUMA node
> and also the sum of all node-specific claims (see more on that in an answer
> to another question further below).
> 
> If we were to replace it, we'd not have the outstanding_claims counter,
> which would result in not supporting global claims anymore in Xen.
> 
> If a toolstack would want to claim more memory than a single NUMA node
> has, someone would have to go loop over all NUMA nodes with enough memory
> and split the claim across a number or per-NUMA-node claims.
> 
> - This would be less flexible than what we have with it: With it, we can
>   still support host-level claims without those claims be placed on
>   specific NUMA nodes.
> 
> - This allows for many domains to be built in parallel where some
>   domains are built with claims specific to specific NUMA nodes, while
>   allowing the other domains are built dynamically at allocation time
>   from any remaining memory wherever that memory remains to be available.
>   
> - There are use cases where the memory of some domains shall be spread
>   across NUMA nodes to have the memory bandwidth of those NUMA nodes
>   available to the individual processes in the guest domain but still
>   want the assurance of host-level claims for claimed memory when
>   constructing of many domains in parallel on a host.

All fine, but why is this written down only in a reply to review comments,
rather than right in the patch description?

>>> +/* Return available pages after subtracting claimed pages */
>>> +static inline unsigned long available_after_claims(unsigned long
>> avail_pages,
>>> +                                                   unsigned long claims)
>>> +{
>>> +    BUG_ON(claims > avail_pages);
>>> +    return avail_pages - claims; /* Due to the BUG_ON, it cannot be 
>>> negative
>> */
>>> +}
>>
>> A helper for a simple subtraction?
> 
> It is about not having to repeat the BUG_ON(claims > avail_pages) everywhere.

Which in turn I should have said I question. Imo this is supposed to be an
ASSERT(), not a BUG_ON().

> Also, the name of the helper makes clear what the result of the expression is,
> So when using it, the flow of the code is more natural to understand.
> 
> Having to repeat the BUG_ON() everywhere would make the code less readable.
> The BUG_ON() is good when refactoring as a guardrail when you broke the code.

I'm not quite sure there.

>>> +/* Answer if host-level memory and claims permit this request to proceed */
>>> +static inline bool host_allocatable_request(const struct domain *d,
>>> +                                            unsigned int memflags,
>>> +                                            unsigned long request)
>>> +{
>>> +    unsigned long allocatable_pages;
>>> +
>>> +    ASSERT(spin_is_locked(&heap_lock));
>>> +
>>> +    allocatable_pages = available_after_claims(total_avail_pages,
>>> +                                               outstanding_claims);
>>> +    if ( allocatable_pages >= request )
>>> +        return true; /* The not claimed pages are enough to proceed */
>>> +
>>> +    if ( !d || (memflags & MEMF_no_refcount) )
>>> +        return false; /* Claims are not available for this allocation */
>>> +
>>> +    /* The domain's claims are available, return true if sufficient */
>>> +    return request <= allocatable_pages + d->outstanding_pages;
>>> +}
>>
>> This only uses variables which existed before, i.e. there's nothing NUMA-ish
>> in here. What's the deal?
> 
> The deal is that for taking unclaimed memory beyond the remaining claims
> Into account for deciding that the host has usable memory for a domain with
> a claim, the needed if-expression would be quite complicated to understand.
> When factoring this logic into an if expression without extracting it into
> a function, it would bloat flow alloc_heap_pages(), especially if one would
> want to have the comments. I'm not sure if this is a good idea.

I guess I don't really follow: Right here all you do is transform a complex
if() into one that calls this function, with no functional difference. This
function isn't changed by subsequent patches. Hence what's the concern?

That said, I don't mind breaking it out, but as said - as a separate change,
and then with its NUMA counterpart preferably folded in.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.