Re: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
On 05.03.2026 14:12, Bernhard Kaindl wrote:
> Jan Beulich wrote:
>>> + d->claim_node = NUMA_NO_NODE;
>>
>> If, as the cover letter says, the new domctl is going to allow claiming from
>> multiple nodes in one go, why would this new field still be necessary?
>
> Roger requested the domctl API to allow claiming from multiple nodes in one go
> and he specified that we should focus on getting the implementation for one
> node-specific claim done first before we dive into multi-node claims code.
>
> - Instead of adding/linking an array of claims to struct domain, we can keep
> using d->outstanding_pages for the single-node claim.
>
> - There are numerous comments and questions for this minimal implementation.
> If we'd add multi-node claims to it, this review may become even more
> complex.
>
> - The single-node claims backend contains the infrastructure and multi-node
> claims would be an extension on top of that infrastructure.
What is at the very least needed is an outline of how multi-node claims are
intended to work, because what you do here needs to fit that scheme. Which
in turn I think is going to be difficult when a domain needs more memory
than any single node can supply. Hence I think you may not be able to get
away with just single-node claims, no matter that this of course
complicates things.
It's also not quite clear to me how multiple successive claims against
distinct nodes would work (which isn't all that different from a multi-node
claim).
Thinking of it, interaction with the existing mem-op also wants clarifying.
Imo only one of the two ought to be usable on a single domain.
>>> +static unsigned long node_outstanding_claims[MAX_NUMNODES];
>>
>> How come this is being added, rather than it replacing outstanding_claims?
>
> The global outstanding_claims variable counts the host-level claimed pages.
>
> It has the sum of all host-level claims that are not specific to a NUMA node
> and also the sum of all node-specific claims (see more on that in an answer
> to another question further below).
>
> If we were to replace it, we'd no longer have the outstanding_claims
> counter, which would mean Xen would no longer support global claims.
>
> If a toolstack wanted to claim more memory than a single NUMA node
> has, someone would have to loop over all NUMA nodes with enough memory
> and split the claim across a number of per-NUMA-node claims.
>
> - This would be less flexible than what we have now: with it, we can
> still support host-level claims without those claims being placed on
> specific NUMA nodes.
>
> - This allows many domains to be built in parallel, where some
> domains are built with claims on specific NUMA nodes, while the
> other domains are built dynamically at allocation time from whatever
> memory remains available.
>
> - There are use cases where the memory of some domains shall be spread
> across NUMA nodes, to make the memory bandwidth of those nodes
> available to the individual processes in the guest domain, while
> still wanting the assurance of host-level claims when constructing
> many domains in parallel on a host.
All fine, but why is this written down only in a reply to review comments,
rather than right in the patch description?
>>> +/* Return available pages after subtracting claimed pages */
>>> +static inline unsigned long available_after_claims(unsigned long avail_pages,
>>> +                                                   unsigned long claims)
>>> +{
>>> +    BUG_ON(claims > avail_pages);
>>> +    return avail_pages - claims; /* Due to the BUG_ON, it cannot be negative */
>>> +}
>>
>> A helper for a simple subtraction?
>
> It is about not having to repeat the BUG_ON(claims > avail_pages) everywhere.
Which in turn I should have said I question. Imo this is supposed to be an
ASSERT(), not a BUG_ON().
> Also, the name of the helper makes clear what the result of the expression
> is, so when using it, the flow of the code is more natural to understand.
>
> Having to repeat the BUG_ON() everywhere would make the code less readable.
> The BUG_ON() also acts as a guardrail when refactoring, catching cases
> where the code got broken.
I'm not quite sure there.
>>> +/* Answer if host-level memory and claims permit this request to proceed */
>>> +static inline bool host_allocatable_request(const struct domain *d,
>>> +                                            unsigned int memflags,
>>> +                                            unsigned long request)
>>> +{
>>> +    unsigned long allocatable_pages;
>>> +
>>> +    ASSERT(spin_is_locked(&heap_lock));
>>> +
>>> +    allocatable_pages = available_after_claims(total_avail_pages,
>>> +                                               outstanding_claims);
>>> +    if ( allocatable_pages >= request )
>>> +        return true; /* The unclaimed pages are enough to proceed */
>>> +
>>> +    if ( !d || (memflags & MEMF_no_refcount) )
>>> +        return false; /* Claims are not available for this allocation */
>>> +
>>> +    /* The domain's claims are available, return true if sufficient */
>>> +    return request <= allocatable_pages + d->outstanding_pages;
>>> +}
>>
>> This only uses variables which existed before, i.e. there's nothing NUMA-ish
>> in here. What's the deal?
>
> The deal is that, to take unclaimed memory beyond the remaining claims
> into account when deciding whether the host has usable memory for a domain
> with a claim, the needed if-expression would be quite complicated to
> understand. Without extracting this logic into a function, the
> if-expression would bloat the flow of alloc_heap_pages(), especially if
> one would want to keep the comments. I'm not sure that would be a good
> idea.
I guess I don't really follow: Right here all you do is transform a complex
if() into one that calls this function, with no functional difference. This
function isn't changed by subsequent patches. Hence what's the concern?
That said, I don't mind breaking it out, but as said - as a separate change,
and then with its NUMA counterpart preferably folded in.
Jan
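[Editor's note: the decision logic being debated above can be modeled
stand-alone. This is a cut-down, hypothetical sketch for illustration only,
not the actual Xen code: the struct, flag value, and state values are invented,
locking is omitted, and available_after_claims() is inlined as a plain
subtraction.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal model of the relevant state. */
struct domain { unsigned long outstanding_pages; };

#define MEMF_no_refcount 1u

static unsigned long total_avail_pages = 100;
static unsigned long outstanding_claims = 60;

/* Mirrors the quoted helper's three-step decision, minus the heap lock. */
static bool host_allocatable_request(const struct domain *d,
                                     unsigned int memflags,
                                     unsigned long request)
{
    unsigned long allocatable = total_avail_pages - outstanding_claims;

    if ( allocatable >= request )
        return true;  /* unclaimed pages alone suffice */

    if ( !d || (memflags & MEMF_no_refcount) )
        return false; /* no claim usable for this allocation */

    /* Fall back on the domain's own outstanding claim. */
    return request <= allocatable + d->outstanding_pages;
}
```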