RE: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
Jan Beulich wrote:
> On 05.03.2026 14:12, Bernhard Kaindl wrote:
> >
> > Roger requested the domctl API to allow claiming from multiple nodes in one
> > go
> > and he specified that we should focus on getting the implementation for one
> > node-specific claim done first before we dive into multi-node claims code.
> >
> > - Instead of adding/linking an array of claims to struct domain, we can keep
> > using d->outstanding_pages for the single-node claim.
> >
> > - There are numerous comments and questions for this minimal implementation.
> > If we'd add multi-node claims to it, this review may become even more
> > complex.
> >
> > - The single-node claims backend contains the infrastructure and multi-node
> > claims would be an extension on top of that infrastructure.
>
> What is at the very least needed is an outline of how multi-node claims are
> intended to work. This is because what you do here needs to fit that scheme.
> Which in turn I think is going to be difficult when for a domain more memory
> is needed than any single node can supply. Hence why I think that you may
> not be able to get away with just single-node claims, no matter that this
> of course complicates things.
>
> It's also not quite clear to me how multiple successive claims against
> distinct nodes would work (which isn't all that different from a multi-node
> claim).
>
> Thinking of it, interaction with the existing mem-op also wants clarifying.
> Imo only one of the two ought to be usable on a single domain.
Yes, correct. As implemented by Xen in domain_set_outstanding_claims(),
Xen claims work very differently from something like an allocation:
For example, when you allocate, you get memory, and when you repeat,
you have a bigger allocation.
But Xen claims in domain_set_outstanding_claims() don't work like that:
- When a domain has a claim, domain_set_outstanding_claims() only allows
resetting the claim to 0, nothing else. A second or changed claim is not
possible. I think this was intentional:
- domain_set_outstanding_claims() rejects increasing/reducing a claim:
A claim is designed to be made by domain build when the size of the
domain is known. There is no tweaking it afterwards: The needed pages
shall be claimed by the domain builder before the domain is built.
Note: The claims are not only consumed when populating guest memory:
Claims are also (at least attempted to be) consumed when Xen needs to
allocate memory for other resources of the domain. For this reason,
the domain builder needs to add some headroom for allocations done by
Xen for creating the domain.
When the domain builder has finished building the domain, it is expected
to reset the claim to release any unconsumed headroom it added.
- If a domain already has memory when the domain builder stakes a claim
for completing the build of the domain, the outstanding_claims are set
to the target value of the claim call, minus domain_tot_pages(d), so
already allocated memory does not contribute to a bigger total booking.
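To make the rules above concrete, here is a minimal, self-contained model
of these semantics. It is only a sketch and not the Xen code: the struct
and function names (model_domain, model_host, model_set_claim) are made up
for illustration, locking and the max_pages limit are left out, and
rejecting a target that does not exceed the already allocated pages is an
assumption of this model.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-ins; these are not the Xen structures. */
struct model_domain {
    unsigned long tot_pages;          /* pages already allocated to the domain */
    unsigned long outstanding_pages;  /* pages still covered by the claim */
};

struct model_host {
    unsigned long total_avail_pages;  /* free pages on the host */
    unsigned long outstanding_claims; /* sum of all outstanding claims */
};

/* Returns 0 on success or a negative errno value. */
static int model_set_claim(struct model_host *h, struct model_domain *d,
                           unsigned long pages)
{
    unsigned long claim;

    /* Model invariant: claims never exceed the free memory of the host. */
    assert(h->outstanding_claims <= h->total_avail_pages);

    /* A target of 0 resets the claim and releases what is left of it. */
    if (pages == 0) {
        h->outstanding_claims -= d->outstanding_pages;
        d->outstanding_pages = 0;
        return 0;
    }

    /* An existing claim can only be reset, never increased or reduced. */
    if (d->outstanding_pages)
        return -EINVAL;

    /* Assumption of this model: the target must exceed what is allocated. */
    if (pages <= d->tot_pages)
        return -EINVAL;

    /* Already allocated memory does not add to the booking. */
    claim = pages - d->tot_pages;
    if (claim > h->total_avail_pages - h->outstanding_claims)
        return -ENOMEM;

    d->outstanding_pages = claim;
    h->outstanding_claims += claim;
    return 0;
}
```

In this model, a domain that already has 100 pages and claims a target of
600 ends up with 500 outstanding pages, and any further non-zero call fails
until the claim has been reset with a target of 0.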
For NUMA claims and global host-level claims, it is similar:
A NUMA node-specific claim is implicitly also added to the global
host-level outstanding_claims of the host, as node-specific memory
is also part of the host's memory, so the host-level claims protection
does not have to also check for node-specific claims:
The effect of a host-level claim is also given when you make a node-level claim.
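As a rough illustration of that accounting (a sketch with hypothetical
names such as sk_heap and sk_claim_on_node, not the code from the patch),
a node-specific claim bumps the per-node counter and the host-wide counter
together, so a host-level check only needs to look at the host-wide numbers:

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NR_NODES 2  /* hypothetical node count for this sketch */

/* Hypothetical, simplified heap accounting; not the Xen data structures. */
struct sk_heap {
    unsigned long total_avail_pages;             /* free pages, whole host */
    unsigned long outstanding_claims;            /* all claims, whole host */
    unsigned long node_avail_pages[SK_NR_NODES]; /* free pages per node */
    unsigned long node_outstanding[SK_NR_NODES]; /* node-specific claims */
};

/* Stake a node-specific claim: both counters are raised together. */
static bool sk_claim_on_node(struct sk_heap *hp, unsigned int node,
                             unsigned long pages)
{
    if (node >= SK_NR_NODES)
        return false;

    /* Model invariants: claims never exceed the free memory they cover. */
    assert(hp->node_outstanding[node] <= hp->node_avail_pages[node]);
    assert(hp->outstanding_claims <= hp->total_avail_pages);

    if (pages > hp->node_avail_pages[node] - hp->node_outstanding[node])
        return false;
    if (pages > hp->total_avail_pages - hp->outstanding_claims)
        return false;

    hp->node_outstanding[node] += pages;
    hp->outstanding_claims += pages;  /* node claim is a host claim too */
    return true;
}

/* Host-level protection: only the host-wide counters are consulted. */
static bool sk_host_allows_unclaimed(const struct sk_heap *hp,
                                     unsigned long pages)
{
    return pages <= hp->total_avail_pages - hp->outstanding_claims;
}
```

With 1000 free pages on the host and a claim of 400 pages on node 0, the
host-level check in this model admits at most 600 unclaimed pages without
looking at any per-node counter.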
When a domain has one kind of claim, it does not make a lot of sense to then
later add a differently sized claim for another target. As described in
how domain_set_outstanding_claims() is implemented, a domain builder stakes
a claim once, then builds the domain, then resets it, and that's all there is to it.
For example, Xapi toolstack and libxenguest have calls to claim memory,
but in any given configuration, only the first actor to claim memory for
a domain is the one who defines the claim: No mixing, changing, updating.
It makes things clear that the initial creator did make the claim.
Similar for multi-node claims:
Roger described how he wants this API to work here:
https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html
(Before, he said that with multiple calls, it would be awkward, with partial
claims and rollback, and I want to add that it would run diametrically counter
to the original claims design of not allowing multiple calls)
> Ideally, we would need to introduce a new hypercall that allows making
> claims from multiple nodes in a single locked region, as to ensure
> success or failure in an atomic way.
In the locked region (inside heap_lock), we can check the claim requests
against existing claims and memory of the affected nodes and determine if
the claim call is a go or a no-go. If it is a go, we update all counters,
which are all protected by the heap_lock, and we are done.
There is no partial success or failure. It will be atomic, like Roger asked.
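The check-then-commit idea can be sketched as follows. This is only an
illustration under assumptions: the names (mn_heap, mn_claim,
mn_claim_nodes) are hypothetical, the caller is assumed to hold the lock
that protects the counters (heap_lock in Xen), and overflow of the summed
request sizes is not handled.

```c
#include <assert.h>
#include <errno.h>

#define MN_NR_NODES 4  /* hypothetical node count for this sketch */

/* Hypothetical, simplified heap accounting; not the Xen data structures. */
struct mn_heap {
    unsigned long total_avail_pages;
    unsigned long outstanding_claims;
    unsigned long node_avail_pages[MN_NR_NODES];
    unsigned long node_outstanding[MN_NR_NODES];
};

/* One entry of a hypothetical multi-node claim request. */
struct mn_claim {
    unsigned int node;
    unsigned long pages;
};

/*
 * Caller is assumed to hold the lock protecting all counters.
 * Returns 0 if every entry was claimed, else a negative errno value
 * with no counter changed.
 */
static int mn_claim_nodes(struct mn_heap *hp, const struct mn_claim *req,
                          unsigned int nr)
{
    unsigned long want[MN_NR_NODES] = { 0 };
    unsigned long total = 0;
    unsigned int i;

    /* Phase 1: validate. Sum per node so repeated nodes are caught. */
    for (i = 0; i < nr; i++) {
        if (req[i].node >= MN_NR_NODES)
            return -EINVAL;
        want[req[i].node] += req[i].pages;
        total += req[i].pages;
    }

    for (i = 0; i < MN_NR_NODES; i++) {
        assert(hp->node_outstanding[i] <= hp->node_avail_pages[i]);
        if (want[i] > hp->node_avail_pages[i] - hp->node_outstanding[i])
            return -ENOMEM;
    }

    if (total > hp->total_avail_pages - hp->outstanding_claims)
        return -ENOMEM;

    /* Phase 2: commit. Nothing can fail here, so no rollback is needed. */
    for (i = 0; i < MN_NR_NODES; i++)
        hp->node_outstanding[i] += want[i];
    hp->outstanding_claims += total;

    return 0;
}
```

Because all checks happen before the first counter is written, a failing
request leaves the accounting exactly as it was, which is what makes a
rollback path unnecessary in this sketch.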
With this, as I understand it, I think I should create a design specification
for how claims are designed in Xen and how the claims design can be
extended to support atomic multi-node claims (without rollbacks/concurrency
issues).
I started describing how Xen implements claims in /docs/hypervisor-guide here:
https://bernhardk-xen-review.readthedocs.io/node-claims/hypervisor-guide/mm/claims.html
I'd add these new clarifications to this description then, I think.
To communicate the plan of how multi-node claims would work,
as described by Roger, I'd suggest that I add a design document
for multi-node claims, modelled after the Hyperlaunch design
document found in the docs.
Once that design is approved, we should have a clear shared
understanding of multi-node claims before we look at the implementation.
Bernhard