|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
Hi Jan,
I'm sorry for the late reply, I had many TODOs and a vacation meanwhile,
but I'm back with the fixes for the review.
Jan Beulich wrote:
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
[...]
> > +int claim_memory(struct domain *d, [...]
>
> static in domctl.c? Otherwise with Penny's work to make domctl optional this
> would be unreachable code.
Thanks, done: Moved it to domctl.c to be not compiled without MGMT_HYPERCALLS
in v5/v6.
> > + if ( uinfo->pad || uinfo->nr_claims != 1 || d->is_dying )
> > + return -EINVAL;
>
> As already alluded to in reply to patch 03, I can't help the impression that
> usage of this sub-op with multiple entries would we quite different (i.e. it
> would be not only the implementation in Xen that changes). I'm therefore
> pretty uncertain whether taking it with this restriction is going to make
> much sense.
I submitted this sub-op to support multiple entries with v5/v6 now.
In v5/v6 these checks are updated to support multiple claims in the claim set.
For clarity, I renamed the .node of the individual claim entries to .target:
The target of a claim entry can also be a selector for a global claim
or a legacy claim and the field have many bits for future use.
This wasn't needed but I think it's clearer that the claim entry specifies a
target which is where the claim entry is aimed at, it's not just only a node.
> + if ( claim.node == XEN_DOMCTL_CLAIM_MEMORY_NO_NODE )
> > + claim.node = NUMA_NO_NODE;
>
> What about the incoming claim.node being NUMA_NO_NODE? Imo the range checking
> the previous patch adds to domain_set_outstanding_pages() wants to move here,
> at which point the function's new parameter could be properly nodeid_t.
nodeid_t and NUMA_NO_NODE have (judging by the existing implementation) are not
exposed in the public API to the control domain.
This separation is probably a good thing because it allows to change Xen
internals
like nodeit_t and NUMA_NO_NODE if so desired without changing the public API.
NUMA_NO_NODE is defined as 0xFF and nodeid_t is u8. But that is just an
implementation detail of the Hypervisor itself. If needed, we could change
the implementation like this series could do, if wanted.
The public struct xen_sysctl_numainfo and xen_sysctl_physinfo define num_nodes,
nr_nodes and max_node_id as uint32_t, for example. For type consistency, I opted
to define this public API as uint32_t as well and not expose internal
types/values.
> > + return domain_set_outstanding_pages(d, claim.pages, claim.node);
> > +}
>
> There's no copying back of the result. When this is extended to allow more
> than one entry, what's the plan towards dealing with partial success? Needing
> to roll back may be unwieldy.
Roger described the core requirement I'm implementing:
> Ideally, we would need to introduce a new hypercall that allows
> making claims from multiple nodes in a single locked region,
> as to ensure success or failure in an atomic way.
-- Roger Pau Monné
Ref:
https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html
As a result, we don't need to handle partial successes, so its not needed.
> > +#define XEN_DOMCTL_CLAIM_[...] 0xFFFFFFFF /* No node: host claim */
>
> "host claim" (in the comment) also is ambiguous. Per-node claims also affect
> the host. Maybe "host wide" or "global"?
Thanks for this suggestion! I changed the term used everywhere to "global" in
v5/6.
> > +/* Use XEN_NODE_CLAIM_INIT to initialize a memory_claim_t structure */
> > +#define XEN_NODE_CLAIM_INIT(_pages, _node) { \
> > + .pages = (_pages), \
> > + .node = (_node), \
> > + .pad = 0 \
> > +}
>
> While only a macro, it's still not C89, and hence may wants offering only as
> an extension. Also .pad doesn't need explicitly specifying, does it? If you
> provide such a macro, identifiers used also need to strictly conform to the
> C spec (IOW leading underscores aren't permitted).
Thanks, removed as not needed.
> > +DEFINE_XEN_GUEST_HANDLE(memory_claim_t);
>
> This wants to move up next to the typedef.
Thanks, done in v5/v6.
> > + /* IN: number of claims in the claims array handle. See the claims
>
> Is repeating the word "claim" necessary / useful here?
Thanks, fixed.
> > #define XEN_DOMCTL_get_domain_state 90 /* stable interface */
> > +#define XEN_DOMCTL_claim_memory 91
>
> Seeing the adjacent comment, did you consider making this new sub-op a stable
> one as well?
Thanks, I investigated making such change, but I don't think it should be
changed:
In short, XEN_DOMCTL_get_domain_state uses a fixed hypercall version of 0 and is
frozen because it needs to be used by a caller that must support multiple Xen
versions. Consequently, libxenctrl, using only the version controlled hypercalls
does not implement this hypercall.
That's not the designed use case of this hypercall:
The designed use is domain builders running in Dom0 which already
need to use the unstable (versioned) interfaces for building domains.
I think that calling this hypercall through libxenctrl like the other
hypercalls the domain builders suit it better. Otherwise, the domain builders
would use a mix of version-controlled and frozen/stable hypercalls, which could
be confusing for API users and for future maintenance.
From the domain builders’ viewpoint, it is more consistent to expose
the claims hypercall in the same way as the other calls they use.
From my viewpoint, such frozen interfaces also have drawbacks: By providing
stable syscalls, Linux needs to maintain the old interface indefinitely, which
can be a maintenance burden and can limit the ability to make improvements or
changes to the interface in the future. Linux carries many syscall successor
families, e.g., oldstat, stat, newstat, stat64, fstatat, statx, with similar
examples including openat, openat2, clone3, dup3, waitid, mmap2, epoll_create1,
pselect6 and many more. Glibc hides that complexity from users by providing a
consistent API, but it still needs to maintain the old system calls for
compatibility. Xen's interface for Dom0 is not an OS kernel syscall interface.
In contrast, the versioned libxenctrl hypercalls allow for more flexibility and
evolution of the API while still providing a clear path to adopt new features.
The reserved fields and reserved bits in the structures of this hypercall allow
for many future extensions without breaking existing callers.
Thanks for your review of the v4 series so far,
and I'm looking forward for everyone's reviews of the v6 series:
[PATCH v2] docs: Draft Design Document for NUMA-aware claim sets
https://lists.xen.org/archives/html/xen-devel/2026-04/msg00569.html
https://patchwork.kernel.org/project/xen-devel/list/?series=1081047
[PATCH v6 0/7] xen/mm: Introduce NUMA-aware claim sets for domains
https://lists.xen.org/archives/html/xen-devel/2026-04/msg00587.html
https://patchwork.kernel.org/project/xen-devel/list/?series=1081139
Thanks, Bernhard
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |