|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 12/16] xen: implement new foreign copy hypercall
On Mon, 15 Jun 2026 at 15:03, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 15.06.2026 14:11, Frediano Ziglio wrote:
> > On Mon, 15 Jun 2026 at 08:41, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >> On 13.06.2026 23:47, Frediano Ziglio wrote:
> >>> --- a/xen/common/memory.c
> >>> +++ b/xen/common/memory.c
> >>> @@ -1545,6 +1545,133 @@ static int acquire_resource(
> >>> return rc;
> >>> }
> >>>
> >>> +/*
> >>> + * The "noinline" qualifier avoids the compiler to create a large
> >>> function
> >>> + * consuming quite a lot of stack.
> >>> + */
> >>> +static int noinline mem_foreigncopy(
> >>> + XEN_GUEST_HANDLE_PARAM(xen_foreigncopy_t) arg)
> >>> +{
> >>> + struct domain *d, *const currd = current->domain;
> >>> + xen_foreigncopy_t copy;
> >>> + int rc, direction;
> >>> +
> >>> + if ( copy_from_guest(©, arg, 1) )
> >>> + return -EFAULT;
> >>> +
> >>> + if ( copy.flags & ~1U )
> >>> + return -EINVAL;
> >>> +
> >>> + direction = copy.flags & XENMEM_foreigncopy_direction;
> >>> +
> >>> + if ( copy.nr_frames == 0 )
> >>> + return 0;
> >>> +
> >>> + rc = rcu_lock_remote_domain_by_id(copy.domid, &d);
> >>> + if ( rc )
> >>> + return rc;
> >>
> >> Extending my v4 remark: How about a "fully foreign" copy? I.e. one between
> >> two
> >> pages in two foreign domains? Getting me back also to the un-answered v4
> >> question of mine as to MMUEXT_COPY_PAGE.
> >
> > I really thought I replied to this but I cannot find it.
> > MMUEXT_COPY_PAGE was the first attempt to do this but mmuext is
> > designed for PV and extending was made the code confusing.
>
> We discussed this on the x86 call, and yes - the major op being PV-only is
> getting in the way here.
>
Added a comment in the commit message:
Extending MMUEXT_COPY_PAGE seems better on first sight but considering
that MMUEXT is meant for PV only and trying to change that sub-op this
solution is better.
> >> Further, as to the order of checks: I'm not going to insist on
> >> re-ordering, yet
> >> I'd like to point out that copying 0 pages to/from a bad domid might better
> >> yield an error.
> >
> > Not strong about this, changed to return -EINVAL. Reordering after
> > this change won't make much sense, -EINVAL is returned both for wrong
> > flags or no frames.
>
> Please don't - copying 0 frames with all other arguments correct is not an
> error.
>
This comment seems to contradict the not insisting on re-ordering.
Changed the order so to check the domid before nr_frames == 0 :
rc = rcu_lock_remote_domain_by_id(copy.domid, &d);
if ( rc )
return rc;
if ( copy.nr_frames == 0 )
{
rcu_unlock_domain(d);
return 0;
}
> >>> + do {
> >>> + /*
> >>> + * Arbitrary size. Not too much stack space, and a reasonable
> >>> stride
> >>> + * for continuation checks.
> >>> + */
> >>> + xen_pfn_t gfn_list[32];
> >>> + unsigned int todo = MIN(ARRAY_SIZE(gfn_list), copy.nr_frames);
> >>> +
> >>> + rc = -EFAULT;
> >>> + if ( copy_from_guest(gfn_list, copy.frame_list, todo) )
> >>> + goto out;
> >>> +
> >>> + for ( unsigned i = 0; i < todo; i++ )
> >>
> >> Nit: "unsigned int" please (like you have it above).
> >
> > Changed. Note that "long" is also not a type but a modifier. Would it
> > be good to change both the above to "unsigned" instead for consistency
> > with "long" ?
>
> No, the way we spell types has historical reasons and isn't tied to the
> exact lexical meaning of the keywords.
>
> >>> + {
> >>> + struct page_info *foreign_page;
> >>> + void *foreign;
> >>> + p2m_type_t p2mt;
> >>> + const unsigned long valid_mask =
> >>> +#ifdef CONFIG_X86
> >>> + p2m_to_mask(p2m_ram_rw) | p2m_to_mask(p2m_ram_logdirty);
> >>> +#else
> >>> + p2m_to_mask(p2m_ram_rw);
> >>> +#endif
> >>
> >> What about, for example, p2m_ram_ro? Or p2m_ram_shared? Or p2m_grant_map_*?
> >> Etc. Any artificial constraining wants justifying in the description and/or
> >> mentioning in the public header.
> >
> > The base of this was taken from migration code where there is such a check.
> > I suppose that adding p2m_ram_ro (where available) won't hurt.
>
> Just to mention, to avoid another round trip just because of this: p2m_ram_ro
> has different meaning on x86 vs Arm/RISC-V.
>
That's confusing... should not this be fixed somehow?
It won't save much from a round-trip. Should I allow it or not ?
> > p2m_ram_shared I'm not sure but seems fine too.
> > For p2m_grant_map_* it feels a bit a security issue to me. It would
> > allow a guest to give access to pages of other domains. It's true that
> > the current domain would have to have write access to this domain
> > anyway but extend these permissions sounds something it should not be
> > able to do.
>
> It could copy the contents of the grant mapped page by other means. Why not
> allow it in this new sub-op as well then?
>
I'm more afraid of writing the content of the grant pages than copying it.
> Talking of security: When the page you copy to is owned by a PV guest, I
> think you further need to obtain a PGT_writable type ref. (Of course it then
> likely is easier to always do this, not just for PV.)
>
Wondering how save/restore (or migration) works in this case.
> >>> @@ -2012,6 +2139,13 @@ long do_memory_op(unsigned long cmd,
> >>> XEN_GUEST_HANDLE_PARAM(void) arg)
> >>> start_extent);
> >>> break;
> >>>
> >>> + case XENMEM_foreigncopy:
> >>> + if ( unlikely(start_extent) )
> >>> + return -EINVAL;
> >>
> >> Please address review comments (verbally or by code changes) before
> >> submitting
> >> a new version. Here I had asked "Why make this different from other
> >> continuable
> >> sub-ops?"
> >>
> >
> > There's already a comment in the same file for similar reason
> >
> > /*
> > * Limiting nr_frames at (UINT_MAX >> MEMOP_EXTENT_SHIFT) isn't ideal.
> > If
> > * it ever becomes a practical problem, we can switch to mutating
> > * xmar.{frame,nr_frames,frame_list} in guest memory.
> > */
> >
> > so to avoid the doubt and possible future change I mutate the structure.
> > Also I use the mutation to give more information to the caller, using
> > "start_entent" won't allow this.
>
> You'll want to mention this in the description and/or a code comment. It
> wants to become clear that the inconsistency in behavior (with other sub-
> ops) is deliberate rather than accidental.
>
Added a comment in the code:
if ( copy.nr_frames && hypercall_preempt_check() )
{
/*
* Instead of using "start_extent" we update the structure back,
* we update it back in anyway to tell caller were the copy
* stopped.
*/
rc = hypercall_create_continuation(
__HYPERVISOR_memory_op, "lh", XENMEM_foreigncopy, arg);
goto out;
}
> >>> --- a/xen/include/public/memory.h
> >>> +++ b/xen/include/public/memory.h
> >>> @@ -740,7 +740,45 @@ struct xen_vnuma_topology_info {
> >>> typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
> >>> DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
> >>>
> >>> -/* Next available subop number is 29 */
> >>> +/*
> >>> + * Copy memory from/to a given domain.
> >>> + */
> >>> +#define XENMEM_foreigncopy 29
> >>> +struct xen_foreigncopy {
> >>> + /* IN - The domain whose resource is to be copied. */
> >>
> >> There's still "resource" here, when this really is about RAM (memory) only,
> >> not any other kind of resource.
> >>
> >>> + domid_t domid;
> >>> +
> >>> + /* IN - Flags. */
> >>> +#define XENMEM_foreigncopy_from 0
> >>> +#define XENMEM_foreigncopy_to 1
> >>> +#define XENMEM_foreigncopy_direction 1
> >>> + uint16_t flags;
> >>> +
> >>> + /*
> >>> + * IN
> >>> + *
> >>> + * As an IN parameter number of frames of the domain to be copied.
> >>> + */
> >>> + uint32_t nr_frames;
> >>
> >> This isn't just an input, as you update the field (and the handles below).
> >> This property of fields wants reflecting here, so callers know that they
> >> (a)
> >> can't re-use the struct on a subsequent call without re-initializing the
> >> fields which may have changed, and (b) can't put the struct in r/o memory.
> >>
> >
> > Update comments:
> >
> > /*
> > * Copy memory from/to a given domain.
> > */
> > #define XENMEM_foreigncopy 29
> > struct xen_foreigncopy {
> > /* IN - The domain whose memory is to be copied. */
> > domid_t domid;
> >
> > /* IN - Flags. */
> > #define XENMEM_foreigncopy_from 0
> > #define XENMEM_foreigncopy_to 1
> > #define XENMEM_foreigncopy_direction 1
> > uint16_t flags;
> >
> > /*
> > * IN/OUT
> > *
> > * As an IN parameter number of frames of the domain to be copied.
> > * On output on error updated number of frames left.
> > */
> > uint32_t nr_frames;
>
> This is updated not only on error, but also when encoding continuations.
>
Yes, but this seems more an implementation detail to me. I don't think
the caller cares about how the continuation is implemented.
> >>> + /*
> >>> + * IN
> >>> + *
> >>> + * Frames to be copied.
> >>> + */
> >>> + XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
> >>> +
> >>> + /*
> >>> + * IN/OUT
> >>> + *
> >>> + * Userspace buffer to read/write from.
> >>> + */
> >>> + XEN_GUEST_HANDLE(uint8) buffer;
> >>
> >> With these two handles, there continues to be a need to (explicitly) deal
> >> with the compat case as well.
> >
> > I don't agree with this. Domains having access to other domains are
> > limited (like stub domains for Qemu) and won't be 32 bits today so why
> > allow 32 bits guests if not ever used?
>
> How do you know? Why shouldn't e.g. XTF be permitted to test this in all
> possible modes? And even if all arguments end up in favor of "no compat
> support", this then wants spelling out to make clear this wasn't an
> oversight, but rather a conscious decision.
>
XTF can do something like
#if COMPAT_GUEST
/* Compat guests are not supported, return success. */
return 0;
# endif
(or can be done in the Makefiles I suppose).
Added a comment in memory.h:
/*
* Copy memory from/to a given domain.
* As this call requires target access and guest with target access won't be
* compat guests supported for compat guests this is not implemented.
*/
#define XENMEM_foreigncopy 29
struct xen_foreigncopy {
> Jan
Frediano
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |