
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq



On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark
<christopher.w.clark@xxxxxxxxx> wrote:
>
> Thanks for the review, Roger. Replies inline below.
>
> On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné <royger@xxxxxxxxxxx> wrote:
> >
> > On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark
> > <christopher.w.clark@xxxxxxxxx> wrote:
> > >
> > > sendv operation is invoked to perform a synchronous send of buffers
> > > contained in iovs to a remote domain's registered ring.
> > >
> > > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > > index 59ce8c4..4548435 100644
> > > --- a/xen/common/argo.c
> > > +++ b/xen/common/argo.c
>
> > >
> > > +static int
> > > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > > +                     uint32_t len)
> > > +{
> > > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > > +    void *dst;
> > > +    int ret;
> > > +    unsigned int src_offset = 0;
> > > +
> > > +    ASSERT(spin_is_locked(&ring_info->lock));
> > > +
> > > +    offset &= ~PAGE_MASK;
> > > +
> > > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > > +        return -EFAULT;
> > > +
> > > +    while ( (offset + len) > PAGE_SIZE )
> >
> > I think you could map the whole ring in contiguous virtual address
> > space and then writing to it would be much more easy, you wouldn't
> > need to iterate with memcpy or copy_from_guest, take a look at __vmap.
> > You could likely map this when the ring gets setup and keep it mapped
> > for the lifetime of the ring.
>
> You're right about that, because map_domain_page_global, which the
> current code uses, uses vmap itself. I think there's a couple of
> reasons why the code has been implemented the iterative way though.
>
> The first is that I think ring resize has been a consideration: it's
> useful to be able to increase the size of a live and active ring that
> is under load without having to tear down the mappings, find a new
> virtual address region of the right size and then remap it: you can
> just supply some more memory and map those pages onto the end of the
> ring, and ensure both sides know about the new ring size. Similarly,
> shrinking a quiet ring can be useful.

Is such on-the-fly expansion something common with Argo?

I'm not saying it's something that shouldn't be supported, but the
burden of allowing such resizing doesn't seem trivial. You would have
to resize a lot of the arrays used to track the ring's pages, and at
that point I wonder whether remapping the virtual address space is
really the biggest issue you are going to have if you allow such
run-time resizing.

> However, the "gfn race" that you (correctly) pointed out in an earlier
> review, and Jan's related request to drop the "revalidate an existing
> mapping on ring reregister" motivated removal of a section of the code
> involved, and then in v3 of the series, I've actually just blocked
> ring resize because it's missing a walk through the pending
> notifications to find any that have become untriggerable with the new
> ring size when a ring is shrunk and I'd like to defer implementing
> that for now. So the ring resize reason is more of a consideration for
> a possible later version of Argo than the current one.
>
> The second reason is about avoiding exposing the Xen virtual memory
> allocator directly to frequent guest-supplied size requests for
> contiguous regions (of up to 16GB).

As I said in another reply, I'm not sure allowing 16GB rings is safe.
The amount of internal memory required to track such a ring is not
trivial, given the arrays needed to store the mfns, the pages, and the
virtual mappings.
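
To put rough numbers on that (assuming 4KB pages and 8-byte mfn
entries): a 16GB ring spans 4M pages, so the mfns array alone is on
the order of 32MB of Xen heap per ring, before counting the page and
mapping tracking arrays, and mapping the whole thing contiguously
would consume 16GB of vmap virtual address space.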

> With single-page allocations to
> build a ring, fragmentation is not a problem, and mischief by a guest
> seems difficult.

Hm, there are still a lot of large dynamic memory allocations needed
to support a 16GB ring, which makes me think that virtual address
space is not the only problem if you allow rings of that size.

> Changing it to issue requests for contiguous regions, with variable
> ring sizes up to the maximum of 16GB, it seems that a guest could
> plausibly induce significant fragmentation. I don't know the
> practical impact of that but it seems worth avoiding. Are the other
> users of __vmap (or vmap) for multi-gigabyte regions only either
> boot-time, infrequent operations (livepatch), or for actions by
> privileged (i.e. somewhat trusted) domains (ioremap), or is it
> already a frequent operation somewhere else?

I haven't checked, but I would be quite surprised to find any vmap
usage with such size (16GB). Maybe someone more familiar with the mm
subsystem can provide some insight here.

> Given the context above, and Jason's simplification to the
> memcpy_to_guest_ring function, plus the imminent merge freeze
> deadline, and the understanding that this loop and the related data
> structures supporting it have been tested and are working, would it be
> acceptable to omit making this contiguous mapping change from this
> current series?

My opinion would be to just use vmap if it works, because IMO that
greatly simplifies the code by keeping the whole ring mapped all the
time. It would remove the iteration when copying requests, and remove
the usage of ring_map_page everywhere. That would be my recommendation
code-wise, but as I said above, someone more familiar with the mm
subsystem might have other opinions about how to deal with accesses to
16GB of guest memory, and indeed your iterative solution might be the
best approach.
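
To illustrate the simplification I have in mind, here is a rough,
untested sketch. It assumes the ring_info fields from this patch (the
mfns array and the nmfns count) plus a hypothetical ring_va field
added to hold the mapping:

static int
ring_map_whole(struct argo_ring_info *ring_info)
{
    ASSERT(spin_is_locked(&ring_info->lock));

    /* Already mapped: nothing to do. */
    if ( ring_info->ring_va )
        return 0;

    /* Map all of the ring's pages into one contiguous virtual range. */
    ring_info->ring_va = vmap(ring_info->mfns, ring_info->nmfns);

    return ring_info->ring_va ? 0 : -ENOMEM;
}

memcpy_to_guest_ring() would then reduce to a single memcpy (or
copy_from_guest) into ring_info->ring_va + offset, and ring teardown
would simply vunmap() the region. Whether vmap copes well with regions
of that size is the open question above.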

> > > +static long
> > > +sendv(struct domain *src_d, const xen_argo_addr_t *src_addr,
> > > +      const xen_argo_addr_t *dst_addr,
> > > +      XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd, unsigned long niov,
> > > +      uint32_t message_type)
> > > +{
> > > +    struct domain *dst_d = NULL;
> > > +    struct argo_ring_id src_id;
> > > +    struct argo_ring_info *ring_info;
> > > +    int ret = 0;
> > > +    unsigned long len = 0;
> > > +
> > > +    ASSERT(src_d->domain_id == src_addr->domain_id);
> > > +
> > > +    argo_dprintk("sendv: (%d:%x)->(%d:%x) niov:%lu iov:%p type:%u\n",
> > > +                 src_addr->domain_id, src_addr->port,
> > > +                 dst_addr->domain_id, dst_addr->port,
> > > +                 niov, iovs_hnd.p, message_type);
> > > +
> > > +    read_lock(&argo_lock);
> > > +
> > > +    if ( !src_d->argo )
> > > +    {
> > > +        ret = -ENODEV;
> > > +        goto out_unlock;
> > > +    }
> > > +
> > > +    src_id.port = src_addr->port;
> > > +    src_id.domain_id = src_d->domain_id;
> > > +    src_id.partner_id = dst_addr->domain_id;
> > > +
> > > +    dst_d = get_domain_by_id(dst_addr->domain_id);
> > > +    if ( !dst_d )
> > > +    {
> > > +        argo_dprintk("!dst_d, ESRCH\n");
> > > +        ret = -ESRCH;
> > > +        goto out_unlock;
> > > +    }
> > > +
> > > +    if ( !dst_d->argo )
> > > +    {
> > > +        argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > > +        ret = -ECONNREFUSED;
> > > +        goto out_unlock;
> >
> > The usage of out_unlock here and in the condition above is wrong,
> > since it will unconditionally call read_unlock(&argo_lock), which is
> > incorrect here because the lock has not yet been acquired.
>
> Sorry, I don't think that's quite right -- if you scroll up a bit
> here, you can see where argo_lock is taken unconditionally, just after
> the dprintk and before checking whether src_d is argo enabled. The
> second lock hasn't been taken yet - but that's not the one being
> unlocked on that out_unlock path.

Oops, yes, sorry. I got mixed up with so many locks.
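
For the record, the structure as I now read it is roughly:

    read_lock(&argo_lock);        /* global lock, taken unconditionally */

    if ( !src_d->argo )
    {
        ret = -ENODEV;
        goto out_unlock;          /* argo_lock is already held here */
    }
    ...
 out_unlock:
    read_unlock(&argo_lock);      /* pairs with the unconditional read_lock */

so the early gotos are fine.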

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel