
Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op



>>> On 07.01.19 at 08:42, <christopher.w.clark@xxxxxxxxx> wrote:
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -23,16 +23,41 @@
>  #include <xen/event.h>
>  #include <xen/domain_page.h>
>  #include <xen/guest_access.h>
> +#include <xen/lib.h>
> +#include <xen/nospec.h>
>  #include <xen/time.h>
>  #include <public/argo.h>
>  
> +#define MAX_RINGS_PER_DOMAIN            128U
> +
> +/* All messages on the ring are padded to a multiple of the slot size. */
> +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))

Pointless outermost pair of parentheses.

> @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
>  #define argo_dprintk(format, ... ) ((void)0)
>  #endif
>  
> +/*
> + * This hash function is used to distribute rings within the per-domain
> + * hash tables (d->argo->ring_hash and d->argo->send_hash). The hash table
> + * will provide a struct if a match is found with an 'argo_ring_id' key:
> + * i.e. the key is a (domain id, port, partner domain id) tuple.
> + * Since port number varies the most in expected use, and the Linux driver
> + * allocates at both the high and low ends, incorporate high and low bits to
> + * help with distribution.
> + * Apply array_index_nospec as a defensive measure since this operates
> + * on user-supplied input and the array size that it indexes into is known.
> + */
> +static unsigned int
> +hash_index(const struct argo_ring_id *id)
> +{
> +    unsigned int hash;
> +
> +    hash = (uint16_t)(id->port >> 16);
> +    hash ^= (uint16_t)id->port;

I may have asked this before, but are the casts really needed
with ...

> +    hash ^= id->domain_id;
> +    hash ^= id->partner_id;
> +    hash &= (ARGO_HTABLE_SIZE - 1);

... the masking done here?

> +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);

With the masking above - is this really needed?

And then the question is whether the quality of the hash is
sufficient: XOR cannot produce a set bit in a position where none
of the three input values has one, so if they're all small,
higher hash table entries won't be used at all. I would
assume the goal to be that by the time 32 entities appear,
chances are good that at least about 30 of the hash table
entries are in use.
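
For illustration only, a multiplicative mix would spread low-entropy
inputs across the whole table, and would also make the casts, the
explicit masking, and the array_index_nospec() unnecessary, since the
shift bounds the result by construction. A sketch, assuming
ARGO_HTABLE_SIZE remains 32:

    static unsigned int
    hash_index(const struct argo_ring_id *id)
    {
        uint32_t hash = id->port ^ ((uint32_t)id->domain_id << 16) ^
                        id->partner_id;

        hash *= 0x9e3779b1U; /* 2^32 / golden ratio, a common mixing constant */

        /* The multiply mixes the top bits most thoroughly; 5-bit index. */
        return hash >> 27;
    }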

> @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
>      }
>  }
>  
> +static int
> +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> +{
> +    if ( i >= ring_info->nmfns )
> +    {
> +        gprintk(XENLOG_ERR,
> +               "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",

ring_info->id.{domain,partner}_id look to be of the same type -
why once %u and once %d? Same elsewhere.

> +                ring_info->id.domain_id, ring_info->id.port,
> +                ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> +        return -ENOMEM;
> +    }

    i = array_index_nospec(i, ring_info->nmfns);

considering the array indexes here? Of course at this point only
zero can be passed in, but I assume this changes in later patches
and the index is at least indirectly guest controlled.

> @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
>      }
>  }
>  
> +static int
> +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)

So you have find_ring_mfn(), find_ring_mfns(), and ring_find_info().
Any chance you could use a consistent ordering of "ring" and "find"?
Or is there a reason behind the apparent mismatch?

> +{
> +    p2m_type_t p2mt;
> +    int ret = 0;
> +
> +#ifdef CONFIG_X86
> +    *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> +#else
> +    *mfn = p2m_lookup(d, gfn, &p2mt);
> +#endif
> +
> +    if ( !mfn_valid(*mfn) )
> +        ret = -EINVAL;
> +#ifdef CONFIG_X86
> +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> +        ret = -EAGAIN;
> +#endif
> +    else if ( (p2mt != p2m_ram_rw) ||
> +              !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> +        ret = -EINVAL;
> +
> +#ifdef CONFIG_X86
> +    put_gfn(d, gfn_x(gfn));
> +#endif
> +
> +    return ret;
> +}

Please check whether you could leverage check_get_page_from_gfn()
here. If you can't, please at least take inspiration as to e.g. the
#ifdef-s from that function.
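
For reference, a sketch of what leveraging it might look like (this
assumes the (d, gfn, readonly, p2mt_p, page_p) signature of
check_get_page_from_gfn() in common/memory.c, which already deals with
the paging and shared types internally; the error paths would need
double checking):

    static int
    find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
    {
        struct page_info *page;
        p2m_type_t p2mt;
        int ret;

        ret = check_get_page_from_gfn(d, gfn, false, &p2mt, &page);
        if ( ret )
            return ret;

        *mfn = page_to_mfn(page);
        if ( !mfn_valid(*mfn) )
            ret = -EINVAL;
    #ifdef CONFIG_X86
        else if ( p2mt == p2m_ram_logdirty )
            ret = -EAGAIN;
    #endif
        else if ( (p2mt != p2m_ram_rw) ||
                  !get_page_and_type(page, d, PGT_writable_page) )
            ret = -EINVAL;

        /* Drop the reference taken by check_get_page_from_gfn(). */
        put_page(page);

        return ret;
    }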

> +static int
> +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +               uint32_t npage,
> +               XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +               uint32_t len)

Noticing it here, but perhaps still an issue elsewhere as well: Didn't
we agree on removing unnecessary use of fixed width types? Or
was that in the context of an earlier patch of v3?

> +{
> +    unsigned int i;
> +    int ret = 0;
> +    mfn_t *mfns;
> +    uint8_t **mfn_mapping;
> +
> +    /*
> +     * first bounds check on npage here also serves as an overflow check
> +     * before left shifting it
> +     */
> +    if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||

Isn't this redundant with the check in do_argo_op()?

> +         ((npage << PAGE_SHIFT) < len) )
> +        return -EINVAL;
> +
> +    if ( ring_info->mfns )
> +    {
> +        /* Ring already existed: drop the previous mapping. */
> +        gprintk(XENLOG_INFO,
> +         "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",

Indentation (also elsewhere).

> +                d->domain_id, ring_info->id.domain_id,
> +                ring_info->id.port, ring_info->id.partner_id);
> +
> +        ring_remove_mfns(d, ring_info);
> +        ASSERT(!ring_info->mfns);
> +    }
> +
> +    mfns = xmalloc_array(mfn_t, npage);
> +    if ( !mfns )
> +        return -ENOMEM;
> +
> +    for ( i = 0; i < npage; i++ )
> +        mfns[i] = INVALID_MFN;
> +
> +    mfn_mapping = xzalloc_array(uint8_t *, npage);
> +    if ( !mfn_mapping )
> +    {
> +        xfree(mfns);
> +        return -ENOMEM;
> +    }
> +
> +    ring_info->npage = npage;
> +    ring_info->mfns = mfns;
> +    ring_info->mfn_mapping = mfn_mapping;

As the inverse to the cleanup sequence in an earlier patch: Please
set ->npage last here even if it doesn't strictly matter.

> +    ASSERT(ring_info->npage == npage);

What is this trying to make sure, seeing the assignment just a
few lines up?

> +    if ( ring_info->nmfns == ring_info->npage )
> +        return 0;

Can this happen with the ring_remove_mfns() call above?

> +    for ( i = ring_info->nmfns; i < ring_info->npage; i++ )

And hence can i start from other than zero here? And why not
use the (possibly cheaper to access) function argument "npage"
as the loop upper bound? The other similar loop a few lines up
is coded that simpler way.

> +static long
> +register_ring(struct domain *currd,
> +              XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> +              XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> +              uint32_t npage, bool fail_exist)
> +{
> +    xen_argo_register_ring_t reg;
> +    struct argo_ring_id ring_id;
> +    void *map_ringp;
> +    xen_argo_ring_t *ringp;
> +    struct argo_ring_info *ring_info;
> +    struct argo_send_info *send_info = NULL;
> +    struct domain *dst_d = NULL;
> +    int ret = 0;
> +    uint32_t private_tx_ptr;
> +
> +    if ( copy_from_guest(&reg, reg_hnd, 1) )
> +    {
> +        ret = -EFAULT;
> +        goto out;
> +    }
> +
> +    /*
> +     * A ring must be large enough to transmit messages, so requires space for:
> +     * * 1 message header, plus
> +     * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> +     *   for the message payload to be written into, plus
> +     * * 1 more slot, so that the ring cannot be filled to capacity with a
> +     *   single message -- see the logic in ringbuf_insert -- allowing for this
> +     *   ensures that there can be space remaining when a message is present.
> +     * The above determines the minimum acceptable ring size.
> +     */
> +    if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> +                      + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||

The two ROUNDUP_MESSAGE(1) summands don't look to fulfill the
"cannot be filled to capacity" constraint the comment describes,
as (aiui) messages can be larger than 16 bytes. What's the deal?
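
Concretely (assuming sizeof(struct xen_argo_ring_message_header) is 16
here): the check accepts reg.len == 16 + 16 + 16 == 48, yet a single
message with e.g. a 32-byte payload occupies ROUNDUP_MESSAGE(16 + 32)
== 48 bytes, i.e. it would fill such a minimal ring to capacity unless
ringbuf_insert() separately refuses the insertion.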

> +         (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> +         (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> +         (reg.pad != 0) )
> +    {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ring_id.partner_id = reg.partner_id;
> +    ring_id.port = reg.port;
> +    ring_id.domain_id = currd->domain_id;
> +
> +    read_lock(&argo_lock);

From here to ...

> +    if ( !currd->argo )
> +    {
> +        ret = -ENODEV;
> +        goto out_unlock;
> +    }
> +
> +    if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> +    {
> +        if ( opt_argo_mac_enforcing )
> +        {
> +            ret = -EPERM;
> +            goto out_unlock;
> +        }
> +    }
> +    else
> +    {
> +        dst_d = get_domain_by_id(reg.partner_id);
> +        if ( !dst_d )
> +        {
> +            argo_dprintk("!dst_d, ESRCH\n");
> +            ret = -ESRCH;
> +            goto out_unlock;
> +        }
> +
> +        if ( !dst_d->argo )
> +        {
> +            argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> +            ret = -ECONNREFUSED;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +
> +        send_info = xzalloc(struct argo_send_info);
> +        if ( !send_info )
> +        {
> +            ret = -ENOMEM;
> +            put_domain(dst_d);
> +            goto out_unlock;
> +        }
> +        send_info->id = ring_id;
> +    }

... here, what exactly is it that requires the global read lock
to be held ...

> +    write_lock(&currd->argo->lock);

... prior to this? Holding locks around allocations is not
forbidden, but should be avoided whenever possible.

And then further why does the global read lock need
continued holding until the end of the function?
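
E.g. (a sketch only) the allocation could be done up front, before any
locking, since reg.partner_id is already known by then:

    if ( reg.partner_id != XEN_ARGO_DOMID_ANY )
    {
        send_info = xzalloc(struct argo_send_info);
        if ( !send_info )
            return -ENOMEM;
    }

    read_lock(&argo_lock);

with the error paths then funneling through a label that frees any
unconsumed allocation.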

> +    if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> +    {
> +        ret = -ENOSPC;
> +        goto out_unlock2;
> +    }
> +
> +    ring_info = ring_find_info(currd, &ring_id);
> +    if ( !ring_info )
> +    {
> +        ring_info = xzalloc(struct argo_ring_info);
> +        if ( !ring_info )
> +        {
> +            ret = -ENOMEM;
> +            goto out_unlock2;
> +        }
> +
> +        spin_lock_init(&ring_info->lock);
> +
> +        ring_info->id = ring_id;
> +        INIT_HLIST_HEAD(&ring_info->pending);
> +
> +        hlist_add_head(&ring_info->node,
> +                       &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> +
> +        gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +    }
> +    else
> +    {
> +        if ( ring_info->len )
> +        {

Please fold into "else if ( )", removing a level of indentation.
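
I.e. simply:

    else if ( ring_info->len )

with the body otherwise unchanged.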

> +            /*
> +             * If the caller specified that the ring must not already exist,
> +             * fail at attempt to add a completed ring which already exists.
> +             */
> +            if ( fail_exist )
> +            {
> +                argo_dprintk("disallowed reregistration of existing ring\n");
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            if ( ring_info->len != reg.len )
> +            {
> +                /*
> +                 * Change of ring size could result in entries on the pending
> +                 * notifications list that will never trigger.
> +                 * Simple blunt solution: disallow ring resize for now.
> +                 * TODO: investigate enabling ring resize.
> +                 */
> +                gprintk(XENLOG_ERR,
> +                    "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> +                        currd->domain_id, ring_id.domain_id, ring_id.port,
> +                        ring_id.partner_id);
> +                /*
> +                 * Could return EINVAL here, but if the ring didn't already
> +                 * exist then the arguments would have been valid, so: EEXIST.
> +                 */
> +                ret = -EEXIST;
> +                goto out_unlock2;
> +            }
> +
> +            gprintk(XENLOG_DEBUG,
> +                    "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> +                    currd->domain_id, ring_id.domain_id, ring_id.port,
> +                    ring_id.partner_id);
> +        }
> +    }
> +
> +    ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +
> +    /*
> +     * The first page of the memory supplied for the ring has the xen_argo_ring
> +     * structure at its head, which is where the ring indexes reside.
> +     */
> +    ret = ring_map_page(ring_info, 0, &map_ringp);
> +    if ( ret )
> +    {
> +        gprintk(XENLOG_ERR,
> +                "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> +                currd->domain_id, ring_id.domain_id, ring_id.port,
> +                ring_id.partner_id);
> +
> +        ring_remove_info(currd, ring_info);
> +        goto out_unlock2;
> +    }
> +    ringp = map_ringp;
> +
> +    private_tx_ptr = read_atomic(&ringp->tx_ptr);
> +
> +    if ( (private_tx_ptr >= reg.len) ||
> +         (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> +    {
> +        /*
> +         * Since the ring is a mess, attempt to flush the contents of it
> +         * here by setting the tx_ptr to the next aligned message slot past
> +         * the latest rx_ptr we have observed. Handle ring wrap correctly.
> +         */
> +        private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> +
> +        if ( private_tx_ptr >= reg.len )
> +            private_tx_ptr = 0;
> +
> +        update_tx_ptr(ring_info, private_tx_ptr);
> +    }
> +
> +    ring_info->tx_ptr = private_tx_ptr;
> +    ring_info->len = reg.len;
> +    currd->argo->ring_count++;
> +
> +    if ( send_info )
> +    {
> +        spin_lock(&dst_d->argo->send_lock);
> +
> +        hlist_add_head(&send_info->node,
> +                       &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> +
> +        spin_unlock(&dst_d->argo->send_lock);
> +    }
> +
> + out_unlock2:
> +    if ( !ret && send_info )
> +        xfree(send_info);
> +
> +    if ( dst_d )
> +        put_domain(dst_d);
> +
> +    write_unlock(&currd->argo->lock);

Surely you can drop the lock before the other two cleanup
actions? That would then allow you to add another label to
absorb the two separate put_domain() calls on error paths.
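
Along the lines of (a sketch only; the earlier error paths would need
adjusting to target the right labels):

 out_unlock2:
    write_unlock(&currd->argo->lock);

 out_unlock:
    read_unlock(&argo_lock);

 out:
    if ( dst_d )
        put_domain(dst_d);

    if ( ret )
        xfree(send_info);

    return ret;

(Independent of that, the "if ( !ret && send_info )" condition on the
xfree() above looks inverted to me: on success the allocation has been
linked into dst_d's send hash, so it is the error paths that must free
it - unless I'm misreading.)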

> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
>  /* Is the guest handle a NULL reference? */
>  #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
>  
> +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))

This is unused throughout the patch afaics.

> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -31,6 +31,26 @@
>  
>  #include "xen.h"
>  
> +#define XEN_ARGO_DOMID_ANY       DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16MB
> + *  -- which is 0x1000000 bytes.
> + * A byte index into the ring is at most 24 bits.
> + */
> +#define XEN_ARGO_MAX_RING_SIZE  (0x1000000ULL)
> +
> +/*
> + * Page descriptor: encoding both page address and size in a 64-bit value.
> + * Intended to allow ABI to support use of different granularity pages.
> + * example of how to populate:
> + * xen_argo_page_descr_t pg_desc =
> + *      (physaddr & PAGE_MASK) | XEN_ARGO_PAGE_DESCR_SIZE_4K;
> + */
> +typedef uint64_t xen_argo_page_descr_t;
> +#define XEN_ARGO_PAGE_DESCR_SIZE_MASK   0x0000000000000fffULL
> +#define XEN_ARGO_PAGE_DESCR_SIZE_4K     0

Are the _DESCR_ infixes here really useful?

> @@ -56,4 +76,56 @@ typedef struct xen_argo_ring
>  #endif
>  } xen_argo_ring_t;
>  
> +typedef struct xen_argo_register_ring
> +{
> +    uint32_t port;
> +    domid_t partner_id;
> +    uint16_t pad;
> +    uint32_t len;
> +} xen_argo_register_ring_t;
> +
> +/* Messages on the ring are padded to a multiple of this size. */
> +#define XEN_ARGO_MSG_SLOT_SIZE 0x10
> +
> +struct xen_argo_ring_message_header
> +{
> +    uint32_t len;
> +    xen_argo_addr_t source;
> +    uint32_t message_type;
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +    uint8_t data[];
> +#elif defined(__GNUC__)
> +    uint8_t data[0];
> +#endif
> +};
> +
> +/*
> + * Hypercall operations
> + */
> +
> +/*
> + * XEN_ARGO_OP_register_ring
> + *
> + * Register a ring using the indicated memory.
> + * Also used to reregister an existing ring (eg. after resume from hibernate).
> + *
> + * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
> + * arg2: XEN_GUEST_HANDLE(xen_argo_page_descr_t)
> + * arg3: unsigned long npages
> + * arg4: unsigned long flags

The "unsigned long"-s here are not necessarily compatible with
compat mode. At the very least flags above bit 31 won't be
usable by compat mode guests. Hence I also question ...

> + */
> +#define XEN_ARGO_OP_register_ring     1
> +
> +/* Register op flags */
> +/*
> + * Fail exist:
> + * If set, reject attempts to (re)register an existing established ring.
> + * If clear, reregistration occurs if the ring exists, with the new ring
> + * taking the place of the old, preserving tx_ptr if it remains valid.
> + */
> +#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST  0x1
> +
> +/* Mask for all defined flags. unsigned long type so ok for both 32/64-bit */
> +#define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL

... the UL suffix here. Also this last item should not be exposed
(perhaps framed by "#ifdef __XEN__") and would perhaps anyway
better be defined in terms of the other
XEN_ARGO_REGISTER_FLAG_*.
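
E.g. something like:

    #ifdef __XEN__
    /* Mask of all currently defined register flags: */
    #define XEN_ARGO_REGISTER_FLAG_MASK XEN_ARGO_REGISTER_FLAG_FAIL_EXIST
    #endif

with further flags OR-ed in as they get defined.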

Jan

