
Re: [Xen-devel] [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin



>>> On 09.09.13 at 18:06, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> +static long memory_exchange(int op, XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>  {
>      struct xen_memory_exchange exch;
>      PAGE_LIST_HEAD(in_chunk_list);
>      PAGE_LIST_HEAD(out_chunk_list);
>      unsigned long in_chunk_order, out_chunk_order;
>      xen_pfn_t     gpfn, gmfn, mfn;
> -    unsigned long i, j, k = 0; /* gcc ... */
> +    unsigned long i = 0, j = 0, k = 0; /* gcc ... */

This is unnecessary afaict.

> @@ -542,54 +542,72 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>          /* Assign each output page to the domain. */
>          for ( j = 0; (page = page_list_remove_head(&out_chunk_list)); ++j )
>          {
> -            if ( assign_pages(d, page, exch.out.extent_order,
> -                              MEMF_no_refcount) )
> -            {
> -                unsigned long dec_count;
> -                bool_t drop_dom_ref;
> -
> -                /*
> -                 * Pages in in_chunk_list is stolen without
> -                 * decreasing the tot_pages. If the domain is dying when
> -                 * assign pages, we need decrease the count. For those pages
> -                 * that has been assigned, it should be covered by
> -                 * domain_relinquish_resources().
> -                 */
> -                dec_count = (((1UL << exch.in.extent_order) *
> -                              (1UL << in_chunk_order)) -
> -                             (j * (1UL << exch.out.extent_order)));
> -
> -                spin_lock(&d->page_alloc_lock);
> -                domain_adjust_tot_pages(d, -dec_count);
> -                drop_dom_ref = (dec_count && !d->tot_pages);
> -                spin_unlock(&d->page_alloc_lock);
> -
> -                if ( drop_dom_ref )
> -                    put_domain(d);
> -
> -                free_domheap_pages(page, exch.out.extent_order);
> -                goto dying;
> -            }
> +            unsigned long dec_count;
> +            bool_t drop_dom_ref;
>  
>              if ( __copy_from_guest_offset(&gpfn, exch.out.extent_start,
>                                            (i << out_chunk_order) + j, 1) )
>              {
>                  rc = -EFAULT;
> -                continue;
> +                goto extent_error;

No, nothing was undone in this case before. And _if_ the
re-ordering is provably correct (which I doubt at this point), you'd
need to continue to assign_pages() here to retain previous
behavior. Or if you think current behavior is wrong, please send
a (perhaps prerequisite) separate patch to address that, which
would at once make sure we get a proper explanation of what
you think is wrong.
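
Just to illustrate (untested, and only a sketch of one way to retain the
old behaviour - the exact shape of it is your call): on a copy failure the
page would still have to be handed to the domain, e.g.

            if ( __copy_from_guest_offset(&gpfn, exch.out.extent_start,
                                          (i << out_chunk_order) + j, 1) )
            {
                rc = -EFAULT;
                /* As before the re-ordering, the page still gets assigned. */
                if ( assign_pages(d, page, exch.out.extent_order,
                                  MEMF_no_refcount) )
                    goto extent_error;
                continue;
            }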

>              }
>  
>              mfn = page_to_mfn(page);
>              guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);

As Ian already pointed out, there's a window now where the
guest has the new page(s) in its physmap, but not assigned to it
yet.
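
I.e. with the re-ordering the per-extent sequence now is, roughly,

            guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
            /* From here on the guest can see (and map) gpfn ... */
            ...
            rc = assign_pages(d, page, exch.out.extent_order, MEMF_no_refcount);
            /* ... yet only here does the page become assigned to the domain. */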

>  
> +            if ( op == XENMEM_exchange_and_pin )
> +            {
> +                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> +                {
> +                    rc = -EFAULT;
> +                    goto extent_error_physmap;
> +                }
> +            }
> +
>              if ( !paging_mode_translate(d) )
>              {
>                  for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
>                      set_gpfn_from_mfn(mfn + k, gpfn + k);

Again, _if_ the error path changes are provably correct, wouldn't
this need undoing on the error path too? That wasn't necessary
before your change because we ran the handling of the current
extent to completion.
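
Just to illustrate what I mean (a sketch only - whether INVALID_M2P_ENTRY
is the right value to restore, and on which of the error paths this would
belong, is for you to verify):

                for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
                    set_gpfn_from_mfn(mfn + k, INVALID_M2P_ENTRY);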

> +            }
> +
> +            rc = assign_pages(d, page, exch.out.extent_order, MEMF_no_refcount);
> +            if ( rc )
> +                goto extent_error_physmap;
> +
> +            if ( op == XENMEM_exchange_and_pin || !paging_mode_translate(d) )
> +            {
>                  if ( __copy_to_guest_offset(exch.out.extent_start,
>                                              (i << out_chunk_order) + j,
>                                              &mfn, 1) )
>                      rc = -EFAULT;
>              }
> +
> +            continue;
> +
> +extent_error_physmap:
> +            guest_physmap_remove_page(d, gpfn, mfn, exch.out.extent_order);
> +extent_error:
> +            /*
> +             * Pages in in_chunk_list is stolen without
> +             * decreasing the tot_pages. If the domain is dying when
> +             * assign pages, we need decrease the count. For those pages
> +             * that has been assigned, it should be covered by
> +             * domain_relinquish_resources().
> +             */
> +            dec_count = (((1UL << exch.in.extent_order) *
> +                        (1UL << in_chunk_order)) -
> +                    (j * (1UL << exch.out.extent_order)));
> +
> +            spin_lock(&d->page_alloc_lock);
> +            domain_adjust_tot_pages(d, -dec_count);
> +            drop_dom_ref = (dec_count && !d->tot_pages);
> +            spin_unlock(&d->page_alloc_lock);
> +
> +            if ( drop_dom_ref )
> +                put_domain(d);
> +
> +            free_domheap_pages(page, exch.out.extent_order);
> +            goto dying;

I don't think moving the error handling down here helps readability or
anything else.

> +static long unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg)
> +{
> +    int rc;

Please keep function return type and return code variable type
in sync - "int" would be sufficient for both here.
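
I.e. simply

static int unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg)
{
    int rc;
    ...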

> +    unsigned long i;
> +    struct xen_unpin unpin;
> +    xen_pfn_t gpfn;
> +    struct domain *d;
> +
> +    if ( copy_from_guest(&unpin, arg, 1) )
> +        return -EFAULT;
> +
> +    /* Various sanity checks. */
> +    if ( /* Extent orders are sensible? */
> +         (unpin.in.extent_order > MAX_ORDER) ||
> +         /* Sizes of input list do not overflow a long? */
> +         ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) )
> +        return -EFAULT;
> +
> +    if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) )
> +        return -EFAULT;
> +
> +    d = rcu_lock_domain_by_any_id(unpin.in.domid);
> +    if ( d == NULL )
> +    {
> +        rc = -ESRCH;
> +        goto fail;
> +    }
> +
> +    for ( i = 0; i < unpin.in.nr_extents; i++ )
> +    {
> +        if ( unlikely(__copy_from_guest_offset(
> +                      &gpfn, unpin.in.extent_start, i, 1)) )
> +        {
> +            rc = -EFAULT;
> +            goto partial;
> +        }
> +

Want/need to make sure gpfn is suitably aligned for the passed-in
extent order? Depends on whether guest_physmap_unpin_range()
is expected to always cope with a mis-aligned one.
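
E.g. (a sketch only - whether -EINVAL or some other error code is more
appropriate is a separate question):

        if ( unlikely(gpfn & ((1UL << unpin.in.extent_order) - 1)) )
        {
            rc = -EINVAL;
            goto partial;
        }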

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -105,6 +105,7 @@ struct xen_memory_exchange {
>      /*
>       * [IN] Details of memory extents to be exchanged (GMFN bases).
>       * Note that @in.address_bits is ignored and unused.
> +     * @out.address_bits should contain the address mask for the new pages.

"mask"? No comment is likely better than a confusing one.

> @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
>   * The zero value is appropiate.
>   */
>  
> +#define XENMEM_exchange_and_pin             26
> +/*
> + * This hypercall is similar to XENMEM_exchange: it takes the same
> + * struct as an argument and it exchanges the pages passed in with a new
> + * set of pages. The new pages are going to be "pinned": it's guaranteed
> + * that their p2m mapping won't be changed until explicitly "unpinned".
> + * The content of the exchanged pages is lost.
> + * Only normal guest r/w memory can be pinned: no granted pages or
> + * ballooned pages.
> + * If return code is zero then @out.extent_list provides the DMA frame
> + * numbers of the newly-allocated memory.

"DMA"? I don't think that term is universally true across all possible
architectures (but we're in an architecture independent header
here). "Machine" would probably be better (as it implies CPU
perspective, whereas DMA hints at device perspective).

> + * Returns zero on complete success, otherwise a negative error code:
> + *   -ENOSYS if not implemented
> + *   -EINVAL if the page is already pinned
> + *   -EFAULT if an internal error occurs

I'm not sure enumerating error codes here is useful; certainly not
giving the impression that the list might be exhaustive.

Jan



 

