|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
>>> On 13.05.15 at 11:49, <vkuznets@xxxxxxxxxx> wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -580,6 +580,234 @@ static long
> memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> return rc;
> }
>
> +static long
> memory_soft_reset(XEN_GUEST_HANDLE_PARAM(xen_memory_soft_reset_t) arg)
> +{
> + long rc = 0;
The way it's coded right now, I don't see the need for this or the
function return type being "long" (but see the comments on the
public header change).
> + struct xen_memory_soft_reset softr;
> + struct domain *source_d, *dest_d;
> + unsigned long mfn, mfn_new, gmfn, last_gmfn, count;
> + unsigned int order;
> + p2m_type_t p2mt;
> + struct page_info *page, *new_page, *next_page;
> + int drop_dom_ref;
> +
> + if ( copy_from_guest(&softr, arg, 1) )
> + return -EFAULT;
> +
> + if ( softr.source_domid == softr.dest_domid )
> + return -EINVAL;
> +
> + source_d = rcu_lock_domain_by_any_id(softr.source_domid);
> + if ( source_d == NULL )
> + {
> + rc = -ESRCH;
> + goto fail_early;
> + }
> +
> + dest_d = rcu_lock_domain_by_any_id(softr.dest_domid);
> + if ( dest_d == NULL )
> + {
> + rc = -ESRCH;
> + rcu_unlock_domain(source_d);
> + goto fail_early;
> + }
> +
> + if ( dest_d->is_dying )
> + {
> + rc = -EINVAL;
> + goto fail;
> + }
> +
> + if ( !source_d->is_dying )
> + {
> + /*
> + * Make sure no allocation/remapping for the source domain is ongoing
> + * and set is_dying flag to prevent such actions in future.
> + */
> + spin_lock(&source_d->page_alloc_lock);
> + source_d->is_dying = DOMDYING_locked;
You need to repeat the is_dying check with the lock held, and act
accordingly if it isn't zero anymore.
Furthermore I don't see how this prevents there being further
changes to the GFN <-> MFN mappings for the guest (yet you rely
on them to be stable below afaict).
> + spin_unlock(&source_d->page_alloc_lock);
> + }
> +
> + last_gmfn = domain_get_maximum_gpfn(source_d);
> + gmfn = softr.gmfn_start;
> + while ( gmfn <= last_gmfn )
By now you should have done a privilege check.
> + {
> + page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
> + if ( unlikely(page == NULL) )
> + {
> +#ifdef CONFIG_X86
> + count = 0;
> + while ( p2m_is_pod(p2mt) )
> + {
> + count++;
> + if ( gmfn + count > last_gmfn )
> + break;
> + page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
> + if ( page )
> + {
> + put_page(page);
> + page = NULL;
Why? You don't use the value further down.
> + break;
> + }
> + }
> +
> + if ( !count )
> + gmfn++;
> +
> + while ( count )
> + {
> + order = get_order_from_pages(count);
> + if ( (1ul << order) > count )
> + order -= 1;
--
> + rc = guest_physmap_mark_populate_on_demand(dest_d, gmfn,
> order);
> + if ( rc )
> + goto fail;
> + count -= 1ul << order;
> + gmfn += 1ul << order;
> + }
> +#else
> + gmfn++;
> +#endif
> + goto preempt_check;
> + }
> +
> + mfn = page_to_mfn(page);
> + if ( unlikely(!mfn_valid(mfn)) )
> + {
> + put_page(page);
> + gmfn++;
> + goto preempt_check;
I guess this should never happen, and hence issuing a warning may be
warranted?
> + }
> +
> + next_page = page;
> +
> + /*
> + * A normal page is supposed to have count_info = 2 (1 from the
> domain
> + * and 1 from get_page_from_gfn() above). All other pages need to be
> + * copied.
> + */
I'm afraid this isn't precise enough: _PGC_allocated implies one
refcount, i.e. if that flag turns out to be clear, you should expect
just 1. And then I'd wonder whether a !_PGC_allocated page
needs any handling here in the first place (or perhaps, once the
GFN <-> MFN mapping is ensured to be stable, this case can't
happen anyway).
> + count = 0;
> + while ( next_page && !is_xen_heap_page(next_page) &&
> + (next_page->count_info & PGC_count_mask) == 2 &&
> + page_to_mfn(next_page) == mfn + count )
> + {
> + count++;
> + drop_dom_ref = 0;
> + spin_lock(&source_d->page_alloc_lock);
> + page_set_owner(next_page, NULL);
> + page_list_del(next_page, &source_d->page_list);
> + source_d->tot_pages -= 1;
--
> + if ( unlikely(source_d->tot_pages == 0) )
> + drop_dom_ref = 1;
> + spin_unlock(&source_d->page_alloc_lock);
> + put_page(next_page);
> + if ( drop_dom_ref )
> + put_domain(source_d);
> +
> + if ( unlikely(assign_pages(dest_d, next_page, 0, 0)) )
> + {
> + printk(XENLOG_G_INFO "Failed to assign Dom%d's MFN %lx"
> + " to Dom%d\n", source_d->domain_id, mfn,
> + dest_d->domain_id);
> + rc = -EFAULT;
A better (more distinct) error code should be used here. And I
suppose printing the GFN would be more helpful for debugging
possible issues than printing the MFN.
> + goto fail;
> + }
> +
> + if ( unlikely(gmfn + count > last_gmfn) )
> + {
> + next_page = NULL;
> + break;
> + }
> +
> + next_page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
> + }
> +
> + if ( next_page && count )
> + put_page(next_page);
> +
> + if ( !count && (page->count_info & PGC_count_mask) != 2 )
Re-reading the field seems wrong here: What if it was > 2 above,
but is == 2 now?
Also together with the "else if" below I think this would better be
structured
if ( count )
{
if ( next_page && count )
put_page(next_page);
}
else if ( (page->count_info & PGC_count_mask) != 2 )
...
else
> + {
> + new_page = alloc_domheap_page(dest_d, 0);
> + if ( unlikely(new_page == NULL) )
> + {
> + printk(XENLOG_G_INFO "Failed to alloc a page to replace"
> + " Dom%d's GFN %lx (MFN %lx) for Dom %d\n",
> + source_d->domain_id, gmfn, mfn, dest_d->domain_id);
> + rc = -ENOMEM;
> + put_page(page);
> + goto fail;
> + }
> + mfn_new = page_to_mfn(new_page);
> + copy_domain_page(mfn_new, mfn);
> + mfn = mfn_new;
> + put_page(page);
> + if ( guest_physmap_add_page(dest_d, gmfn, mfn, 0) )
> + {
> + printk(XENLOG_G_INFO "Failed to add new GFN %lx"
> + " (MFN %lx) to Dom%d\n",
> + gmfn, mfn, dest_d->domain_id);
> + rc = -EFAULT;
> + goto fail;
> + }
> + gmfn++;
> + softr.nr_transferred++;
> + }
> + else if ( !count )
> + {
> + put_page(page);
> + gmfn++;
> + goto preempt_check;
> + }
> +
> + while ( count )
> + {
> + order = get_order_from_pages(count);
> + if ( (1ul << order) > count )
> + order -= 1;
> +
> + guest_physmap_remove_page(source_d, gmfn, mfn, order);
> +
> + if ( guest_physmap_add_page(dest_d, gmfn, mfn, order) )
Considering the non-atomicity between the assign_pages() much
further up and this call, shouldn't the domain be paused while its
state is inconsistent?
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -573,7 +573,39 @@ struct xen_vnuma_topology_info {
> typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
> DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>
> -/* Next available subop number is 27 */
> +/*
> + * Transfer all memory pages from one domain to the other. Pages are unmapped
> + * from the source domain and mapped at the same GFNs to the destination
> + * domain. This hypercall has a side-effect of moving the source domain to
> the
> + * 'dying' state.
Namely due to this, shouldn't this rather be a domctl?
> + * If a particular page is mapped more than once in the source domain a new
> + * empty page is being allocated for the destination domain and the content
> is
> + * being copied. The original page remains mapped to the source domain.
> + *
> + * The caller has to be privileged and it is supposed to set gmfn_start to 0;
> + * this field is required for the hypercall continuation.
If it was kept to be a mem-op - mem-ops have a mechanism to be
continued without custom structure fields.
> +#define XENMEM_soft_reset 27
> +struct xen_memory_soft_reset {
> + /*
> + * [IN] Memory transfer details.
> + */
> + domid_t source_domid; /* steal pages from */
> + domid_t dest_domid; /* assign pages to */
> +
> + xen_pfn_t gmfn_start; /* start from gmfn */
> +
> + /*
> + * [OUT] Number of transferred pages including new allocations.
> + */
> + xen_ulong_t nr_transferred;
It's not really clear to me what use this is.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |