[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range



On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> guest_physmap_pin_range pins a range of guest pages so that they p2m

s/they/their/

> mappings won't be changed.
> guest_physmap_unpin_range unpins the previously pinned pages.
> 
> The pinning is done using one of the spare bits in the p2m ptes.
> 
> Provide empty stubs for x86.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> ---
>  xen/arch/arm/p2m.c         |  131 
> ++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/mm.h   |    4 +
>  xen/include/asm-arm/page.h |    6 ++-
>  xen/include/asm-x86/p2m.h  |   12 ++++
>  4 files changed, 152 insertions(+), 1 deletions(-)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 307c6d4..7543ca8 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -75,6 +75,129 @@ done:
>      return maddr;
>  }
>  
> +
> +int guest_physmap_pin_range(struct domain *d,
> +                            xen_pfn_t gpfn,
> +                            unsigned int order)
> +{
> +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> +    paddr_t addr = gpfn << PAGE_SHIFT;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    int rc = -EFAULT, i;
> +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> +
> +    spin_lock(&p2m->lock);
> +
> +    first = __map_domain_page(p2m->first_level);
> +
> +    if ( !first ||
> +            !first[first_table_offset(addr)].p2m.valid ||
> +            !first[first_table_offset(addr)].p2m.table )
> +        goto err;
> +
> +    for ( i = 0; i < (1UL << order); i++ )
> +    {
> +        if ( cur_first_offset != first_table_offset(addr) )
> +        {
> +            if (second) unmap_domain_page(second);
> +            second = 
> map_domain_page(first[first_table_offset(addr)].p2m.base);
> +            cur_first_offset = first_table_offset(addr);
> +        }
> +        if ( !second ||
> +                !second[second_table_offset(addr)].p2m.valid ||
> +                !second[second_table_offset(addr)].p2m.table )
> +            goto err;
> +
> +        if ( cur_second_offset != second_table_offset(addr) )
> +        {
> +            if (third) unmap_domain_page(third);
> +            third = 
> map_domain_page(second[second_table_offset(addr)].p2m.base);
> +            cur_second_offset = second_table_offset(addr);
> +        }
> +        if ( !third ||
> +                !third[third_table_offset(addr)].p2m.valid ||
> +                third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )

At this point rc == -EFAULT, which doesn't seem right when failing the
P2M_DMA_PIN check.

> +            goto err;
> +
> +        pte = third[third_table_offset(addr)];
> +        pte.p2m.avail |= P2M_DMA_PIN;
> +        write_pte(&third[third_table_offset(addr)], pte);
> +
> +        addr += PAGE_SIZE;
> +    }
> +
> +    rc = 0;
> +
> +err:

Leaks final mapping of third I think?

You add two p2m walkers in this patch, and we have at least one already.
Perhaps it is time for a generic function which calls a callback on the
leaf ptes?

> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +
> +    return rc;
> +}
> +
> +int guest_physmap_unpin_range(struct domain *d,
> +                              xen_pfn_t gpfn,
> +                              unsigned int order)
> +{
> +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> +    paddr_t addr = gpfn << PAGE_SHIFT;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    int rc = -EFAULT, i;
> +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> [...]

Same generic comments as the pin version.

> +    rc = 0;
> +
> +err:
> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +
> +    return rc;
> +}
> +
>  int guest_physmap_mark_populate_on_demand(struct domain *d,
>                                            unsigned long gfn,
>                                            unsigned int order)
> @@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
>              cur_second_offset = second_table_offset(addr);
>          }
>  
> +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> +        {
> +            rc = -EINVAL;
> +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> +                    " domid=%d, the page is pinned\n", __func__, addr, 
> d->domain_id);

Not sure if a guest can trigger this, if yes then gdprintk.

Do we always clear this bit when we clear the present bit? If not then
we could end up with entries which are pinned and not-present. We should
probably avoid that one way or another!

> +            goto out;
> +        }
> +
>          flush = third[third_table_offset(addr)].p2m.valid;
>  
>          /* Allocate a new RAM page and attach */
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 5e7c5a3..8db8816 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -319,6 +319,10 @@ void free_init_memory(void);
>  
>  int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long 
> gfn,
>                                            unsigned int order);
> +int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn,
> +                            unsigned int order);
> +int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn,
> +                              unsigned int order);
>  
>  extern void put_page_type(struct page_info *page);
>  static inline void put_page_and_type(struct page_info *page)
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index 41e9eff..12087d0 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -153,11 +153,15 @@ typedef struct {
>      unsigned long hint:1;       /* In a block of 16 contiguous entries */
>      unsigned long sbz2:1;
>      unsigned long xn:1;         /* eXecute-Never */
> -    unsigned long avail:4;      /* Ignored by hardware */
> +    unsigned long avail:4;      /* Ignored by hardware, see below */
>  
>      unsigned long sbz1:5;
>  } __attribute__((__packed__)) lpae_p2m_t;
>  
> +/* Xen "avail" bits allocation in third level entries */
> +#define P2M_DMA_PIN     (1<<0)  /* The page has been "pinned": the hypervisor
> +                                   promises not to change the p2m mapping. */

I guess we are going to want to keep type info here eventually, this
leaves 3 bits for that.

For reference x86 has the following p2m types :
    p2m_ram_rw = 0,             /* Normal read/write guest RAM */
    p2m_invalid = 1,            /* Nothing mapped here */
    p2m_ram_logdirty = 2,       /* Temporarily read-only for log-dirty */
    p2m_ram_ro = 3,             /* Read-only; writes are silently dropped */
    p2m_mmio_dm = 4,            /* Reads and write go to the device model */
    p2m_mmio_direct = 5,        /* Read/write mapping of genuine MMIO area */
    p2m_populate_on_demand = 6, /* Place-holder for empty memory */

    /* Although these are defined in all builds, they can only
     * be used in 64-bit builds */
    p2m_grant_map_rw = 7,         /* Read/write grant mapping */
    p2m_grant_map_ro = 8,         /* Read-only grant mapping */
    p2m_ram_paging_out = 9,       /* Memory that is being paged out */
    p2m_ram_paged = 10,           /* Memory that has been paged out */
    p2m_ram_paging_in = 11,       /* Memory that is being paged in */
    p2m_ram_shared = 12,          /* Shared or sharable memory */
    p2m_ram_broken = 13,          /* Broken page, access cause domain crash */

We can get away with not having some of these for now, but not
necessarily in the long term I think. 4 available bits is really too
few :-(

I'm unsure if pinning is orthogonal to type. I think it probably is? Or
can we say that only normal ram can be pinned? Actually that may make
sense since many of the other types are either implicitly pinned already
(grant map, mmio etc) or pinning makes no sense (broken) or pinning
would have to change the type anyway (pin shared -> unshare -> ram_rw
+pinned, likewise for the paging types).

So the pinable types are p2m_ram_rw, p2m_ram_ro, p2m_ram_logdirty? We
could perhaps declare that ro and logdirty are implicitly pinned too?
For log dirty it's pretty obvious, not so sure about ro -- doing DMA
even just from r/o memory seems a bit dodgy to allow.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.