
Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a new mappable resource type...



> -----Original Message-----
> From: Roger Pau Monne
> Sent: 25 August 2017 10:32
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Stefano Stabellini
> <sstabellini@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; George Dunlap
> <George.Dunlap@xxxxxxxxxx>; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Tim
> (Xen.org) <tim@xxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a
> new mappable resource type...
> 
> On Tue, Aug 22, 2017 at 03:51:06PM +0100, Paul Durrant wrote:
> > ... XENMEM_resource_ioreq_server
> >
> > This patch adds support for a new resource type that can be mapped using
> > the XENMEM_acquire_resource memory op.
> >
> > If an emulator makes use of this resource type then, instead of mapping
> > gfns, the IOREQ server will allocate pages from the heap. These pages
> > will never be present in the P2M of the guest at any point and so are
> > not vulnerable to any direct attack by the guest. They are only ever
> > accessible by Xen and any domain that has mapping privilege over the
> > guest (which may or may not be limited to the domain running the emulator).
> >
> > NOTE: Use of the new resource type is not compatible with use of
> >       XEN_DMOP_get_ioreq_server_info unless the XEN_DMOP_no_gfns flag is
> >       set.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> > ---
> > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
> > Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Cc: Tim Deegan <tim@xxxxxxx>
> > Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> > ---
> >  xen/arch/x86/hvm/ioreq.c        | 136 ++++++++++++++++++++++++++++++++++++++++
> >  xen/arch/x86/mm.c               |  27 ++++++++
> >  xen/include/asm-x86/hvm/ioreq.h |   2 +
> >  xen/include/public/hvm/dm_op.h  |   4 ++
> >  xen/include/public/memory.h     |   3 +
> >  5 files changed, 172 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> > index 795c198f95..9e6838dab6 100644
> > --- a/xen/arch/x86/hvm/ioreq.c
> > +++ b/xen/arch/x86/hvm/ioreq.c
> > @@ -231,6 +231,15 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
> >      struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> >      int rc;
> >
> > +    if ( iorp->page )
> > +    {
> > +        /* Make sure the page has not been allocated */
> > +        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
> > +            return -EPERM;
> > +
> > +        return 0;
> 
> EEXIST? (See comment below, which I think also applies here).
> 
> > +    }
> > +
> >      if ( d->is_dying )
> >          return -EINVAL;
> >
> > @@ -253,6 +262,60 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
> >      return rc;
> >  }
> >
> > +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> > +{
> > +    struct domain *currd = current->domain;
> > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> > +
> > +    if ( iorp->page )
> > +    {
> > +        /* Make sure the page has not been mapped */
> > +        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
> > +            return -EPERM;
> > +
> > +        return 0;
> 
> Shouldn't this return EEXIST? Page has already been allocated by a
> previous call AFAICT, and it seems like a possible error/misbehavior
> to try to do it twice.
> 

The checks are there to prevent a caller from trying to mix the legacy and new 
methods of mapping ioreq server pages, so EPERM (i.e. 'operation not permitted') 
seems like the correct error. I agree that it's not obvious at this inner 
level, but I do think this is right. I'm open to debate about this though.
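
To spell out the semantics I'm aiming for, here is a summary of the two checks
written as a comment (a restatement of the hunks above, not code from the patch):

/*
 * State of iorp                 hvm_map_ioreq_gfn()      hvm_alloc_ioreq_mfn()
 * ----------------------------  -----------------------  ----------------------
 * no page yet                   proceed (map a gfn)      proceed (alloc a page)
 * page set up via gfn mapping   return 0 (benign no-op)  return -EPERM (mixing)
 * page set up via allocation    return -EPERM (mixing)   return 0 (benign no-op)
 *
 * i.e. -EPERM only fires when a caller mixes the two interfaces.
 */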

> > +    }
> > +
> > +    /*
> > +     * Allocated IOREQ server pages are assigned to the emulating
> > +     * domain, not the target domain. This is because the emulator is
> > +     * likely to be destroyed after the target domain has been torn
> > +     * down, and we must use MEMF_no_refcount otherwise page allocation
> > +     * could fail if the emulating domain has already reached its
> > +     * maximum allocation.
> > +     */
> > +    iorp->page = alloc_domheap_page(currd, MEMF_no_refcount);
> 
> I don't really like the fact that the page is not accounted to any
> domain, but I can see the point in doing it like that (which you
> argue in the comment).
> 
> IIRC there were talks about tightening the accounting of memory
> pages, so that ideally everything would be accounted for in the memory
> assigned to the domain.
> 
> Just a random thought, but could the toolstack set aside some
> memory pages (i.e. not map them into the domain p2m) that could then
> be used by this? (not asking you to do this here)
> 
> And how many pages are we expecting to use for each domain? I assume
> the number will be quite low.
> 

Yes, I agree the use of MEMF_no_refcount is not ideal and you do highlight an 
issue: I don't think there is currently an upper limit on the number of ioreq 
servers, so an emulating domain could exhaust memory using the new scheme. I'll 
need to introduce a limit to avoid that.
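
Roughly the kind of check I have in mind (only a sketch; MAX_NR_IOREQ_SERVERS
is a made-up name and value, and the exact error code is open to debate):

/* Hypothetical per-domain cap on ioreq servers; not part of this patch. */
#define MAX_NR_IOREQ_SERVERS 8

/* Caller must hold d->arch.hvm_domain.ioreq_server.lock. */
static unsigned int hvm_nr_ioreq_servers(const struct domain *d)
{
    const struct hvm_ioreq_server *s;
    unsigned int count = 0;

    list_for_each_entry ( s,
                          &d->arch.hvm_domain.ioreq_server.list,
                          list_entry )
        count++;

    return count;
}

/* Then, in hvm_create_ioreq_server(), before allocating a new server: */
    if ( hvm_nr_ioreq_servers(d) >= MAX_NR_IOREQ_SERVERS )
        return -ENOSPC;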

> > +    if ( !iorp->page )
> > +        return -ENOMEM;
> > +
> > +    get_page(iorp->page, currd);
> 
> Do you really need this get_page? (page is already assigned to currd).
> 

Hmm. I thought MEMF_no_refcount modified that behaviour, but I may have got 
confused with MEMF_no_owner.

> > +
> > +    iorp->va = __map_domain_page_global(iorp->page);
> > +    if ( !iorp->va )
> > +    {
> > +        put_page(iorp->page);
> > +        iorp->page = NULL;
> > +        return -ENOMEM;
> > +    }
> > +
> > +    clear_page(iorp->va);
> > +    return 0;
> > +}
> > +
> > +static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> > +{
> > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> > +
> > +    if ( !iorp->page )
> > +        return;
> > +
> > +    unmap_domain_page_global(iorp->va);
> > +    iorp->va = NULL;
> > +
> > +    put_page(iorp->page);
> > +    iorp->page = NULL;
> > +}
> > +
> >  bool is_ioreq_server_page(struct domain *d, const struct page_info
> *page)
> >  {
> >      const struct hvm_ioreq_server *s;
> > @@ -457,6 +520,27 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
> >      hvm_unmap_ioreq_gfn(s, false);
> >  }
> >
> > +static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
> > +{
> > +    int rc = -ENOMEM;
> > +
> > +    rc = hvm_alloc_ioreq_mfn(s, false);
> > +
> > +    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
> > +        rc = hvm_alloc_ioreq_mfn(s, true);
> > +
> > +    if ( rc )
> > +        hvm_free_ioreq_mfn(s, false);
> > +
> > +    return rc;
> > +}
> > +
> > +static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
> > +{
> > +    hvm_free_ioreq_mfn(s, true);
> > +    hvm_free_ioreq_mfn(s, false);
> > +}
> > +
> >  static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
> >  {
> >      unsigned int i;
> > @@ -583,7 +667,18 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
> >
> >   fail_add:
> >      hvm_ioreq_server_remove_all_vcpus(s);
> > +
> > +    /*
> > +     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
> > +     *       hvm_ioreq_server_free_pages() in that order.
> > +     *       This is because the former will do nothing if the pages
> > +     *       are not mapped, leaving the page to be freed by the latter.
> > +     *       However if the pages are mapped then the former will set
> > +     *       the page_info pointer to NULL, meaning the latter will do
> > +     *       nothing.
> > +     */
> >      hvm_ioreq_server_unmap_pages(s);
> > +    hvm_ioreq_server_free_pages(s);
> >
> >      return rc;
> >  }
> > @@ -593,6 +688,7 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
> >      ASSERT(!s->enabled);
> >      hvm_ioreq_server_remove_all_vcpus(s);
> >      hvm_ioreq_server_unmap_pages(s);
> > +    hvm_ioreq_server_free_pages(s);
> >      hvm_ioreq_server_free_rangesets(s);
> >  }
> >
> > @@ -745,6 +841,9 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
> >              rc = hvm_ioreq_server_map_pages(s);
> >              if ( rc )
> >                  break;
> > +
> > +            gdprintk(XENLOG_INFO, "d%d ioreq server %u using gfns\n",
> > +                     d->domain_id, s->id);
> >          }
> >
> >          if ( ioreq_gfn )
> > @@ -767,6 +866,43 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
> >      return rc;
> >  }
> >
> > +mfn_t hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
> > +                                 unsigned int idx)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    mfn_t mfn = INVALID_MFN;
> > +
> > +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        int rc;
> > +
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> 
> s->is_default
> 
> > +
> > +        if ( s->id != id )
> > +            continue;
> > +
> > +        rc = hvm_ioreq_server_alloc_pages(s);
> > +        if ( rc )
> > +            break;
> > +
> > +        if ( idx == 0 )
> > +            mfn = _mfn(page_to_mfn(s->bufioreq.page));
> > +        else if ( idx == 1 )
> > +            mfn = _mfn(page_to_mfn(s->ioreq.page));
> > +
> > +        break;
> > +    }
> > +
> > +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    return mfn;
> > +}
> > +
> >  int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
> >                                       uint32_t type, uint64_t start,
> >                                       uint64_t end)
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 4e86f0a2ab..3e845af0e4 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -122,6 +122,7 @@
> >  #include <asm/fixmap.h>
> >  #include <asm/io_apic.h>
> >  #include <asm/pci.h>
> > +#include <asm/hvm/ioreq.h>
> >
> >  /* Mapping of the fixmap space needed early. */
> >  l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> > @@ -4744,6 +4745,27 @@ static int xenmem_acquire_grant_table(struct domain *d,
> >      return 0;
> >  }
> >
> > +static int xenmem_acquire_ioreq_server(struct domain *d,
> > +                                       unsigned int id,
> > +                                       unsigned long frame,
> > +                                       unsigned long nr_frames,
> > +                                       unsigned long mfn_list[])
> 
> mfn_t maybe?

No. The list is mapped in, so use of typesafe variants is not appropriate.

> 
> > +{
> > +    unsigned int i;
> 
> Either nr_frames wants to be unsigned int, or this needs to be
> unsigned long.

Not sure which. I'll check.

> 
> > +
> > +    for ( i = 0; i < nr_frames; i++ )
> > +    {
> > +        mfn_t mfn = hvm_get_ioreq_server_frame(d, id, frame + i);
> 
> Here you use unsigned long as the last parameter to
> hvm_get_ioreq_server_frame, while the function takes an unsigned int.

Ok.
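
Something along these lines, I think (just a sketch to show how I'd keep the
widths consistent and where the typesafe value gets stripped; the error value
for an out-of-range frame is a placeholder, and the final widths depend on
what the XENMEM_acquire_resource handler actually passes down):

static int xenmem_acquire_ioreq_server(struct domain *d,
                                       unsigned int id,
                                       unsigned int frame,
                                       unsigned int nr_frames,
                                       unsigned long mfn_list[])
{
    unsigned int i;

    for ( i = 0; i < nr_frames; i++ )
    {
        mfn_t mfn = hvm_get_ioreq_server_frame(d, id, frame + i);

        if ( mfn_eq(mfn, INVALID_MFN) )
            return -EINVAL;

        /*
         * mfn_list[] is seen outside Xen, so it carries raw values;
         * the typesafe mfn_t stops here.
         */
        mfn_list[i] = mfn_x(mfn);
    }

    return 0;
}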

Thanks,

  Paul

> 
> Thanks, Roger.
