
Re: [Xen-devel] [PATCH v7 7/9] ioreq-server: add support for multiple servers



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 12 May 2014 04:26
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@xxxxxxxxxxxxx
> Subject: Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> 
> >>> On 09.05.14 at 10:40, <paul.durrant@xxxxxxxxxx> wrote:
> > +static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
> > +{
> > +    unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;
> > +
> > +    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
> 
> Perhaps worth an ASSERT() on i not being out of range, to avoid
> hard to debug memory corruption?
>

Ok.
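
Something like this, perhaps (just a sketch, bounding i by the bit width
of the mask):

    static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
    {
        unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;

        /* Catch an out-of-range gmfn before it corrupts adjacent memory. */
        ASSERT(i < sizeof(d->arch.hvm_domain.ioreq_gmfn.mask) * 8);
        clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
    }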
 
> > +static int hvm_access_cf8(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> > +{
> > +    struct domain *d = current->domain;
> > +
> > +    if ( bytes != 4 )
> > +        return X86EMUL_UNHANDLEABLE;
> > +
> > +    BUG_ON(port < 0xcf8);
> 
> If at all, that would better be an ASSERT() imo.
> 

Ok.

> > +    port -= 0xcf8;
> 
> Stale statement - port isn't being used anymore below.
> 

Ok - thanks for spotting that.
> > +
> > +    if ( dir == IOREQ_WRITE )
> > +    {
> > +        d->arch.hvm_domain.pci_cf8 = *val;
> > +
> > +        /* We always need to fall through to the catch all emulator */
> > +        return X86EMUL_UNHANDLEABLE;
> > +    }
> > +    else
> > +    {
> > +        *val = d->arch.hvm_domain.pci_cf8;
> > +        return X86EMUL_OKAY;
> 
> Is this correct irrespective of the value of the high bit? You don't
> really know what non-config-space accesses mean (including whether
> that would have any side effects), so I guess you'd be better off
> forwarding that case too.
> 

True - there could be side effects. I'll forward if the top bit is set.

> And in the end I wonder whether it wouldn't be better to forward
> all access, with this function solely serving the purpose of latching
> the last written value.
> 

Yes, I guess I could do that since the write is being forwarded.
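
The function would then shrink to something like this (sketch only - latch
the dword write, forward everything):

    static int hvm_access_cf8(
        int dir, uint32_t port, uint32_t bytes, uint32_t *val)
    {
        struct domain *d = current->domain;

        ASSERT(port >= 0xcf8);

        /* Latch a full dword write; nothing else is interpreted here. */
        if ( dir == IOREQ_WRITE && bytes == 4 )
            d->arch.hvm_domain.pci_cf8 = *val;

        /* Always fall through to the catch-all emulator. */
        return X86EMUL_UNHANDLEABLE;
    }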

> > -static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
> > +static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
> > +                                      bool_t is_default)
> >  {
> >      struct domain *d = s->domain;
> > -    unsigned long pfn;
> > +    unsigned long ioreq_pfn, bufioreq_pfn;
> >      int rc;
> >
> > -    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> > -    rc = hvm_map_ioreq_page(s, 0, pfn);
> > +    if ( is_default ) {
> > +        ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> > +        bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> > +    } else {
> 
> I hate to say that, but: Coding style. And still more of this further
> down.
> 

Aaarggh. I keep checking for these but I'm just not seeing them.
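
For my own reference, the hunk above in hypervisor style (braces on their
own lines):

    if ( is_default )
    {
        ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
        bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
    }
    else
    {
        /* ... allocate from the ioreq_gmfn range as before ... */
    }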

> > +static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
> > +                                            bool_t is_default)
> > +{
> > +    int i;
> 
> And for variables like this I also already asked that they be unsigned
> (as properly done e.g. in hvm_ioreq_server_free_rangesets()).
> 

Ok.

> > +    int rc;
> > +
> > +    if ( is_default )
> > +        goto done;
> > +
> > +    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) {
> > +        char *name;
> > +
> > +        rc = asprintf(&name, "ioreq_server %d %s", s->id,
> > +                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
> > +                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
> > +                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
> > +                      "");
> > +        if ( rc )
> > +            goto fail;
> > +
> > +        s->range[i] = rangeset_new(s->domain, name,
> > +                                   RANGESETF_prettyprint_hex);
> > +
> > +        xfree(name);
> > +
> > +        rc = -ENOMEM;
> > +        if ( !s->range[i] )
> > +            goto fail;
> > +
> > +        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
> 
> Just as a remark/question: Perhaps ultimately this limit should
> become a configurable one, at once allowing the limits for the three
> kinds to be different?
> 

Yes, I guess the limit on PCI ranges could well be lower than the others.
It's really just a safety net to stop a malicious emulator from stealing
all of Xen's memory, though.

> > +static ioservid_t next_ioservid(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    static ioservid_t id;
> > +
> > +    ASSERT(spin_is_locked(&d->arch.hvm_domain.ioreq_server.lock));
> > +
> > +    id++;
> > +
> > + again:
> > +    /* Check for uniqueness */
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if (id == s->id)
> > +        {
> > +            id++;
> > +            goto again;
> 
> Please put the label ahead of the pre-loop increment, avoiding the
> duplication.
> 
> And what's worse - without latching the value of "id" into a local
> variable (storing back only the final result), your uniqueness check
> doesn't work, because the lock only guards against concurrent
> accesses for a single domain. Once fixed, perhaps also add a brief
> comment explaining why the remaining not fully serialized access to
> "id" is not a problem?
> 

Good point. Multiple emulating domains would fight here.

> And of course, if you're going to stay with this system-wide static
> variable, you'd need to clarify why you think this is not a - how
> ever small - information leak (of the side channel kind).
> 

Yes - if I move to a per-domain value then the lock also does its job so I'll 
do that.
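
With the value per-domain the existing lock serializes all updates, and the
label can subsume the pre-loop increment. A sketch (the ioreq_server.id
field is new):

    static ioservid_t next_ioservid(struct domain *d)
    {
        struct hvm_ioreq_server *s;
        ioservid_t id;

        ASSERT(spin_is_locked(&d->arch.hvm_domain.ioreq_server.lock));

        id = d->arch.hvm_domain.ioreq_server.id;

     again:
        id++;

        /* Check for uniqueness among this domain's servers. */
        list_for_each_entry ( s,
                              &d->arch.hvm_domain.ioreq_server.list,
                              list_entry )
            if ( id == s->id )
                goto again;

        d->arch.hvm_domain.ioreq_server.id = id;

        return id;
    }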

> > @@ -757,25 +942,35 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> >          goto fail1;
> >
> >      domain_pause(d);
> > -    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> >
> >      rc = -EEXIST;
> > -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > +    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
> >          goto fail2;
> >
> > -    rc = hvm_ioreq_server_init(s, d, domid);
> > +    rc = hvm_ioreq_server_init(s, d, domid, is_default,
> > +                               next_ioservid(d));
> >      if ( rc )
> > -        goto fail2;
> > +        goto fail3;
> > +
> > +    list_add(&s->list_entry,
> > +             &d->arch.hvm_domain.ioreq_server.list);
> > +    d->arch.hvm_domain.ioreq_server.count++;
> 
> Do you really need that field? The only time it's being read seems to be
> to determine whether there's exactly one of them. That surely can also
> be determined by looking at .list.
>

Yes - I think I must have used it for something else at some point. I'll get
rid of it.
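
The only reader is the count <= 1 test in hvm_select_ioreq_server(), which
can become a check on the list itself - sketch, using only the standard
list invariants:

    static bool_t hvm_ioreq_server_at_most_one(struct domain *d)
    {
        const struct list_head *head = &d->arch.hvm_domain.ioreq_server.list;

        /* Empty and singleton lists both have head->next == head->prev. */
        return head->next == head->prev;
    }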
 
> > +static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
> > +                                            uint32_t type, uint64_t start,
> > +                                            uint64_t end)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        if ( s->id == id )
> > +        {
> > +            struct rangeset *r;
> > +
> > +            switch ( type )
> > +            {
> > +            case HVMOP_IO_RANGE_PORT:
> > +            case HVMOP_IO_RANGE_MEMORY:
> > +            case HVMOP_IO_RANGE_PCI:
> > +                r = s->range[type];
> > +                break;
> > +
> > +            default:
> > +                r = NULL;
> > +                break;
> > +            }
> > +
> > +            rc = -EINVAL;
> > +            if ( !r )
> > +                break;
> > +
> > +            rc = rangeset_add_range(r, start, end);
> 
> Is it not an error for the range to perhaps already be there ...
> 

Hmm. I was trusting the rangeset code to fail in that case; I'll check what it 
actually does.
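
If it just merges silently, a check up front would do it - sketch:

        rc = -EEXIST;
        if ( rangeset_overlaps_range(r, start, end) )
            break;

        rc = rangeset_add_range(r, start, end);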

> > +static int hvm_unmap_io_range_from_ioreq_server(struct domain *d,
> > +                                                ioservid_t id, uint32_t type,
> > +                                                uint64_t start, uint64_t end)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        if ( s->id == id )
> > +        {
> > +            struct rangeset *r;
> > +
> > +            switch ( type )
> > +            {
> > +            case HVMOP_IO_RANGE_PORT:
> > +            case HVMOP_IO_RANGE_MEMORY:
> > +            case HVMOP_IO_RANGE_PCI:
> > +                r = s->range[type];
> > +                break;
> > +
> > +            default:
> > +                r = NULL;
> > +                break;
> > +            }
> > +
> > +            rc = -EINVAL;
> > +            if ( !r )
> > +                break;
> > +
> > +            rc = rangeset_remove_range(r, start, end);
> 
> ... or to not be there here?
> 

Again, I was expecting the rangeset code to fail; I'll check.
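
And correspondingly here - sketch:

        rc = -ENOENT;
        if ( !rangeset_contains_range(r, start, end) )
            break;

        rc = rangeset_remove_range(r, start, end);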

> > +static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
> 
> Stray blank after (, also again further down.
> 

Ok. I must have coded it as an if and then changed my mind.

> > +static void hvm_destroy_all_ioreq_servers(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s, *next;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    list_for_each_entry_safe ( s,
> > +                               next,
> > +                               &d->arch.hvm_domain.ioreq_server.list,
> > +                               list_entry )
> > +    {
> > +        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
> > +
> > +        domain_pause(d);
> 
> This gets called only during domain teardown - what's the point of
> the domain_pause()?
> 

True. Too much cut'n'paste from the single server case.

> > +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> > +                                                        ioreq_t *p)
> > +{
> > +#define CF8_BDF(cf8)     (((cf8) & 0x00ffff00) >> 8)
> > +#define CF8_ADDR_LO(cf8) ((cf8) & 0x000000fc)
> > +#define CF8_ADDR_HI(cf8) (((cf8) & 0x0f000000) >> 16)
> 
> Any reason you exclude the low two bits here, despite separately
> or-ing them back in below?
> 

A couple of documents I've read suggest that the 2 low-order bits of cf8 are
reserved, so I mask them here. The bits I OR in lower down are from the data 
cycle rather than the address cycle.
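
For reference, the layout I'm assuming is the standard one:

    /*
     * CF8 (config address port) layout:
     *  bit  31     enable
     *  bits 30-24  reserved
     *  bits 23-16  bus
     *  bits 15-11  device
     *  bits 10-8   function
     *  bits 7-2    register (dword-aligned)
     *  bits 1-0    reserved (byte offset comes from the data port access)
     */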

> > +#define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000))
> > +
> > +    struct hvm_ioreq_server *s;
> > +    uint32_t cf8;
> > +    uint8_t type;
> > +    uint64_t addr;
> > +
> > +    if ( d->arch.hvm_domain.ioreq_server.count <= 1 ||
> > +         (p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO) )
> > +        return d->arch.hvm_domain.default_ioreq_server;
> > +
> > +    cf8 = d->arch.hvm_domain.pci_cf8;
> > +
> > +    if ( p->type == IOREQ_TYPE_PIO &&
> > +         (p->addr & ~3) == 0xcfc &&
> > +         CF8_ENABLED(cf8) )
> > +    {
> > +        uint32_t sbdf;
> > +
> > +        /* PCI config data cycle */
> > +
> > +        sbdf = HVMOP_PCI_SBDF(0,
> > +                              PCI_BUS(CF8_BDF(cf8)),
> > +                              PCI_SLOT(CF8_BDF(cf8)),
> > +                              PCI_FUNC(CF8_BDF(cf8)));
> > +
> > +        type = IOREQ_TYPE_PCI_CONFIG;
> > +        addr = ((uint64_t)sbdf << 32) |
> > +               CF8_ADDR_HI(cf8) |
> > +               CF8_ADDR_LO(cf8) |
> > +               (p->addr & 3);
> > +    }
> > +    else
> > +    {
> > +        type = p->type;
> 
> I know I asked this before, and iirc your answer was "no": Is
> IOREQ_TYPE_PCI_CONFIG coming in here valid?
> 

No. Callers of this function should never pass that type in - it should only 
come back out.
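
I can make that explicit with an assertion in the else branch - sketch:

    else
    {
        ASSERT(p->type != IOREQ_TYPE_PCI_CONFIG);
        type = p->type;
        addr = p->addr;
    }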

> > +        addr = p->addr;
> > +    }
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        struct rangeset *r;
> > +
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
> > +        BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
> > +        BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
> > +        r = s->range[type];
> > +
> > +        switch ( type )
> > +        {
> > +        case IOREQ_TYPE_PIO:
> > +        case IOREQ_TYPE_COPY:
> > +            if ( rangeset_contains_singleton(r, addr) )
> 
> While the PCI check below certainly can be a singleton one, I don't
> think you can do so here: In order for a request to be forwarded,
> the full spanned range should be owned by the respective server.
> And yes, consideration of how to deal with split accesses is going to be
> needed I'm afraid.

Yes - that must have been lost with the change to using rangesets. I'm going to
punt on split ranges though, as they're a whole other layer of complexity. I'll
add a comment somewhere to say that unless a cycle falls fully within the range 
of a secondary emulator it will not be passed to it. That's certainly good 
enough for what I'm using this for at the moment, and support for split ranges 
could be added later if needed.
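
The check then needs to cover the whole span rather than just the first
byte - sketch, still ignoring rep counts and backwards strings:

        case IOREQ_TYPE_PIO:
        case IOREQ_TYPE_COPY:
            /* Only claim the cycle if the server owns the whole span. */
            if ( rangeset_contains_range(r, addr, addr + p->size - 1) )
                return s;
            break;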

> 
> > +static int hvmop_create_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
> > +{
> > +    struct domain *curr_d = current->domain;
> > +    xen_hvm_create_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
> 
> This call is indistinguishable from the four other ones further down.
> You ought to let XSM know what operation it really is that you're
> about to perform.
> 

Ok, I'll differentiate the types of operation if you think a finer-grained
level of access control is required.
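
E.g. by handing the sub-op down to the hook (sketch; the extra argument is
new, and each call site would pass its own HVMOP_* value):

    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, HVMOP_create_ioreq_server);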

  Paul

> Jan
