
Re: [Xen-devel] [PATCH v6 01/10] xen: vnuma topology and subop hypercalls



On Wed, Jul 23, 2014 at 10:06 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 18.07.14 at 07:50, <ufimtseva@xxxxxxxxx> wrote:
>> +static int vnuma_alloc(struct vnuma_info **vnuma,
>> +                       unsigned int nr_vnodes,
>> +                       unsigned int nr_vcpus,
>> +                       unsigned int dist_size)
>> +{
>> +    struct vnuma_info *v;
>> +
>> +    if ( vnuma && *vnuma )
>> +        return -EINVAL;
>> +
>> +    v = *vnuma;
>> +    /*
>> +     * check if any of xmallocs exeeds PAGE_SIZE.
>> +     * If yes, consider it as an error for now.
>> +     */
>> +    if ( nr_vnodes > PAGE_SIZE / sizeof(nr_vnodes)       ||
>> +        nr_vcpus > PAGE_SIZE / sizeof(nr_vcpus)          ||
>> +        nr_vnodes > PAGE_SIZE / sizeof(struct vmemrange) ||
>> +        dist_size > PAGE_SIZE / sizeof(dist_size) )
>
Hi Jan,
Thank you for your review.

> Three of the four checks are rather bogus - the types of the
> variables just happen to match the types of the respective
> array elements. Best to switch all of them to sizeof(*v->...).
> Plus I'm not sure about the dist_size check - in its current shape
> it's redundant with the nr_vnodes one (and really the function
> parameter seems pointless, i.e. could be calculated here), and
> it's questionable whether limiting that table against PAGE_SIZE
> isn't too restrictive. Also indentation seems broken here.


I agree on the distance table memory allocation limit.
With the current interface, the nr_vnodes size check effectively caps the
vdistance table dimension, and hence the maximum number of vNUMA nodes, at
256; the vdistance table would then need two 4K pages of allocation.
Would that be viewed as a potential candidate for the list of hypercalls
affected by XSA-77?
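
For the sizeof() point, here is an untested sketch of what I would switch
to (with dist_size computed locally rather than passed in, and the
per-table PAGE_SIZE bound kept for the moment):

static int vnuma_alloc(struct vnuma_info **vnuma,
                       unsigned int nr_vnodes,
                       unsigned int nr_vcpus)
{
    struct vnuma_info *v = *vnuma;

    /*
     * Bound each table by the size of the elements actually allocated;
     * the caller already checks nr_vnodes * nr_vnodes for overflow.
     */
    if ( nr_vnodes > PAGE_SIZE / sizeof(*v->vnode_to_pnode)        ||
         nr_vcpus > PAGE_SIZE / sizeof(*v->vcpu_to_vnode)          ||
         nr_vnodes > PAGE_SIZE / sizeof(*v->vmemrange)             ||
         nr_vnodes * nr_vnodes > PAGE_SIZE / sizeof(*v->vdistance) )
        return -EINVAL;

    /* ... xmalloc of the individual tables as before ... */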

>
>> +static long vnuma_fallback(const struct domain *d,
>> +                          struct vnuma_info **vnuma)
>> +{
>> +    struct vnuma_info *v;
>> +    long ret;
>> +
>> +
>> +    /* Will not destroy vNUMA here, destroy before calling this. */
>> +    if ( vnuma && *vnuma )
>> +        return -EINVAL;
>> +
>> +    v = *vnuma;
>> +    ret = vnuma_alloc(&v, 1, d->max_vcpus, 1);
>> +    if ( ret )
>> +        return ret;
>> +
>> +    v->vmemrange[0].start = 0;
>> +    v->vmemrange[0].end = d->max_pages << PAGE_SHIFT;
>
> Didn't we settle on using inclusive ranges to avoid problems at the
> address space end? Or was that in the context of some other series
> (likely by someone else)?

Yes, I think it was from Arianna.
http://lists.xen.org/archives/html/xen-devel/2014-04/msg02641.html
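
If we switch to inclusive ranges here as well, the fallback node would
become something like this (untested):

    /* Single fallback node covering all guest memory, inclusive end. */
    v->vmemrange[0].start = 0;
    v->vmemrange[0].end = (d->max_pages << PAGE_SHIFT) - 1;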

>
>> +long vnuma_init(const struct xen_domctl_vnuma *u_vnuma,
>> +                const struct domain *d,
>> +                struct vnuma_info **dst)
>> +{
>> +    unsigned int dist_size, nr_vnodes = 0;
>> +    long ret;
>> +    struct vnuma_info *v = NULL;
>> +
>> +    ret = -EINVAL;
>> +
>> +    /* If vNUMA topology already set, just exit. */
>> +    if ( !u_vnuma || *dst )
>> +        return ret;
>> +
>> +    nr_vnodes = u_vnuma->nr_vnodes;
>> +
>> +    if ( nr_vnodes == 0 )
>> +        return ret;
>> +
>> +    if ( nr_vnodes > (UINT_MAX / nr_vnodes) )
>> +        return ret;
>> +
>> +    dist_size = nr_vnodes * nr_vnodes;
>> +
>> +    ret = vnuma_alloc(&v, nr_vnodes, d->max_vcpus, dist_size);
>> +    if ( ret )
>> +        return ret;
>> +
>> +    /* On failure, set only one vNUMA node and its success. */
>> +    ret = 0;
>
> Pointless - just use "return 0" below.
>
>> +
>> +    if ( copy_from_guest(v->vdistance, u_vnuma->vdistance, dist_size) )
>> +        goto vnuma_onenode;
>> +    if ( copy_from_guest(v->vmemrange, u_vnuma->vmemrange, nr_vnodes) )
>> +        goto vnuma_onenode;
>> +    if ( copy_from_guest(v->vcpu_to_vnode, u_vnuma->vcpu_to_vnode,
>> +        d->max_vcpus) )
>
> Indentation.
>
>> +        goto vnuma_onenode;
>> +    if ( copy_from_guest(v->vnode_to_pnode, u_vnuma->vnode_to_pnode,
>> +        nr_vnodes) )
>
> Again.
>
>> +        goto vnuma_onenode;
>> +
>> +    v->nr_vnodes = nr_vnodes;
>> +    *dst = v;
>> +
>> +    return ret;
>> +
>> +vnuma_onenode:
>
> Labels are to be indented by at least one space.
>
>> +    case XEN_DOMCTL_setvnumainfo:
>> +    {
>> +        struct vnuma_info *v = NULL;
>> +
>> +        ret = -EFAULT;
>> +        if ( guest_handle_is_null(op->u.vnuma.vdistance)     ||
>> +            guest_handle_is_null(op->u.vnuma.vmemrange)      ||
>> +            guest_handle_is_null(op->u.vnuma.vcpu_to_vnode)  ||
>> +            guest_handle_is_null(op->u.vnuma.vnode_to_pnode) )
>> +            return ret;
>> +
>> +        ret = -EINVAL;
>> +
>> +        ret = vnuma_init(&op->u.vnuma, d, &v);
>> +        if ( ret < 0 || v == NULL )
>
> So when v == NULL you return success (ret being 0)? That second
> check is either pointless (could become an ASSERT()) or needs proper
> handling.
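
If vnuma_init() is guaranteed to have set v whenever it returns 0, this
could indeed become an ASSERT(), e.g. (untested):

        ret = vnuma_init(&op->u.vnuma, d, &v);
        if ( ret < 0 )
            break;
        ASSERT(v != NULL);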
>
>> +            break;
>> +
>> +        /* overwrite vnuma for domain */
>> +        if ( !d->vnuma )
>> +            vnuma_destroy(d->vnuma);
>> +
>> +        domain_lock(d);
>> +        d->vnuma = v;
>> +        domain_unlock(d);
>
> domain_lock()? What does this guard against? (We generally aim at
> removing uses of domain_lock() rather than adding new ones.)
>
>> +    case XENMEM_get_vnumainfo:
>> +    {
>> +        struct vnuma_topology_info topology;
>> +        struct domain *d;
>> +        unsigned int dom_vnodes = 0;
>
> Pointless initializer.
>
>> +
>> +        /*
>> +         * guest passes nr_vnodes and nr_vcpus thus
>> +         * we know how much memory guest has allocated.
>> +         */
>> +        if ( copy_from_guest(&topology, arg, 1) ||
>> +            guest_handle_is_null(topology.vmemrange.h) ||
>> +            guest_handle_is_null(topology.vdistance.h) ||
>> +            guest_handle_is_null(topology.vcpu_to_vnode.h) )
>> +            return -EFAULT;
>> +
>> +        if ( (d = rcu_lock_domain_by_any_id(topology.domid)) == NULL )
>> +            return -ESRCH;
>> +
>> +        rc = -EOPNOTSUPP;
>> +        if ( d->vnuma == NULL )
>> +            goto vnumainfo_out;
>> +
>> +        if ( d->vnuma->nr_vnodes == 0 )
>> +            goto vnumainfo_out;
>
> Can this second condition validly (other than due to a race) be true if
> the first one wasn't? (And of course there's synchronization missing
> here, to avoid the race.)

My idea in using the pair domain_lock / rcu_lock_domain_by_any_id was to
avoid that race.
I take domain_lock in the domctl hypercall while the domain's vnuma
pointer is being set, and XENMEM_get_vnumainfo reads the values while
holding the RCU read lock on that domain.
Since vNUMA is set only once, when the domain boots, domain_lock seemed
acceptable here.
Would a spinlock be more appropriate?
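
If domain_lock is not welcome, I could add a dedicated per-domain lock
instead; for example a read/write lock held by the reader across the
copy-out (the d->vnuma_rwlock field below is just for illustration, it
does not exist yet):

    /* Writer side, XEN_DOMCTL_setvnumainfo, once vnuma_init() succeeded: */
    write_lock(&d->vnuma_rwlock);
    vnuma_destroy(d->vnuma);
    d->vnuma = v;
    write_unlock(&d->vnuma_rwlock);

    /* Reader side, XENMEM_get_vnumainfo: */
    read_lock(&d->vnuma_rwlock);
    if ( d->vnuma == NULL )
    {
        read_unlock(&d->vnuma_rwlock);
        rc = -EOPNOTSUPP;
        goto vnumainfo_out;
    }
    dom_vnodes = d->vnuma->nr_vnodes;
    /* ... copy_to_guest() of vmemrange / vdistance / vcpu_to_vnode ... */
    read_unlock(&d->vnuma_rwlock);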

>
>> +
>> +        dom_vnodes = d->vnuma->nr_vnodes;
>> +
>> +        /*
>> +         * guest nr_cpus and nr_nodes may differ from domain vnuma config.
>> +         * Check here guest nr_nodes and nr_cpus to make sure we dont overflow.
>> +         */
>> +        rc = -ENOBUFS;
>> +        if ( topology.nr_vnodes < dom_vnodes ||
>> +            topology.nr_vcpus < d->max_vcpus )
>> +            goto vnumainfo_out;
>
> You ought to be passing back the needed values in the structure fields.

Understood.
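
Something along these lines then (untested), so the guest learns the
sizes it needs to allocate:

        rc = -ENOBUFS;
        if ( topology.nr_vnodes < dom_vnodes ||
             topology.nr_vcpus < d->max_vcpus )
        {
            /* Report the required array sizes back to the guest. */
            topology.nr_vnodes = dom_vnodes;
            topology.nr_vcpus = d->max_vcpus;
            if ( __copy_to_guest(arg, &topology, 1) )
                rc = -EFAULT;
            goto vnumainfo_out;
        }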
>
>> +
>> +        rc = -EFAULT;
>> +
>> +        if ( copy_to_guest(topology.vmemrange.h, d->vnuma->vmemrange,
>> +                           dom_vnodes) != 0 )
>> +            goto vnumainfo_out;
>> +
>> +        if ( copy_to_guest(topology.vdistance.h, d->vnuma->vdistance,
>> +                           dom_vnodes * dom_vnodes) != 0 )
>> +            goto vnumainfo_out;
>> +
>> +        if ( copy_to_guest(topology.vcpu_to_vnode.h, d->vnuma->vcpu_to_vnode,
>> +                           d->max_vcpus) != 0 )
>> +            goto vnumainfo_out;
>> +
>> +        topology.nr_vnodes = dom_vnodes;
>
> And why not topology.nr_vcpus?

Yes, I missed that one.
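Will set both before copying the structure back:

        topology.nr_vnodes = dom_vnodes;
        topology.nr_vcpus = d->max_vcpus;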

>
>> +
>> +        if ( copy_to_guest(arg, &topology, 1) != 0 )
>
> __copy_to_guest() please.
>
> Jan



-- 
Elena

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

