[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] grant_table: convert grant table rwlock to percpu rwlock



>>> On 18.11.15 at 11:06, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 18/11/15 07:45, Jan Beulich wrote:
>>>>> On 17.11.15 at 18:53, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 17/11/15 17:39, Jan Beulich wrote:
>>>>>>> On 17.11.15 at 18:30, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> On 17/11/15 17:04, Jan Beulich wrote:
>>>>>>>>> On 03.11.15 at 18:58, <malcolm.crossley@xxxxxxxxxx> wrote:
>>>>>>> --- a/xen/common/grant_table.c
>>>>>>> +++ b/xen/common/grant_table.c
>>>>>>> @@ -178,6 +178,10 @@ struct active_grant_entry {
>>>>>>>  #define _active_entry(t, e) \
>>>>>>>      ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
>>>>>>>  
>>>>>>> +bool_t grant_rwlock_barrier;
>>>>>>> +
>>>>>>> +DEFINE_PER_CPU(rwlock_t *, grant_rwlock);
>>>>>> Shouldn't these be per grant table? And wouldn't doing so eliminate
>>>>>> the main limitation of the per-CPU rwlocks?
>>>>> The grant rwlock is per grant table.
>>>> That's understood, but I don't see why the above items aren't, too.
>>> Ah - because there is never any circumstance where two grant tables are
>>> locked on the same pcpu.
>> double_gt_lock()? Those are write locks, and iiuc the limitation is just
>> on read locks, but anway.
> 
> Those are grant entry locks, not the grant table lock.  The grant table
> write lock is only taken to switch grant table version.

static inline void
double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
{
    /*
     * See mapkind() for why the write lock is also required for the
     * remote domain.
     */
    if ( lgt < rgt )
    {
        write_lock(&lgt->lock);
        write_lock(&rgt->lock);
    }
    else
    {
        if ( lgt != rgt )
            write_lock(&rgt->lock);
        write_lock(&lgt->lock);
    }
}

looks very much like taking two grant table locks to me.

>>> Nor is there any such need.
>> I'm sorry, but no. If the lock flavor is to be tailored to gnttab use,
>> then it shouldn't be added outside of grant_table.c.
> 
> This mechanism is not specific to the grant table, nor are the grant
> tables the only lock intended to be converted in this way.  Doing this
> to the p2m lock also gives decent improvements as well, although is
> rather more fiddly to get right.

And there are no nested uses of p2m locks? I recall a PVH issue
needing special tweaking because nested locking could occur for
both a PVH Dom0's lock and a DomU's controlled by it.

As long as there's not going to be percpu_rwlock_t, with all pieces
grouped together, this construct seems too special purpose to me.

>>>>> The entire point of this series is to reduce the cmpxchg storm which
>>>>> happens when many pcpus attempt to grap the same domains grant read lock.
>>>>>
>>>>> As identified in the commit message, reducing the cmpxchg pressure on
>>>>> the cache coherency fabric increases intra-vm network through from
>>>>> 10Gbps to 50Gbps when running iperf between two 16-vcpu guests.
>>>>>
>>>>> Or in other words, 80% of cpu time is wasted with waiting on an atomic
>>>>> read/modify/write operation against a remote hot cache line.
>>>> All of this is pretty nice, but again unrelated to the question I
>>>> raised.
>>>>
>>>> The whole interface would likely become quite a bit easier to use
>>>> if there was a percpu_rwlock_t comprising all three elements (the
>>>> per-CPU item obviously would need to become a per-CPU pointer,
>>>> with allocation of per-CPU data needing introduction).
>>> Runtime per-CPU data allocation is incompatible with our current scheme
>>> (which relies on the linker to do some of the heavy lifting).
>> Well, there are ways to deal with that. One would be for components
>> to declare how much they might need to use in the worst case (along
>> the lines of x86 Linux'es .brk mechanism). Another (a variant thereof,
>> much easier to implement right away) would be for grant_table.c to
>> simply create a per-CPU array with DOMID_FIRST_RESERVED entries,
>> using the one corresponding to the domain in question. And of course
>> a scheme not as advanced as current Linux'es might do too - after all
>> it did for Linux for many years (before the current one got invented).
> 
> Or better yet, not use dynamic allocation.

Which is what the middle of the three suggestions represents...

> Coming up with a new per-cpu scheme is all fine and well if someone
> wants to do so, but it is a substantial task and not necessary here.

... to keep things simple for now.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.