Re: [Xen-devel] [PATCH 2/2] grant_table: convert grant table rwlock to percpu rwlock
>>> On 18.11.15 at 11:06, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 18/11/15 07:45, Jan Beulich wrote:
>>>>> On 17.11.15 at 18:53, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 17/11/15 17:39, Jan Beulich wrote:
>>>>>>> On 17.11.15 at 18:30, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> On 17/11/15 17:04, Jan Beulich wrote:
>>>>>>>>> On 03.11.15 at 18:58, <malcolm.crossley@xxxxxxxxxx> wrote:
>>>>>>> --- a/xen/common/grant_table.c
>>>>>>> +++ b/xen/common/grant_table.c
>>>>>>> @@ -178,6 +178,10 @@ struct active_grant_entry {
>>>>>>>  #define _active_entry(t, e) \
>>>>>>>      ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
>>>>>>>
>>>>>>> +bool_t grant_rwlock_barrier;
>>>>>>> +
>>>>>>> +DEFINE_PER_CPU(rwlock_t *, grant_rwlock);
>>>>>> Shouldn't these be per grant table? And wouldn't doing so eliminate
>>>>>> the main limitation of the per-CPU rwlocks?
>>>>> The grant rwlock is per grant table.
>>>> That's understood, but I don't see why the above items aren't, too.
>>> Ah - because there is never any circumstance where two grant tables are
>>> locked on the same pcpu.
>> double_gt_lock()? Those are write locks, and iiuc the limitation is just
>> on read locks, but anyway.
>
> Those are grant entry locks, not the grant table lock. The grant table
> write lock is only taken to switch grant table version.

static inline void
double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
{
    /*
     * See mapkind() for why the write lock is also required for the
     * remote domain.
     */
    if ( lgt < rgt )
    {
        write_lock(&lgt->lock);
        write_lock(&rgt->lock);
    }
    else
    {
        if ( lgt != rgt )
            write_lock(&rgt->lock);
        write_lock(&lgt->lock);
    }
}

looks very much like taking two grant table locks to me.

>>> Nor is there any such need.
>> I'm sorry, but no. If the lock flavor is to be tailored to gnttab use,
>> then it shouldn't be added outside of grant_table.c.
>
> This mechanism is not specific to the grant table, nor are the grant
> tables the only lock intended to be converted in this way. Doing this
> to the p2m lock gives decent improvements as well, although it is
> rather more fiddly to get right.

And there are no nested uses of p2m locks? I recall a PVH issue
needing special tweaking because nested locking could occur for
both a PVH Dom0's lock and a DomU's controlled by it.

As long as there's not going to be a percpu_rwlock_t, with all pieces
grouped together, this construct seems too special purpose to me.

>>>>> The entire point of this series is to reduce the cmpxchg storm which
>>>>> happens when many pcpus attempt to grab the same domain's grant read
>>>>> lock.
>>>>>
>>>>> As identified in the commit message, reducing the cmpxchg pressure on
>>>>> the cache coherency fabric increases intra-VM network throughput from
>>>>> 10Gbps to 50Gbps when running iperf between two 16-vcpu guests.
>>>>>
>>>>> Or in other words, 80% of cpu time is wasted waiting on an atomic
>>>>> read/modify/write operation against a remote hot cache line.
>>>> All of this is pretty nice, but again unrelated to the question I
>>>> raised.
>>>>
>>>> The whole interface would likely become quite a bit easier to use
>>>> if there was a percpu_rwlock_t comprising all three elements (the
>>>> per-CPU item obviously would need to become a per-CPU pointer,
>>>> with allocation of per-CPU data needing introduction).
>>> Runtime per-CPU data allocation is incompatible with our current scheme
>>> (which relies on the linker to do some of the heavy lifting).
>> Well, there are ways to deal with that. One would be for components
>> to declare how much they might need in the worst case (along the
>> lines of x86 Linux's .brk mechanism). Another (a variant thereof,
>> much easier to implement right away) would be for grant_table.c to
>> simply create a per-CPU array with DOMID_FIRST_RESERVED entries,
>> using the one corresponding to the domain in question. And of course
>> a scheme not as advanced as the current Linux one might do too - after
>> all it served Linux for many years (before the current one got invented).
>
> Or better yet, not use dynamic allocation.

Which is what the middle of the three suggestions represents...

> Coming up with a new per-cpu scheme is all fine and well if someone
> wants to do so, but it is a substantial task and not necessary here.

... to keep things simple for now.

Jan
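
As a point of reference for the discussion above, a minimal sketch of the
kind of consolidated type being asked for - a percpu_rwlock_t grouping the
writer barrier flag, the per-CPU "reader active" pointer and the underlying
rwlock - might look like the following. The names (percpu_rwlock_t,
percpu_rwlock_owner, writer_activating, percpu_read_lock() and friends) are
illustrative assumptions, not the interface proposed in the patch; the
sketch relies on Xen's usual per-CPU and rwlock primitives.

/*
 * Illustrative only: one possible shape for a percpu_rwlock_t grouping
 * all three pieces of the mechanism in a single type.
 */
typedef struct percpu_rwlock {
    rwlock_t rwlock;            /* slow path and writer serialisation */
    bool_t   writer_activating; /* set while a writer wants the lock  */
} percpu_rwlock_t;

/* Which percpu_rwlock_t (if any) this CPU currently holds for reading. */
DEFINE_PER_CPU(percpu_rwlock_t *, percpu_rwlock_owner);

static inline void percpu_read_lock(percpu_rwlock_t *l)
{
    /* Advertise the reader on a CPU-local cache line - no cmpxchg. */
    this_cpu(percpu_rwlock_owner) = l;
    smp_mb();
    if ( unlikely(l->writer_activating) )
    {
        /* A writer is around: back off and use the real rwlock. */
        this_cpu(percpu_rwlock_owner) = NULL;
        read_lock(&l->rwlock);
    }
}

static inline void percpu_read_unlock(percpu_rwlock_t *l)
{
    if ( this_cpu(percpu_rwlock_owner) == l )
        this_cpu(percpu_rwlock_owner) = NULL;   /* fast path reader */
    else
        read_unlock(&l->rwlock);                /* slow path reader */
}

static inline void percpu_write_lock(percpu_rwlock_t *l)
{
    unsigned int cpu;

    l->writer_activating = 1;
    smp_mb();
    /* Wait for all CPU-local readers of *this* lock to drain. */
    for_each_online_cpu ( cpu )
        while ( per_cpu(percpu_rwlock_owner, cpu) == l )
            cpu_relax();
    write_lock(&l->rwlock);
}

static inline void percpu_write_unlock(percpu_rwlock_t *l)
{
    l->writer_activating = 0;
    write_unlock(&l->rwlock);
}

The read-side fast path only writes a CPU-local variable, which is what
removes the cmpxchg traffic on the shared lock word described in the
thread. The single per-CPU owner pointer is also what limits each pcpu to
read-locking only one such lock at a time - the limitation the per-domain
slot suggestion is meant to lift.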
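
Similarly, the "per-CPU array with DOMID_FIRST_RESERVED entries" variant
could be sketched roughly as below; again, the declaration style and the
helper are assumptions for illustration, not a drop-in replacement for the
patch hunk quoted above.

/*
 * One slot per possible domain ID, so that grant tables of two different
 * domains can be tracked on the same pcpu without colliding.
 */
DEFINE_PER_CPU(rwlock_t *[DOMID_FIRST_RESERVED], grant_rwlock);

static inline rwlock_t **grant_rwlock_slot(const struct domain *d)
{
    /* Pick the CPU-local slot corresponding to the domain in question. */
    return &this_cpu(grant_rwlock)[d->domain_id];
}

This keeps the allocation fully static (no runtime per-CPU allocation
needed), which is what makes it the "not use dynamic allocation" option
referred to at the end of the mail.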