Re: [Xen-devel] [PATCH 2/2] grant_table: convert grant table rwlock to percpu rwlock
On 18/11/15 20:02, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 17, 2015 at 05:30:59PM +0000, Andrew Cooper wrote:
>> On 17/11/15 17:04, Jan Beulich wrote:
>>>>>> On 03.11.15 at 18:58, <malcolm.crossley@xxxxxxxxxx> wrote:
>>>> --- a/xen/common/grant_table.c
>>>> +++ b/xen/common/grant_table.c
>>>> @@ -178,6 +178,10 @@ struct active_grant_entry {
>>>>  #define _active_entry(t, e) \
>>>>      ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
>>>>
>>>> +bool_t grant_rwlock_barrier;
>>>> +
>>>> +DEFINE_PER_CPU(rwlock_t *, grant_rwlock);
>>> Shouldn't these be per grant table? And wouldn't doing so eliminate
>>> the main limitation of the per-CPU rwlocks?
>>
>> The grant rwlock is per grant table.
>>
>> The entire point of this series is to reduce the cmpxchg storm which
>> happens when many pcpus attempt to grab the same domain's grant read
>> lock.
>>
>> As identified in the commit message, reducing the cmpxchg pressure on
>> the cache coherency fabric increases intra-VM network throughput from
>> 10Gbps to 50Gbps when running iperf between two 16-vcpu guests.
>>
>> Or in other words, 80% of cpu time is wasted waiting on an atomic
>> read/modify/write operation against a remote hot cache line.
>>
>
> Why not use MCS locks then (in Linux the implementation is known
> as qspinlock)? Plus they have added extra code to protect against
> recursion (via four levels). See Linux commit
> a33fda35e3a7655fb7df756ed67822afb5ed5e8d
> ("locking/qspinlock: Introduce a simple generic 4-byte queued spinlock").
>

The Linux qspinlock is MCS based, but MCS only helps under lock
contention. It still uses a single data location for the lock and so
suffers from cache line bouncing, plus the cmpxchg overhead for taking
an uncontended lock.

You can see the qspinlock using the cmpxchg mechanism here:

http://lxr.free-electrons.com/source/include/asm-generic/qspinlock.h#L62

I've copy-pasted the qspinlock lock implementation inline for
convenience:

static __always_inline void queued_spin_lock(struct qspinlock *lock)
{
        u32 val;

        /* Uncontended fast path: cmpxchg on the shared lock word. */
        val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL);
        if (likely(val == 0))
                return;
        /* Contended: queue on a per-CPU MCS node in the slow path. */
        queued_spin_lock_slowpath(lock, val);
}

Malcolm

>> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
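
[Editor's note: to make the fast path being discussed concrete, below is a
minimal user-space model of the scheme implied by the two variables in the
quoted hunk (grant_rwlock_barrier and the per-CPU grant_rwlock pointer),
written with C11 atomics and pthreads rather than Xen's primitives. All
function names, fields, and the exact interface are illustrative
assumptions, not the actual patch 1/2 code. The idea: readers publish a
pointer in their own per-CPU slot with a plain store; only when a writer
has raised the barrier do they fall back to a conventional rwlock.]

/*
 * Sketch only - models the percpu rwlock idea, not Xen's implementation.
 * Assumed names: percpu_rwlock_t, read_slot[], writer_activating.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CPUS 64

typedef struct {
    pthread_rwlock_t slow_rwlock;  /* conventional rwlock; slow path only */
    atomic_bool writer_activating; /* plays the "barrier" flag's role     */
} percpu_rwlock_t;

#define PERCPU_RWLOCK_INIT { PTHREAD_RWLOCK_INITIALIZER, false }

/* Stand-in for DEFINE_PER_CPU(rwlock_t *, grant_rwlock): one slot per
 * CPU recording which percpu rwlock (if any) that CPU is read-locking. */
static percpu_rwlock_t *_Atomic read_slot[MAX_CPUS];

static void percpu_read_lock(percpu_rwlock_t *lock, unsigned int cpu)
{
    /* Fast path: a plain store to our own slot - no cmpxchg, no
     * read/modify/write on a cache line shared with other CPUs. */
    atomic_store(&read_slot[cpu], lock);

    /* Seq-cst ordering gives the Dekker property: either we see the
     * writer's flag here, or the writer sees our slot when it scans. */
    if (atomic_load(&lock->writer_activating)) {
        /* Slow path: a writer is active; retreat, queue fairly on the
         * conventional rwlock, then republish our slot. */
        atomic_store(&read_slot[cpu], NULL);
        pthread_rwlock_rdlock(&lock->slow_rwlock);
        atomic_store(&read_slot[cpu], lock);
        pthread_rwlock_unlock(&lock->slow_rwlock);
    }
}

static void percpu_read_unlock(percpu_rwlock_t *lock, unsigned int cpu)
{
    (void)lock;
    atomic_store(&read_slot[cpu], NULL); /* again a purely local store */
}

static void percpu_write_lock(percpu_rwlock_t *lock)
{
    pthread_rwlock_wrlock(&lock->slow_rwlock); /* serialise writers */
    atomic_store(&lock->writer_activating, true);
    /* Writers pay the cost: wait for all fast-path readers to drain. */
    for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++)
        while (atomic_load(&read_slot[cpu]) == lock)
            ; /* spin */
}

static void percpu_write_unlock(percpu_rwlock_t *lock)
{
    atomic_store(&lock->writer_activating, false);
    pthread_rwlock_unlock(&lock->slow_rwlock);
}

[The trade-off is deliberate: read_lock becomes a local store plus a flag
check, while write_lock must scan every CPU's slot - a plausible fit for
the grant table, whose lock is taken for read on the hot map/unmap paths
and for write only rarely.]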