
Re: [Xen-devel] [PATCHv3 0/3] Implement per-cpu reader-writer locks



I didn't spot that the percpu rwlock owner ASSERT was the wrong way round in
this version. Please review version 4 of the series instead.

Sorry for the noise.

On 17/12/15 12:52, Malcolm Crossley wrote:
> This patch series adds per-cpu reader-writer locks as a generic lock
> implementation and then converts the grant table and p2m rwlocks to
> use the percpu rwlocks, in order to improve multi-socket host performance.
> 
> CPU profiling has revealed that the rwlocks themselves suffer from severe
> cache line bouncing due to the cmpxchg operation used even when taking a
> read lock. Multiqueue paravirtualised I/O results in heavy contention on the
> grant table and p2m read locks of a specific domain, and so I/O throughput is
> bottlenecked by the overhead of the cache line bouncing itself.
> 
> Per-cpu read locks avoid lock cache line bouncing by using a per-cpu data
> area to record that a CPU has taken the read lock. Correctness is enforced
> for the write lock by using a per-lock barrier which forces the per-cpu read
> lock to revert to using a standard read lock. The write lock then polls all
> the per-cpu data areas until active readers for the lock have exited.
> 
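For anyone skimming the thread, the scheme described above can be sketched
roughly as follows. This is a simplified user-space illustration with C11
atomics, not the code from the series: the names (percpu_read_lock,
reader_slot, NR_CPUS, the "slow" counter standing in for the standard rwlock)
are all assumptions made for the example, the CPU number is passed in
explicitly, and a CPU may hold only one such lock at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count for this sketch */

/* Illustrative lock; the layout and names are assumptions, not Xen's. */
struct percpu_rwlock {
    atomic_int  slow;              /* fallback rwlock: -1 = writer, n >= 0 = readers */
    atomic_bool writer_activating; /* per-lock barrier checked by readers */
};

/*
 * One slot per CPU recording which lock that CPU holds for reading.
 * A real implementation would keep each slot in its own per-cpu area
 * (separate cache lines); a plain array is used here for brevity.
 */
static _Atomic(struct percpu_rwlock *) reader_slot[NR_CPUS];

enum read_path { READ_FAST, READ_SLOW };

static enum read_path percpu_read_lock(unsigned int cpu, struct percpu_rwlock *l)
{
    /* Record the read in this CPU's own slot; the lock itself is only read. */
    atomic_store(&reader_slot[cpu], l);
    /* The seq_cst store then load orders this check against the writer. */
    if (!atomic_load(&l->writer_activating))
        return READ_FAST;

    /* A writer is active: back out and revert to the standard read lock. */
    atomic_store(&reader_slot[cpu], NULL);
    for (;;) {
        int v = atomic_load(&l->slow);
        if (v >= 0 && atomic_compare_exchange_weak(&l->slow, &v, v + 1))
            return READ_SLOW;
    }
}

static void percpu_read_unlock(unsigned int cpu, struct percpu_rwlock *l,
                               enum read_path path)
{
    if (path == READ_FAST) {
        /* Debug-style owner check: the slot must name the lock being released. */
        assert(atomic_load(&reader_slot[cpu]) == l);
        atomic_store(&reader_slot[cpu], NULL);
    } else {
        atomic_fetch_sub(&l->slow, 1);
    }
}

/* Number of CPUs currently holding l through the per-cpu fast path. */
static unsigned int percpu_readers_active(struct percpu_rwlock *l)
{
    unsigned int n = 0;
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (atomic_load(&reader_slot[cpu]) == l)
            n++;
    return n;
}

static void percpu_write_lock(struct percpu_rwlock *l)
{
    /* Exclude other writers and slow-path readers. */
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->slow, &expected, -1))
        expected = 0;
    /* Raise the barrier so new readers take the slow path... */
    atomic_store(&l->writer_activating, true);
    /* ...then poll every per-cpu slot until fast-path readers have exited. */
    while (percpu_readers_active(l) != 0)
        ;
}

static void percpu_write_unlock(struct percpu_rwlock *l)
{
    atomic_store(&l->writer_activating, false);
    atomic_store(&l->slow, 0);
}
```

The point of the design is that an uncontended reader only writes to its own
slot and merely reads the shared lock, so the lock's cache line can stay
shared between sockets; the cost moves to the (rare) writer, which has to
visit every CPU's slot.
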
> Removing the cache line bouncing on a multi-socket Haswell-EP system
> dramatically improves performance, with 16 vCPU network IO performance going
> from 15 gb/s to 64 gb/s! The host under test was fully utilising all 40
> logical CPUs at 64 gb/s, so a host with more logical CPUs may see an even
> better IO improvement.
> 
> Note: Benchmarking of these performance improvements should be done with
> the non-debug version of the hypervisor, otherwise the map_domain_page
> spinlock is the main bottleneck.
> 
> Changes in V3:
> - Add percpu rwlock owner for debug Xen builds
> - Validate percpu rwlock owner at runtime for debug Xen builds
> - Fix hard tab issues
> - Use percpu rwlock wrappers for grant table rwlock users
> - Add comments explaining why rw_is_locked ASSERTs have been removed in
>   grant table code
> 
> Changes in V2:
> - Add Cover letter
> - Convert p2m rwlock to percpu rwlock
> - Improve percpu rwlock to safely handle simultaneously holding 2 or more 
>   locks 
> - Move percpu rwlock barrier from global to per lock
> - Move write lock cpumask variable to a percpu variable
> - Add macros to help initialise and use percpu rwlocks
> - Updated IO benchmark results to cover revised locking implementation
> 
> Malcolm Crossley (3):
>   rwlock: Add per-cpu reader-writer lock infrastructure
>   grant_table: convert grant table rwlock to percpu rwlock
>   p2m: convert p2m rwlock to percpu rwlock
> 
>  xen/arch/arm/mm.c             |   4 +-
>  xen/arch/x86/mm.c             |   4 +-
>  xen/arch/x86/mm/mm-locks.h    |  12 ++--
>  xen/arch/x86/mm/p2m.c         |   1 +
>  xen/common/grant_table.c      | 126 +++++++++++++++++++++++-------------------
>  xen/common/spinlock.c         |  46 +++++++++++++++
>  xen/include/asm-arm/percpu.h  |   5 ++
>  xen/include/asm-x86/mm.h      |   2 +-
>  xen/include/asm-x86/percpu.h  |   6 ++
>  xen/include/xen/grant_table.h |  24 +++++++-
>  xen/include/xen/percpu.h      |   4 ++
>  xen/include/xen/spinlock.h    | 115 ++++++++++++++++++++++++++++++++++++++
>  12 files changed, 282 insertions(+), 67 deletions(-)
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

