[Xen-devel] [PATCHv3 0/3] Implement per-cpu reader-writer locks
This patch series adds per-cpu reader-writer locks as a generic lock implementation and then converts the grant table and p2m rwlocks to use them, in order to improve performance on multi-socket hosts.

CPU profiling has revealed that the rwlocks themselves suffer from severe cache line bouncing, because a cmpxchg operation is used even when taking a read lock. Multiqueue paravirtualised I/O results in heavy contention on the grant table and p2m read locks of a specific domain, so I/O throughput is bottlenecked by the overhead of the cache line bouncing itself.

Per-cpu read locks avoid lock cache line bouncing by using a per-cpu data area to record that a CPU has taken the read lock. Correctness is enforced for the write lock by a per-lock barrier which forces the per-cpu read lock to revert to using a standard read lock. The write lock then polls the per-cpu data areas of all CPUs until the active readers of the lock have exited.

Removing the cache line bouncing on a multi-socket Haswell-EP system dramatically improves performance, with 16 vCPU network I/O performance going from 15 Gb/s to 64 Gb/s! The host under test was fully utilising all 40 logical CPUs at 64 Gb/s, so a host with more logical CPUs may see an even larger I/O improvement.

Note: Benchmarking of these performance improvements should be done with the non-debug version of the hypervisor; otherwise the map_domain_page spinlock is the main bottleneck.
Changes in V3:
 - Add percpu rwlock owner for debug Xen builds
 - Validate percpu rwlock owner at runtime for debug Xen builds
 - Fix hard tab issues
 - Use percpu rwlock wrappers for grant table rwlock users
 - Add comments why rw_is_locked ASSERTS have been removed in grant table code

Changes in V2:
 - Add Cover letter
 - Convert p2m rwlock to percpu rwlock
 - Improve percpu rwlock to safely handle simultaneously holding 2 or more locks
 - Move percpu rwlock barrier from global to per lock
 - Move write lock cpumask variable to a percpu variable
 - Add macros to help initialise and use percpu rwlocks
 - Updated IO benchmark results to cover revised locking implementation

Malcolm Crossley (3):
  rwlock: Add per-cpu reader-writer lock infrastructure
  grant_table: convert grant table rwlock to percpu rwlock
  p2m: convert p2m rwlock to percpu rwlock

 xen/arch/arm/mm.c             |   4 +-
 xen/arch/x86/mm.c             |   4 +-
 xen/arch/x86/mm/mm-locks.h    |  12 ++--
 xen/arch/x86/mm/p2m.c         |   1 +
 xen/common/grant_table.c      | 126 +++++++++++++++++++++++-------------------
 xen/common/spinlock.c         |  46 +++++++++++++++
 xen/include/asm-arm/percpu.h  |   5 ++
 xen/include/asm-x86/mm.h      |   2 +-
 xen/include/asm-x86/percpu.h  |   6 ++
 xen/include/xen/grant_table.h |  24 +++++++-
 xen/include/xen/percpu.h      |   4 ++
 xen/include/xen/spinlock.h    | 115 ++++++++++++++++++++++++++++++++++++++
 12 files changed, 282 insertions(+), 67 deletions(-)

--
1.7.12.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel