Re: [Xen-devel] [PATCHv2 0/3] Implement per-cpu reader-writer locks
On 20/11/15 16:03, Malcolm Crossley wrote:
> This patch series adds per-cpu reader-writer locks as a generic lock
> implementation and then converts the grant table and p2m rwlocks to
> use the percpu rwlocks, in order to improve multi-socket host performance.
>
> CPU profiling has revealed the rwlocks themselves suffer from severe cache
> line bouncing due to the cmpxchg operation used even when taking a read lock.
> Multiqueue paravirtualised I/O results in heavy contention of the grant table
> and p2m read locks of a specific domain and so I/O throughput is bottlenecked
> by the overhead of the cache line bouncing itself.
>
> Per-cpu read locks avoid lock cache line bouncing by using a per-cpu data
> area to record a CPU has taken the read lock. Correctness is enforced for the
> write lock by using a per lock barrier which forces the per-cpu read lock
> to revert to using a standard read lock. The write lock then polls all
> the percpu data area until active readers for the lock have exited.
>
> Removing the cache line bouncing on a multi-socket Haswell-EP system
> dramatically improves performance, with 16 vCPU network IO performance going
> from 15 gb/s to 64 gb/s! The host under test was fully utilising all 40
> logical CPU's at 64 gb/s, so a bigger logical CPU host may see an even better
> IO improvement.

Impressive -- thanks for doing this work.

One question: Your description here sounds like you've tested with a
single large domain, but what happens with multiple domains?

It looks like the "per-cpu-rwlock" is shared by *all* locks of a
particular type (e.g., all domains share the per-cpu p2m rwlock).
(Correct me if I'm wrong here.)

Which means two things:

1) *Any* writer will have to wait for the rwlock of that type to be
released on *all* domains before being able to write. Is there any risk
that on a busy system, this will be an unusually long wait?

2) *All* domains will have to take the slow path for reading when *any*
domain has or is trying to acquire the write lock. What is the
probability that on a busy system that turns out to be "most of the time"?

#2 is of course no worse than it is now, but #1 could be a bit of a bear.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
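
To make question 1 concrete, here is a small single-threaded model in C. It is a sketch for reasoning only, not code from the patch series: there are no memory barriers or atomics, and every name in it (`percpu_slot`, `model_read_lock`, `model_readers_blocking_writer`, and so on) is hypothetical. It assumes that each CPU's slot stores a pointer to the specific lock being read; the description quoted above only says the per-cpu area records that a CPU has taken the read lock, so this is an assumption. Under it, a writer only has to wait for CPUs whose slot points at its own lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for the model */

/* Hypothetical lock type: one instance per domain. */
struct percpu_rwlock {
    bool writer_activating;  /* stands in for the "per lock barrier" */
    int  slow_readers;       /* readers that fell back to the standard read lock */
};

/* One slot per CPU, shared by every lock of this type (e.g. all p2m locks).
 * ASSUMPTION: the slot records *which* lock the CPU is reading. */
static struct percpu_rwlock *percpu_slot[NR_CPUS];

/* Returns true if the per-cpu fast path was taken, false for the fallback.
 * A concurrent version would publish the slot, issue a barrier and then
 * re-check writer_activating; this model is single-threaded, so it checks first. */
static bool model_read_lock(struct percpu_rwlock *l, int cpu)
{
    assert(percpu_slot[cpu] == NULL);  /* the model does not support nesting */
    if (l->writer_activating) {
        l->slow_readers++;             /* slow path: standard read lock */
        return false;
    }
    percpu_slot[cpu] = l;              /* fast path: touches only this CPU's slot */
    return true;
}

static void model_read_unlock(struct percpu_rwlock *l, int cpu, bool fast)
{
    if (fast) {
        assert(percpu_slot[cpu] == l);
        percpu_slot[cpu] = NULL;
    } else {
        assert(l->slow_readers > 0);
        l->slow_readers--;
    }
}

/* Writer side: raise the barrier so new readers of this lock go slow. */
static void model_write_begin(struct percpu_rwlock *l)
{
    l->writer_activating = true;
}

static void model_write_end(struct percpu_rwlock *l)
{
    l->writer_activating = false;
}

/* Number of fast-path readers a writer on l would still have to poll for. */
static int model_readers_blocking_writer(const struct percpu_rwlock *l)
{
    int n = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (percpu_slot[cpu] == l)
            n++;
    return n;
}
```

If the slot instead held only a "read lock taken" flag per lock type, `model_readers_blocking_writer` would have to count every non-empty slot regardless of domain, which is the long-wait scenario raised in question 1.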