
Re: [Xen-devel] [RFC] Implement Batched (group) ticket lock

To: Rik van Riel <riel@xxxxxxxxxx>
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 28 May 2014 15:19:49 -0700
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>, KVM list <kvm@xxxxxxxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Jason Wang <jasowang@xxxxxxxxxx>, Oleg Nesterov <oleg@xxxxxxxxxx>, Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx>, Peter Anvin <hpa@xxxxxxxxx>, Andi Kleen <ak@xxxxxxxxxxxxxxx>, Gleb Natapov <gleb@xxxxxxxxxx>, the arch/x86 maintainers <x86@xxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Paul McKenney <paulmck@xxxxxxxxxxxxxxxxxx>, virtualization <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>, Dave Jones <davej@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, fernando_b1@xxxxxxxxxxxxx, "Vinod, Chegu" <chegu_vinod@xxxxxx>, Waiman Long <waiman.long@xxxxxx>, Marcelo Tosatti <mtosatti@xxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>
Delivery-date: Wed, 28 May 2014 22:20:09 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Wed, May 28, 2014 at 2:55 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
>
> Or maybe cmpxchg is cheap once you already own the cache line
> exclusively?

A locked cmpxchg ends up being anything between ~15-50 cycles
depending on microarchitecture if things are already exclusively in
the cache (with the P4 being an outlier, where all locked instructions
tend to take ~100+ cycles, but I can't say I can really find it in
myself to even care about netburst any more).

The most noticeable downside we've seen has been when we've used
"read-op-cmpxchg" as a _replacement_ for something like "lock [x]add",
when that "read+cmpxchg" has caused two cacheline ops (cacheline first
loaded shared by the read, then exclusive by the cmpxchg). That's bad.
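
To make the difference concrete, here is a minimal sketch in C11 atomics of the two ways to do the same increment. The function names are made up for illustration and are not from any kernel code; the claim about which instructions come out is an assumption about typical x86 compilers, not a guarantee.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Single atomic read-modify-write. On x86 a compiler will typically
 * emit "lock xadd" here, so the cacheline is requested once, directly
 * in exclusive state. (Illustrative name, not a kernel API.)
 */
static inline uint32_t inc_fetch_add(_Atomic uint32_t *v)
{
    return atomic_fetch_add_explicit(v, 1, memory_order_acq_rel);
}

/*
 * read + cmpxchg replacement for the same operation. The plain load
 * may pull the cacheline in shared state, and the cmpxchg then has to
 * upgrade it to exclusive: two cacheline operations instead of one.
 */
static inline uint32_t inc_read_cmpxchg(_Atomic uint32_t *v)
{
    uint32_t old = atomic_load_explicit(v, memory_order_relaxed);

    /* On failure, 'old' is updated to the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
        ;
    return old;
}
```

Both functions return the value seen before the increment, so they are interchangeable functionally; the difference is only in the cacheline traffic they can cause.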

But if preceded by a write (or, in this case, an xadd), that doesn't
happen. Still, those roughly 15-50 cycles can certainly be noticeable
(especially at the high end), but to see it you need a load that
doesn't bounce the lock and that largely fits in the caches. And
you probably want to test one of the older CPUs; I think Haswell is
at the lower end (ie in the ~20 cycle range).
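
The "preceded by a write" pattern can be sketched roughly like this in C11 atomics. The helper name, the step of 2 and the mark bit are all hypothetical and only there to have something to compile; this is not the code from the patch under discussion.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: bump a counter word by 2, then try to set a
 * mark bit if nobody else modified the word in between.
 *
 * The fetch_add (typically "lock xadd" on x86) already brings the
 * cacheline in exclusive state, so the following cmpxchg does not
 * need a second cacheline operation unless another CPU took the
 * line away in the meantime.
 */
static inline bool bump_then_mark(_Atomic uint32_t *word, uint32_t mark_bit)
{
    /* Value of the word right after our own addition. */
    uint32_t seen = atomic_fetch_add_explicit(word, 2,
                                              memory_order_acq_rel) + 2;

    /* Succeeds only if the word still holds the value we just wrote. */
    return atomic_compare_exchange_strong_explicit(word, &seen,
                                                   seen | mark_bit,
                                                   memory_order_acq_rel,
                                                   memory_order_relaxed);
}
```

In the uncontended case the cmpxchg still costs its ~15-50 cycles, which is the overhead being discussed, but it avoids the shared-then-exclusive double fetch.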

If somebody has a P4 still, that's likely the worst case by far.

              Linus


