[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

To: Waiman Long <waiman.long@xxxxxx>
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Fri, 28 Feb 2014 10:29:45 +0100
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx, Andi Kleen <andi@xxxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Michel Lespinasse <walken@xxxxxxxxxx>, Alok Kataria <akataria@xxxxxxxxxx>, linux-arch@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, Ingo Molnar <mingo@xxxxxxxxxx>, Scott J Norton <scott.norton@xxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>, Alexander Fyodorov <halcy@xxxxxxxxx>, Arnd Bergmann <arnd@xxxxxxxx>, Daniel J Blueman <daniel@xxxxxxxxxxxxx>, Rusty Russell <rusty@xxxxxxxxxxxxxxx>, Oleg Nesterov <oleg@xxxxxxxxxx>, Steven Rostedt <rostedt@xxxxxxxxxxx>, Chris Wright <chrisw@xxxxxxxxxxxx>, George Spelvin <linux@xxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Aswin Chandramouleeswaran <aswin@xxxxxx>, Chegu Vinod <chegu_vinod@xxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, David Vrabel <david.vrabel@xxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Delivery-date: Fri, 28 Feb 2014 09:30:42 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote:
> >>+   old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
> >>+
> >>+   if (old == 0) {
> >>+           /*
> >>+            * Got the lock, can clear the waiting bit now
> >>+            */
> >>+           smp_u8_store_release(&qlock->wait, 0);
> >
> >So we just did an atomic op, and now you're trying to optimize this
> >write. Why do you need a whole byte for that?
> >
> >Surely a cmpxchg loop with the right atomic op can't be _that_ much
> >slower? Its far more readable and likely avoids that steal fail below as
> >well.
> 
> At low contention level, atomic operations that requires a lock prefix are
> the major contributor to the total execution times. I saw estimate online
> that the time to execute a lock prefix instruction can easily be 50X longer
> than a regular instruction that can be pipelined. That is why I try to do it
> with as few lock prefix instructions as possible. If I have to do an atomic
> cmpxchg, it probably won't be faster than the regular qspinlock slowpath.

At low contention the cmpxchg won't have to be retried (much) so using
it won't be a problem and you get to have arbitrary atomic ops.

> Given that speed at low contention level which is the common case is
> important to get this patch accepted, I have to do what I can to make it run
> as far as possible for this 2 contending task case.

What I'm saying is that you can do the whole thing with a single
cmpxchg. No extra ops needed. And at that point you don't need a whole
byte, you can use a single bit.

that removes the whole NR_CPUS dependent logic.

> >>+           /*
> >>+            * Someone has steal the lock, so wait again
> >>+            */
> >>+           goto try_again;

> >That's just a fail.. steals should not ever be allowed. It's a fair lock
> >after all.
> 
> The code is unfair, but this unfairness help it to run faster than ticket
> spinlock in this particular case. And the regular qspinlock slowpath is
> fair. A little bit of unfairness in this particular case helps its speed.

*groan*, no, unfairness not cool. ticket lock is absolutely fair; we
should preserve this.

BTW; can you share your benchmark thingy? 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
  - From: Waiman Long
- Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
  - From: Linus Torvalds

References:
- [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
  - From: Waiman Long
- Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
  - From: Peter Zijlstra
- Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
  - From: Waiman Long

Prev by Date: [Xen-devel] intercept and capture fast system call of linux
Next by Date: Re: [Xen-devel] R: ARM: access to iomem and HW IRQ
Previous by thread: Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Next by thread: Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.