
Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

> After modifying it to do a deterministic cmpxchg, the test run time of 2
> contending tasks jumps up from 600ms (best case) to about 1700ms which was
> worse than the original qspinlock's 1300-1500ms. It is the opportunistic
> nature of the xchg() code that can potentially combine multiple steps in the
> deterministic atomic sequence, which saves time. Without that, I would
> rather prefer going back to the basic qspinlock queuing sequence for 2
> contending tasks.
> Please take a look at the performance data in my patch 3 to see if the
> slowdown at 2 and 3 contending tasks is acceptable or not.

Right; so I've gone back to a simple version (~200 lines) that's fairly
easy to comprehend (to me, since I wrote it). And will now try to see if
I can find the same state transitions in your code.

I find your code somewhat hard to follow; mostly due to those xchg() +
fixup thingies. But I'll get there.
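Roughly, the two styles in question, sketched with C11 atomics (the 16-bit layout, bit values, and helper names here are assumptions for illustration, not the actual patch code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-bit lock word: low byte = locked, high byte = pending.
 * Layout and names are assumptions for illustration only. */
#define _Q_LOCKED   0x0001u
#define _Q_PENDING  0x0100u

/* Deterministic style: retry the cmpxchg until exactly the transition
 * we asked for (old -> old | PENDING) lands. */
static uint16_t set_pending_cmpxchg(_Atomic uint16_t *lock)
{
	uint16_t old = atomic_load(lock);

	while (!atomic_compare_exchange_weak(lock, &old,
					     (uint16_t)(old | _Q_PENDING)))
		;	/* 'old' is refreshed on failure; just retry */
	return old;
}

/* Opportunistic style: one unconditional xchg of the whole word; the
 * previous state comes back and any bits we clobbered get fixed up
 * afterwards, which can fold several steps into a single atomic op. */
static uint16_t set_pending_xchg(_Atomic uint16_t *lock)
{
	return atomic_exchange(lock, _Q_PENDING | _Q_LOCKED);
	/* caller inspects the return value and repairs overwritten state */
}
```

The fixup paths after the xchg() are exactly the part that is hard to follow, since the old value can be any reachable state.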

> The reason why I need a whole byte for the lock bit is because of the simple
> unlock code of assigning 0 to the lock byte by the lock holder. Utilizing
> other bits in the low byte for other purposes would complicate the unlock path
> and slow down the no-contention case.

Yeah, I get why you need a whole byte for the lock part, I was asking if
we really need another whole byte for the pending part.
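For reference, the byte-granular unlock the quoted text describes can be sketched like this (field layout and names are assumed, not the patch's exact code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Why the lock wants a whole byte: unlock then becomes a single plain
 * release store of 0, with no atomic RMW on the fast path.
 * Layout and names are assumptions for illustration only. */
struct qsl_sketch {
	union {
		_Atomic uint32_t val;
		struct {
			_Atomic uint8_t locked;	/* the whole low byte */
			uint8_t pending;
			uint16_t tail;
		};
	};
};

static void qsl_unlock_sketch(struct qsl_sketch *l)
{
	/* If other state shared this byte, this would have to become a
	 * cmpxchg loop to avoid clobbering those bits, slowing down the
	 * uncontended unlock. */
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```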

So in my version I think I see an optimization case where this is indeed
useful and I can trade an atomic op for a write barrier, which should be
a big win.

It just wasn't at all clear (to me) from your code.

(I also think the optimization isn't x86 specific)

> >>The code is unfair, but this unfairness helps it run faster than the ticket
> >>spinlock in this particular case. And the regular qspinlock slowpath is
> >>fair. A little bit of unfairness in this particular case helps its speed.

> >*groan*, no, unfairness not cool. ticket lock is absolutely fair; we
> >should preserve this.
> We can preserve that by removing patch 3.

I've got a version that does the pending thing and is still entirely fair.

I don't think the concept of the extra spinner is incompatible with that fairness.
> >BTW; can you share your benchmark thingy?
> I have attached the test program that I used to generate the timing data for
> patch 3.

Thanks, I'll have a play.

Xen-devel mailing list
