[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On Wed, Jun 18, 2014 at 11:57:30AM -0400, Konrad Rzeszutek Wilk wrote: > On Sun, Jun 15, 2014 at 02:47:02PM +0200, Peter Zijlstra wrote: > > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > > > When we allow for a max NR_CPUS < 2^14 we can optimize the pending > > wait-acquire and the xchg_tail() operations. > > > > By growing the pending bit to a byte, we reduce the tail to 16bit. > > This means we can use xchg16 for the tail part and do away with all > > the repeated compxchg() operations. > > > > This in turn allows us to unconditionally acquire; the locked state > > as observed by the wait loops cannot change. And because both locked > > and pending are now a full byte we can use simple stores for the > > state transition, obviating one atomic operation entirely. > > I have to ask - how much more performance do you get from this? > > Is this extra atomic operation hurting that much? Its not extra, its a cmpxchg loop vs an unconditional xchg. And yes, its somewhat tedious to show, but on 4 socket systems you can really see it make a difference. I'll try and run some numbers, I need to reinstall the box. (there were numbers in the previous threads, but you're right, I should've put some in the Changelog). > > /** > > * queue_spin_lock_slowpath - acquire the queue spinlock > > @@ -173,8 +259,13 @@ void queue_spin_lock_slowpath(struct qsp > > * we're pending, wait for the owner to go away. > > * > > * *,1,1 -> *,1,0 > > + * > > + * this wait loop must be a load-acquire such that we match the > > + * store-release that clears the locked bit and create lock > > + * sequentiality; this because not all clear_pending_set_locked() > > + * implementations imply full barriers. > > */ > > - while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK) > > + while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK) > > lock->val.counter? Ugh, all to deal with the 'int' -> 'u32' (or 'u64') No, to do atomic_t -> int. > Could you introduce a macro in atomic.h called 'atomic_read_raw' which > would do the this? Like this: That would be worse I think. It looks like a function returning an rvalue whereas we really want an lvalue. Attachment:
pgp5U5l9Vl5Pf.pgp _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |