[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-13 15:47-0400, Waiman Long:
> On 05/12/2014 11:22 AM, Radim KrÄmÃÅ wrote:
> >I think there is an unwanted scenario on virtual machines:
> >1) VCPU sets the pending bit and start spinning.
> >2) Pending VCPU gets descheduled.
> >     - we have PLE and lock holder isn't running [1]
> >     - the hypervisor randomly preempts us
> >3) Lock holder unlocks while pending VCPU is waiting in queue.
> >4) Subsequent lockers will see free lock with set pending bit and will
> >    loop in trylock's 'for (;;)'
> >     - the worst-case is lock starving [2]
> >     - PLE can save us from wasting whole timeslice
> >
> >Retry threshold is the easiest solution, regardless of its ugliness [4].
> Yes, that can be a real issue. Some sort of retry threshold, as you said,
> should be able to handle it.
> BTW, the relevant patch should be 16/19 where the PV spinlock stuff should
> be discussed. This patch is perfectly fine.

Ouch, my apology to Peter didn't make it ... Agreed, I should have split
the comment under patches
 [06/19] (part quoted above; does not depend on PV),
 [16/19] (part quoted below) and
 [17/19] (general doubts).

> >Another minor design flaw is that formerly first VCPU gets appended to
> >the tail when it decides to queue;
> >is the performance gain worth it?
> >
> >Thanks.
> Yes, the performance gain is worth it. The primary goal is to be not worse
> than ticket spinlock in light load situation which is the most common case.
> This feature is need to achieve that.

I've seen merit in pvqspinlock even with slightly slower first-waiter,
so I would have happily sacrificed those horrible branches.
(I prefer elegant to optimized code, but I can see why we want to be
 strictly better than ticketlock.)
Peter mentioned that we are focusing on bare-metal patches, so I'll
withold my other paravirt rants until they are polished.

And to forcefully bring this thread a little bit on-topic:

Pending-bit is effectively a lock in a lock, so I was wondering why
don't we use more pending bits; advantages are the same, just diminished
by the probability of having an ideally contended lock:
 - waiter won't be blocked on RAM access if critical section (or more)
   ends sooner
 - some unlucky cacheline is not forgotten
 - faster unlock (no need for tail operations)
(- ?)
disadvantages are magnified:
 - increased complexity
 - intense cacheline sharing
   (I thought that this is the main disadvantage of ticketlock.)
(- ?)

One bit still improved performance, is it the best we got?


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.