
Re: [Xen-devel] Xen PVM: Strange lockups when running PostgreSQL load

On Thu, 2012-10-18 at 08:38 +0100, Stefan Bader wrote:
> On 18.10.2012 09:08, Jan Beulich wrote:
> >>>> On 18.10.12 at 09:00, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> >>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> >>> In each case, the event channels are masked (no surprise given the
> >>> conversation so far on this thread), and have no pending events. 
> >>> Therefore, I believe we are looking at the same bug.
> >>
> >> That seems very unlikely (albeit not impossible) to me, given that
> >> the non-pvops kernel uses ticket locks while the pvops one doesn't.
> > 
> > And in fact we had a similar problem with our original ticket lock
> > implementation, exposed by an open coded lock in the scheduler's
> > run queue management. But that was really ticket lock specific,
> > in that the fact that a CPU could passively become the owner of
> > a lock while polling - that's impossible with pvops' byte locks afaict.
> One of the trains of thought I had was whether it could happen that a cpu is
> in polling and the task gets moved. But I don't think it can happen as the
> hypercall unlikely is a place where any schedule happens (preempt is none).
> And it would be much more common...
> One detail which I hope someone can fill in is the whole "interrupted
> spinlock" thing. Saving the last lock pointer stored on the per-cpu
> lock_spinners and so on. Is that really only for spinlocks taken without
> interrupts disabled or do I miss something there?

spinning_lock() returns the old lock which the caller is expected to
remember and replace via unspinning_lock() -- it effectively implements
a stack of locks which are being waited on. xen_spin_lock_slow (the only
caller) appears to do this correctly from a brief inspection.
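
To make the "stack of locks" idea concrete, here is a minimal userspace
sketch of the save/restore pattern. It is not the kernel code: struct
demo_lock, the single global slot standing in for the per-cpu
lock_spinners, the simplified signatures and demo_nested_wait() are all
assumptions made for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for a pv spinlock; not the kernel's structure. */
struct demo_lock { int id; };

/* Stand-in for the per-cpu lock_spinners slot (only one CPU is modelled). */
static struct demo_lock *lock_spinners;

/* Note that this CPU now waits on 'lock'; return what it waited on before. */
static struct demo_lock *spinning_lock(struct demo_lock *lock)
{
    struct demo_lock *prev = lock_spinners;

    lock_spinners = lock;
    return prev;
}

/* Stop waiting on the current lock and put the interrupted one back. */
static void unspinning_lock(struct demo_lock *prev)
{
    lock_spinners = prev;
}

/*
 * Hypothetical nesting: the CPU spins on 'outer', an interrupt arrives and
 * its handler spins on 'inner'. Returns 1 if the slot unwinds like a stack.
 */
static int demo_nested_wait(void)
{
    struct demo_lock outer = { 1 }, inner = { 2 };
    struct demo_lock *saved_outer, *saved_inner;

    saved_outer = spinning_lock(&outer);  /* slot: NULL  -> outer */
    saved_inner = spinning_lock(&inner);  /* slot: outer -> inner */
    if (saved_outer != NULL || saved_inner != &outer)
        return 0;

    unspinning_lock(saved_inner);         /* handler done: back to outer */
    if (lock_spinners != &outer)
        return 0;

    unspinning_lock(saved_outer);         /* outer done: slot empty again */
    return lock_spinners == NULL;
}
```

Each caller keeps the previous pointer in a local variable, so the nesting
depth is bounded only by the call stack and the slot itself needs no array.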

Is there any chance this is just a simple AB-BA or similar type
deadlock? Do we have data which suggests all vCPUs are waiting on the
same lock or just that they are waiting on some lock? I suppose lockdep
(which I think you mentioned before?) would have caught this, unless pv
locks somehow confound it?
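
For reference, the kind of AB-BA ordering problem I mean can be sketched
as below. This is a toy order tracker in the spirit of what lockdep checks,
not lockdep itself; seen_order, demo_acquire_while_holding() and demo_abba()
are hypothetical names used only for this illustration.

```c
#include <stdbool.h>

#define DEMO_MAX_LOCKS 4

/* seen_order[a][b] becomes true once lock b was taken while a was held. */
static bool seen_order[DEMO_MAX_LOCKS][DEMO_MAX_LOCKS];

/*
 * Record the order "held, then wanted". Returns true if the opposite order
 * was recorded earlier, i.e. two code paths could deadlock against each
 * other even if they never actually have so far.
 */
static bool demo_acquire_while_holding(int held, int wanted)
{
    bool inversion = seen_order[wanted][held];

    seen_order[held][wanted] = true;
    return inversion;
}

/* One path takes A then B, another takes B then A. */
static bool demo_abba(void)
{
    enum { LOCK_A = 0, LOCK_B = 1 };
    bool first = demo_acquire_while_holding(LOCK_A, LOCK_B);  /* A -> B */
    bool second = demo_acquire_while_holding(LOCK_B, LOCK_A); /* B -> A */

    /* Only the second acquisition should be flagged. */
    return !first && second;
}
```

If all vCPUs turn out to be waiting on the same lock, this pattern is
ruled out; if they are waiting on different locks, it remains a candidate.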

