Re: [Xen-devel] Xen PVM: Strange lockups when running PostgreSQL load

On 17/10/12 14:10, Stefan Bader wrote:
> I am currently looking at a bug report [1] about a Xen PVM guest with
> multiple VCPUs locking up while running a high-IO database load (a test
> script is available in the bug report).
>
> In my experiments this happens (or at least becomes more likely) when the
> number of VCPUs is 8 or higher (though I have not tried 6, only 2 and 4).
> Having autogroup enabled seems to make it more likely too (at some point
> we thought it would actually prevent it, but we were wrong), as does
> having PV spinlocks enabled. It has happened with older (3.4.3) and newer
> (4.1.2) versions of Xen as the host, and with 3.2 and 3.5 kernels as
> guests.
>
> I will try to get the PV spinlock assumption re-verified by asking for a
> reproduction under a real load with a kernel that simply disables them.
> However, the dumps I am looking at really do look odd there.
>
> In the first dump I looked at, the spinlock of runqueue[0] was stored in
> the per-cpu lock_spinners variable for cpu#0 and cpu#7 (I am testing with
> 8 VCPUs). So apparently both CPUs were waiting in the slow path for it to
> become free -- though it actually was free! Now, here is one issue I have
> in understanding the dump: the backtraces produced by crash in the normal
> form do not show any CPU in the poll_irq hypercall. Only when using the
> form that starts from the last known stack location and displays all text
> symbols found do I get close, and only for cpu#7; cpu#0 still does not
> seem to be anywhere near it. This could be a problem with crash, or with
> the way PVM works -- I am not sure.
>
> Anyway, from what I could take from that situation, it seemed that cpu#1
> (the one with soft lockup warnings in the log) was trying to wake up the
> task that had just done an io_schedule on cpu#7 (but the waitqueue head's
> spinlock was currently locked), while cpu#7 was trying to take the
> runqueue[0] spinlock to do an idle migration (though that lock was
> currently free). So I guessed that maybe cpu#0 had just been woken for
> the free lock but had not grabbed it yet.
>
> From there I wondered whether a holder that keeps a spinlock longer than
> the quick-path timeout (which causes new lockers to go into the slow
> path) could cause a stall on a high-numbered CPU when the lock is
> contended, since the search and signalling on release only kicks the
> first waiter found, starting from CPU 0. That led to the patch below.
> The good thing about it: it does not break anything immediately. The bad
> thing: it does not help with the problem. :/
>
> The interesting thing in a second dump, taken with a test kernel that
> includes the patch below, was that again two CPUs seemed to be spinning
> in the slow path on a lock that was free in the dump, and again at least
> one of them did not seem to be doing anything spinlock related
> (anymore?). One other detail (though that could just be coincidence) is
> that with the unmodified kernel I usually end up with soft lockup
> messages, while with the patched kernel it was a stall detected by
> rcu_bh (0 and 1 stalled, detected by 3).
>
> Now I am a bit confused and wonder about some basic things:
> 1. When a CPU goes into the slow lock path and into poll_irq, shouldn't
>    I expect it to be in hypercall_page in the dump?
> 2. When does the whole handling for an interrupted spinlock wait apply?
>    I assume only for a spinlock taken with IRQs enabled where another
>    lock is then acquired from an interrupt handler. Is that correct?
> 3. Not really related, but I just stumbled over it: in xen_spin_trylock
>    there is 'asm("xchgb %b0,%1" ...)'. What does the "b" part of %b0 do?
>    I thought xchgb already says it is a byte value...
>
> But putting those questions aside, the fact that twice there were two
> CPUs waiting on the same lock which, judging by the lock count (= 0),
> was free seems a bit too much of a coincidence. Oh, and the spinners
> count in the lock was 2, as one would expect.
>
> struct rq {
>   lock = {
>     raw_lock = {
>       {
>         head_tail = 512,
>         tickets = {
>           head = 0 '\000',
>           tail = 2 '\002'
>         }
>       }
>     }
>   },
>   nr_running = 1,
>   ...
> }
>
> I really don't know how this happens. Especially cpu#0 seems to be at
> least past the wakeup and should have removed itself from the spinners
> list...
>
> -Stefan
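For context when reading the above, the slow path under discussion looks
roughly like this. This is a condensed, non-verbatim sketch of the 3.x
arch/x86/xen/spinlock.c code, with helper names from memory; stats, IRQ
re-enabling and the nested-wait corner cases are elided:

struct xen_spinlock {
        unsigned char lock;             /* 0 -> free; 1 -> locked */
        unsigned short spinners;        /* count of waiting cpus */
};

static int xen_spin_trylock(struct arch_spinlock *lock)
{
        struct xen_spinlock *xl = (struct xen_spinlock *)lock;
        u8 old = 1;

        /*
         * Re question 3: "%b0" is a gcc operand modifier that prints the
         * byte-sized register name for operand 0 (e.g. %al rather than
         * %eax); the "b" suffix on the xchg mnemonic only fixes the
         * operand size.
         */
        asm("xchgb %b0,%1"
            : "+q" (old), "+m" (xl->lock) : : "memory");

        return old == 0;        /* we got the lock if the old byte was 0 */
}

static noinline int xen_spin_lock_slow(struct arch_spinlock *lock)
{
        struct xen_spinlock *xl = (struct xen_spinlock *)lock;
        struct xen_spinlock *prev;
        int irq = __this_cpu_read(lock_kicker_irq); /* per-cpu kicker IRQ */
        int ret;

        /* Announce we are spinning: this is what fills in lock_spinners. */
        prev = spinning_lock(xl);

        do {
                /* Clear stale pending state on our kicker event channel. */
                xen_clear_irq_pending(irq);

                /* The lock may have been freed while we registered. */
                ret = xen_spin_trylock(lock);
                if (ret)
                        break;

                /*
                 * Block in the hypervisor until the kicker channel becomes
                 * pending -- this is the poll_irq/SCHEDOP_poll hypercall
                 * that question 1 is about.
                 */
                xen_poll_irq(irq);
        } while (!xen_test_irq_pending(irq));   /* tolerate spurious kicks */

        /* Remove ourselves from (or restore the nested entry in)
         * lock_spinners. */
        unspinning_lock(xl, prev);

        /* 0 means "kicked but did not take the lock"; the caller retries
         * the fast path. */
        return ret;
}

static void xen_spin_unlock(struct arch_spinlock *lock)
{
        struct xen_spinlock *xl = (struct xen_spinlock *)lock;

        smp_wmb();              /* no writes may move past the release */
        xl->lock = 0;           /* release the byte lock */
        barrier();              /* release before reading the spinners count */

        if (unlikely(xl->spinners))
                xen_spin_unlock_slow(xl);       /* kick one waiter */
}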
We (Citrix) have an outstanding bug on 2.6.32 classic which we have never
managed to get to the bottom of (because of an inability to reproduce it
reliably), and which might be related. In our case, certain processes were
locking up, and it turned out that the kernel was issuing SCHEDOP_poll
hypercalls (the same as the stack trace in your Launchpad ticket) on its
own spinlock IPI event channel (understandable, as it is a spinlock), but
with the event channel masked, so it would never wake up from the poll.
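For reference, xen_poll_irq() on the per-cpu lock kicker IRQ boils down to
a SCHEDOP_poll hypercall on the event channel bound to that IRQ -- roughly
the following (a simplified sketch of the 3.x drivers/xen/events.c code,
not the verbatim source):

void xen_poll_irq_timeout(unsigned irq, u64 timeout)
{
        int evtchn = evtchn_from_irq(irq);

        if (VALID_EVTCHN(evtchn)) {
                struct sched_poll poll;

                poll.nr_ports = 1;
                poll.timeout = timeout;         /* 0 == wait indefinitely */
                set_xen_guest_handle(poll.ports, &evtchn);

                /* The vcpu blocks here until a polled port is signalled. */
                if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
                        BUG();
        }
}

/*
 * Poll waiting for the irq to become pending.  The spinlock kicker irq is
 * deliberately kept disabled (its event channel masked), so it is only
 * ever consumed by this poll, never delivered as a normal interrupt.
 */
void xen_poll_irq(unsigned irq)
{
        xen_poll_irq_timeout(irq, 0 /* no timeout */);
}

In our case the vcpu simply never came back from that poll on the masked
channel, which is what left the processes wedged.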
Can you reliably reproduce the issue? If so, would it be possible to
capture the hypervisor debug keys 'q' and 'e' when the badness happens?
That would clearly show whether our bugs are related or not.

Our repro case took about a month to reproduce, and unfortunately turning
lockdep on appears to have fixed the problem.

~Andrew

> [1] http://bugs.launchpad.net/bugs/1011792
>
> From 635a4e101ccbc9a324c8000f7e264ed4e646592c Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> Date: Tue, 16 Oct 2012 17:46:09 +0200
> Subject: [PATCH] xen/spinlocks: Make wakeup fairer
>
> Instead of sending the IPI signalling the free lock to the first
> online CPU found waiting for it, start the search from the CPU
> that had the lock last.
>
> Signed-off-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> ---
>  arch/x86/xen/spinlock.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index d69cc6c..8b86efb 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -320,17 +320,23 @@ static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
>  static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>  {
>      int cpu;
> +    int this_cpu = smp_processor_id();
>
>      ADD_STATS(released_slow, 1);
>
> -    for_each_online_cpu(cpu) {
> -        /* XXX should mix up next cpu selection */
> -        if (per_cpu(lock_spinners, cpu) == xl) {
> -            ADD_STATS(released_slow_kicked, 1);
> -            xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> -            break;
> -        }
> -    }
> +    cpu = cpumask_next(this_cpu, cpu_online_mask);
> +    do {
> +        if (cpu < nr_cpu_ids) {
> +            if (per_cpu(lock_spinners, cpu) == xl) {
> +                ADD_STATS(released_slow_kicked, 1);
> +                xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> +                break;
> +            }
> +        } else
> +            cpu = -1;
> +
> +        cpu = cpumask_next(cpu, cpu_online_mask);
> +    } while (cpu != this_cpu);
>  }
>
>  static void xen_spin_unlock(struct arch_spinlock *lock)

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel