[Xen-devel] Xen PVM: Strange lockups when running PostgreSQL load
I am currently looking at a bug report[1] about an issue that happens when a Xen PVM guest with multiple VCPUs is running a high IO database load (a test script is available in the bug report). From experimenting it seems that this happens (or gets more likely) when the number of VCPUs is 8 or higher (though I have not tried 6, only 2 and 4). Having autogroup enabled seems to make it more likely, too (at some point we thought it would actually prevent it, but we were wrong), as does having pv spinlocks enabled. It has happened with older (3.4.3) and newer (4.1.2) versions of Xen as a host and with 3.2 and 3.5 kernels as guests.

The pv spinlock assumption I will try to get re-verified by asking for a reproduction under a real load with a kernel that simply disables them. However, the dumps I am looking at really do look odd there.

In the first dump I looked at, the spinlock of runqueue[0] had been placed into the per-cpu lock_spinners variable for both cpu#0 and cpu#7 (I am doing my tests with 8 VCPUs). So apparently both cpus were waiting in the slow path for it to become free. Though actually it was free!

Now, here is one issue I have in understanding the dump: the back traces produced in crash in their normal form do not show any cpu inside the poll_irq HV call. Only the form that uses the last known stack location and displays all text symbols found gets close for cpu#7; cpu#0 still does not seem to be anywhere close. This could be a problem with crash, or with the way PVM works, I am not sure.

Anyway, from what I could take from that situation: cpu#1 (the one that had soft lockup warnings in the log) was trying to do a wake_up on a task that appeared to have just done an io_schedule on cpu#7 (but the waitqueue_head spinlock is currently locked). cpu#7 tries to get the runqueue[0] spinlock to do an idle migration (though that lock currently is free). So I guessed that maybe cpu#0 had just been woken for the free lock but had not grabbed it yet.

From there I wondered whether something that usually holds a spinlock for longer than the quick path timeout (which causes new lockers to go into the slow path) could cause a stall on a high-numbered cpu when the lock is contended (as the search and signalling on release is only done for the first waiter found, starting from cpu 0). That led to the patch below. The good thing about it: it does not break things immediately. The bad thing: it does not help with the problem. :/

The interesting thing in a second dump, taken with a test kernel using the patch below, was that again two cpus seemed to be spinning in the slow path on a lock that was free in the dump. And again at least one of them did not seem to be doing anything spinlock related (anymore?). One other detail (but that could be just coincidence as well) was that with the unmodified kernel I usually would end up with soft lockup messages, while with the patched kernel it was a stall detected by rcu_bh (0 and 1 stalled, detected by 3).

Now I am a bit confused and wonder about some basic things:

1. When a cpu goes into the slow lock path and into the poll_irq, shouldn't I expect this cpu to be in hypercall_page in the dump?

2. When does the whole handling for interrupted spinlock wait apply? I assume it applies only to a spinlock taken without irqs disabled when we then try to acquire another one in an interrupt handler. Is that correct?

3. Not really related, but I just stumbled over it: in xen_spin_trylock there is

     asm("xchgb %b0,%1" ...)

   What does the "b" part of %b0 do? I thought xchgb already would say it is a byte value...
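(For questions 1 and 2, this is how I currently picture the slow path working; a much simplified model rather than the real code: the per-cpu lock_spinners pointer is shown as a plain array, the atomics, the irq-enable variant and the nested-spinner handling are left out, and trylock(), poll_irq() and kick() are only stand-ins for xen_spin_trylock(), the poll hypercall and xen_send_IPI_one().)

/*
 * Much simplified model of how I read the pv spinlock slow path in
 * arch/x86/xen/spinlock.c; not the real code.
 */
#include <stddef.h>
#include <stdio.h>

#define NR_CPUS 8

struct xen_spinlock {
	unsigned char lock;		/* 0 -> free, 1 -> taken           */
	unsigned short spinners;	/* cpus currently in the slow path */
};

/* which lock each cpu is currently waiting for, NULL if none */
static struct xen_spinlock *lock_spinners[NR_CPUS];

static int trylock(struct xen_spinlock *xl)	/* xchgb in the real code */
{
	unsigned char old = xl->lock;

	xl->lock = 1;
	return old == 0;
}

static void poll_irq(int cpu) { printf("cpu#%d blocks in Xen\n", cpu); }
static void kick(int cpu)     { printf("cpu#%d is sent the unlock IPI\n", cpu); }

/* locking side, entered once the quick path has spun for too long */
void spin_lock_slow(struct xen_spinlock *xl, int cpu)
{
	lock_spinners[cpu] = xl;	/* advertise the lock of interest */
	xl->spinners++;

	do {
		if (trylock(xl))	/* did it become free meanwhile?  */
			break;
		poll_irq(cpu);		/* block until somebody kicks us  */
	} while (1);			/* real code re-checks for spurious wakeups */

	xl->spinners--;			/* leave the slow path again      */
	lock_spinners[cpu] = NULL;
}

/* unlocking side, only called when xl->spinners != 0 */
void spin_unlock_slow(struct xen_spinlock *xl)
{
	int cpu;

	/* kick exactly one waiter, always searching upwards from cpu#0 */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (lock_spinners[cpu] == xl) {
			kick(cpu);
			break;
		}
	}
}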
But putting aside those questions, the fact that both times there were two cpus waiting on the same lock which, going by the lock count (=0), was free seems a bit too much of a coincidence. Oh, and the spinners count in the lock was 2, as one would expect.

struct rq {
  lock = {
    raw_lock = {
      {
        head_tail = 512,
        tickets = {
          head = 0 '\000',
          tail = 2 '\002'
        }
      }
    }
  },
  nr_running = 1,
  ...
}

I really don't know how this happens. Especially cpu#0 seems to be at least past the wakeup and should have removed itself from the spinners list...

-Stefan

[1] http://bugs.launchpad.net/bugs/1011792

From 635a4e101ccbc9a324c8000f7e264ed4e646592c Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
Date: Tue, 16 Oct 2012 17:46:09 +0200
Subject: [PATCH] xen/spinlocks: Make wakeup fairer

Instead of sending the IPI signalling the free lock to the first online
CPU found waiting for it, start the search from the CPU that had the
lock last.

Signed-off-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
---
 arch/x86/xen/spinlock.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d69cc6c..8b86efb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -320,17 +320,23 @@ static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
 static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
 {
 	int cpu;
+	int this_cpu = smp_processor_id();
 
 	ADD_STATS(released_slow, 1);
 
-	for_each_online_cpu(cpu) {
-		/* XXX should mix up next cpu selection */
-		if (per_cpu(lock_spinners, cpu) == xl) {
-			ADD_STATS(released_slow_kicked, 1);
-			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
-			break;
-		}
-	}
+	cpu = cpumask_next(this_cpu, cpu_online_mask);
+	do {
+		if (cpu < nr_cpu_ids) {
+			if (per_cpu(lock_spinners, cpu) == xl) {
+				ADD_STATS(released_slow_kicked, 1);
+				xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+				break;
+			}
+		} else
+			cpu = -1;
+
+		cpu = cpumask_next(cpu, cpu_online_mask);
+	} while (cpu != this_cpu);
 }
 
 static void xen_spin_unlock(struct arch_spinlock *lock)
-- 
1.7.9.5
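To illustrate what the change does (this is only a stand-alone toy, not the kernel code: next_cpu() stands in for cpumask_next() on cpu_online_mask with all 8 cpus online, spinner[] for the per_cpu(lock_spinners, cpu) == xl test): with the old loop the kick always goes to the lowest-numbered waiter, with the patch it goes to the first waiter after the cpu doing the unlock.

#include <stdio.h>

#define NR_CPUS 8

/* cpu#5 and cpu#7 are waiting in the slow path for the lock in question */
static int spinner[NR_CPUS] = { 0, 0, 0, 0, 0, 1, 0, 1 };

/* first cpu above n, NR_CPUS if there is none (all cpus online here) */
static int next_cpu(int n)
{
	return (n + 1 < NR_CPUS) ? n + 1 : NR_CPUS;
}

static void kick_one(int this_cpu)	/* mirrors the loop in the patch */
{
	int cpu = next_cpu(this_cpu);

	do {
		if (cpu < NR_CPUS) {
			if (spinner[cpu]) {
				printf("unlock on cpu#%d kicks cpu#%d\n",
				       this_cpu, cpu);
				return;
			}
		} else
			cpu = -1;	/* wrap around to the start */

		cpu = next_cpu(cpu);
	} while (cpu != this_cpu);

	printf("unlock on cpu#%d finds no waiter\n", this_cpu);
}

int main(void)
{
	kick_one(6);	/* kicks cpu#7; the old loop would have kicked cpu#5 */
	kick_one(0);	/* walks cpu#1..4 and then kicks cpu#5               */
	return 0;
}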