
Re: [Xen-devel] [PATCH] xen: Send spinlock IPI to all waiters



On Fri, Feb 15, 2013 at 11:52:35AM +0100, Stefan Bader wrote:
> Hopefully I am not mis-parsing Jan's last comments on the other
> thread; this would be the fix covering things until a better
> implementation is done.
> It also prevents the hang on older kernels, where it could be
> reproduced reliably.
> 
> -Stefan
> 
> From 7e042a253b06da96409a0e059744c217f396a17f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> Date: Fri, 15 Feb 2013 09:48:52 +0100
> Subject: [PATCH] xen: Send spinlock IPI to all waiters
> 
> There is a loophole between Xen's current implementation of
> pv-spinlocks and the scheduler. It was triggerable through a
> testcase until v3.6 changed the TLB flushing code. The problem
> is potentially still there, just no longer observable in the
> same way.
> 
> What could happen (and potentially still can) is:
> 
> 1. CPU n tries to schedule task x away and goes into a slow
>    wait for the runq lock of CPU n-# (i.e. one with a lower
>    number).
> 2. CPU n-#, while processing softirqs, tries to balance domains
>    and goes into a slow wait for its own runq lock (to update
>    some records). Since this is a spin_lock_irqsave in softirq
>    context, interrupts are re-enabled for the duration of the
>    poll_irq hypercall used by Xen.
> 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
>    an interrupt (e.g. endio) and, while processing it, tries to
>    wake up task x. But task x is in schedule and still on_cpu,
>    so try_to_wake_up goes into a tight loop.
> 4. The runq lock of CPU n-# gets unlocked, but the wakeup IPI
>    only gets sent to the first waiter, which is CPU n-# and
>    which is busily stuck.

Just for completeness:

5. CPU n-1 from step 3) sits in its tight loop and never exits,
   as nothing ever interrupts it (see the sketch of the kick loop
   below).
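
For reference, the kick path that the one-line patch below changes looks
roughly like this before the change (reconstructed around the quoted hunk;
the loop header and the ADD_STATS bookkeeping are my recollection of
arch/x86/xen/spinlock.c from that era, not part of the diff itself):

    static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
    {
            int cpu;

            ADD_STATS(released_slow, 1);

            /* Look for a CPU that registered itself as polling on this lock. */
            for_each_online_cpu(cpu) {
                    if (per_cpu(lock_spinners, cpu) == xl) {
                            ADD_STATS(released_slow_kicked, 1);
                            xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                            /* Only the first matching waiter is ever kicked. */
                            break;
                    }
            }
    }

With the break in place the whole unlock notification hinges on that single
kicked CPU; if it cannot make progress, every other waiter stays parked in
the poll_irq hypercall.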


> 
> To avoid this, and since the unlocking code has no real notion of
> which waiter is best suited to grab the lock, just send the IPI
> to all of them. This causes the waiters (at least those not
> currently handling an interrupt) to return from the hypercall
> and to spin actively on the lock.
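
To illustrate why kicking every matching CPU is safe: the waiter side
(very simplified sketch below, not the actual xen_spin_lock_slow() source;
the trylock helper name is made up) registers the lock it is polling on in
lock_spinners and then blocks in the poll_irq hypercall, so a kick merely
makes it retry the lock actively:

    /* Simplified sketch of the slow-path waiter, not the real source. */
    static void spin_lock_slow_sketch(struct xen_spinlock *xl)
    {
            int irq = __this_cpu_read(lock_kicker_irq);

            /*
             * Advertise which lock this CPU is polling on; the unlock
             * path checks per_cpu(lock_spinners, cpu) against this.
             */
            __this_cpu_write(lock_spinners, xl);

            for (;;) {
                    if (spin_trylock_sketch(xl))    /* hypothetical helper */
                            break;
                    /*
                     * Block in the poll_irq hypercall until kicked via
                     * XEN_SPIN_UNLOCK_VECTOR, then try the lock again.
                     */
                    xen_poll_irq(irq);
            }

            __this_cpu_write(lock_spinners, NULL);
    }

At worst a spuriously kicked CPU wakes up, fails the trylock and goes back
to polling, which is exactly the active spinning described above.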
> 
> BugLink: http://bugs.launchpad.net/bugs/1011792
> 
> Signed-off-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  arch/x86/xen/spinlock.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 83e866d..f7a080e 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>               if (per_cpu(lock_spinners, cpu) == xl) {
>                       ADD_STATS(released_slow_kicked, 1);
>                       xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> -                     break;
>               }
>       }
>  }
> -- 
> 1.7.9.5
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel