Re: [Xen-devel] Strange PVM spinlock case revisited

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
Date: Wed, 13 Feb 2013 12:31:15 +0100
Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
Delivery-date: Wed, 13 Feb 2013 11:32:10 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 11.02.2013 18:29, Ian Campbell wrote:

> An interesting hack^Wexperiment might be to make xen_poll_irq use a
> timeout and see if that unwedges things -- this would help confirm that
> the issue is on nested wakeup.
> 

So I went ahead and replaced xen_poll_irq with xen_poll_irq_timeout, and that did
get rid of the hang. There is a big caveat, though. There is only one other user
of poll_irq_timeout in the kernel code, and that one passes
"jiffies + <timeout>*HZ". But when I look at the Xen side in do_poll, it looks
like the timeout is treated as an absolute "ns since boot" (of hv/dom0) value.
Not sure how that can ever work: the ns since boot in the guest is clearly
always behind the host (and jiffies isn't ns either).
So effectively I likely got rid of any wait time in the hypervisor and went back
to mostly spinning. That matches the observation that the test run never gets
stuck waiting for a timeout. It may prove that the stacking is an issue, but
having no wait at all is likely a bit too aggressive... :/
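
To make the unit mismatch concrete, here is a minimal sketch in plain C. It is
not kernel or Xen code: GUEST_HZ, the helper names and the "deadline at or
before now means no wait" check are assumptions for illustration, based only on
my reading of do_poll described above. How a guest would learn the host's time
base is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed guest tick rate; real kernels choose HZ at build time. */
#define GUEST_HZ 250ULL
#define NSEC_PER_SEC 1000000000ULL

/* Deadline as the existing caller computes it: jiffies + seconds * HZ. */
static uint64_t guest_deadline_jiffies(uint64_t jiffies, uint64_t seconds)
{
    return jiffies + seconds * GUEST_HZ;
}

/*
 * Hypothetical stand-in for the hypervisor side: the value is read as
 * absolute nanoseconds since host boot, so anything at or before "now"
 * leaves nothing to wait for.
 */
static bool poll_expires_immediately(uint64_t deadline_ns, uint64_t host_now_ns)
{
    return deadline_ns <= host_now_ns;
}

/* What a deadline expressed in the hypervisor's units would look like. */
static uint64_t deadline_ns_from_now(uint64_t host_now_ns, uint64_t seconds)
{
    return host_now_ns + seconds * NSEC_PER_SEC;
}
```

With a host up for an hour and a guest at 750000 jiffies, a 3 second timeout
becomes 750750, which read as nanoseconds is under a millisecond after host
boot, so the poll would fall straight through.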

I will try to think of a better way. Not sure this line of thinking is
realistic, but maybe the following could happen:

xen_spin_lock_slow(a)
  ...
  enables irq and upcalls are pending
    upcall processing wants lock b
    xen_spin_lock_slow(b)
                    --- just before replacing lock_spinners ---
                                                   xen_spin_unlock_slow(a)
                                                   finds other vcpu, triggers
                                                     IRQ
    lock b is top spinner
    going into poll_irq
    poll_irq returns
    lock a gets restored
    so maybe no spinners on b
    dropping out to xen_spin_lock
                                                  unlock of b not finding any
                                                  spinners
    lock b acquired

That way the IRQ for lock a may get lost...
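
The interleaving above can be replayed with a toy single-vcpu model. This is
only a sketch of the hypothesis, not the kernel's spinlock code: struct
vcpu_model, the enum and all function names are made up, and it assumes that
whichever poll sees a pending kick also consumes it.

```c
#include <stdbool.h>

enum lock_id { LOCK_NONE, LOCK_A, LOCK_B };

/* Toy per-vcpu state; not the kernel's real data structures. */
struct vcpu_model {
    enum lock_id lock_spinners; /* lock this vcpu is registered on */
    bool irq_pending;           /* a wakeup kick has been delivered */
};

/* Unlock slow path: kick the vcpu only if it is registered on this lock. */
static bool unlock_slow(struct vcpu_model *v, enum lock_id lock)
{
    if (v->lock_spinners != lock)
        return false;
    v->irq_pending = true;
    return true;
}

/* Poll: reports whether a kick was pending and (by assumption) consumes it. */
static bool poll_irq(struct vcpu_model *v)
{
    bool woke = v->irq_pending;
    v->irq_pending = false;
    return woke;
}

/* Replays the sequence; true if the outer waiter on lock a still has a kick. */
static bool outer_waiter_sees_kick(void)
{
    struct vcpu_model v = { .lock_spinners = LOCK_A, .irq_pending = false };
    enum lock_id prev = v.lock_spinners; /* nested slow path remembers lock a */

    unlock_slow(&v, LOCK_A);    /* other vcpu still finds us registered on a */
    v.lock_spinners = LOCK_B;   /* only now does b become the top spinner */
    poll_irq(&v);               /* returns at once, swallowing the kick for a */
    v.lock_spinners = prev;     /* lock a gets restored */
    unlock_slow(&v, LOCK_B);    /* unlock of b finds no spinner on b */

    return v.irq_pending;
}
```

In this model outer_waiter_sees_kick() comes back false: the kick meant for
lock a was used up by the poll for lock b, which is the lost wakeup suspected
above.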
