
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature


  • To: Olaf Hering <olaf@xxxxxxxxx>
  • From: Keir Fraser <keir.xen@xxxxxxxxx>
  • Date: Sat, 12 Nov 2011 07:00:43 +0000
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 11 Nov 2011 23:01:52 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcyhCMtUcLY/v8/y0kqO+2Gg/4wVWw==
  • Thread-topic: [Xen-devel] Need help with fixing the Xen waitqueue feature

On 11/11/2011 22:56, "Olaf Hering" <olaf@xxxxxxxxx> wrote:

> Keir,
> 
> just to dump my findings to the list:
> 
> On Tue, Nov 08, Keir Fraser wrote:
> 
>> Tbh I wonder anyway whether stale hypercall context would be likely to cause
>> a silent machine reboot. Booting with maxcpus=1 would eliminate moving
>> between CPUs as a cause of inconsistencies, or pin the guest under test.
>> Another problem could be sleeping with locks held, but we do test for that
>> (in debug builds at least) and I'd expect crash/hang rather than silent
>> reboot. Another problem could be if the vcpu has its own state in an
>> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
>> which then is attempted to be restored during a waitqueue wakeup. That could
>> certainly cause a reboot, but I don't know of an example where this might
>> happen.
> 
> The crashes also happen with maxcpus=1 and a single guest cpu.
> Today I added a wait_event() call to ept_get_entry(), and this works.
> 
> But at some point the code path below is executed; after that
> wake_up, the host hangs hard. I will trace it further next week;
> maybe the backtrace gives a clue as to what the cause could be.

So you run with a single CPU, and with wait_event() in one location, and
that works for a while (actually doing full waitqueue work: executing wait()
and wake_up()), but then hangs? That's weird, but pretty interesting if I've
understood correctly.
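
Just so we are definitely talking about the same thing, I assume what
you added is roughly the pattern below. Untested sketch only: p2m->wq,
p2m_paged_out() and ept_walk_table() are invented names for
illustration, not the real code.

  /* Hypothetical sketch of a wait_event() user in ept_get_entry(). */
  static mfn_t ept_get_entry(struct p2m_domain *p2m, unsigned long gfn,
                             p2m_type_t *t, p2m_query_t q)
  {
      mfn_t mfn = ept_walk_table(p2m, gfn, t);   /* invented helper */

      /* If the frame was paged out, sleep on the waitqueue until the
       * pager has written it back and called wake_up().  A debug
       * build's prepare_to_wait() should catch us if we ever got here
       * in atomic context -- hence the in_atomic flag you print. */
      if ( p2m_is_paging(*t) )
      {
          wait_event(p2m->wq, !p2m_paged_out(p2m, gfn));
          mfn = ept_walk_table(p2m, gfn, t);     /* retry the lookup */
      }

      return mfn;
  }

  /* Pager side, once the gfn is populated again: */
  wake_up(&p2m->wq);

That would line up with the matched 'prep .../wake ...' pairs in your
log below.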

> Also, the 3K stack size is still too small; this path uses 3096 bytes.

I'll allocate a whole page for the stack then.
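
Probably something along these lines in common/wait.c (sketch only;
the struct layout and names approximate the current code but may not
match exactly):

  /* Sketch: swap the inline ~3K buffer in struct waitqueue_vcpu for a
   * full xenheap page. */
  struct waitqueue_vcpu {
      struct list_head list;
      void *esp;
      char *stack;                        /* was: char stack[3000]; */
  };

  static int init_waitqueue_vcpu(struct vcpu *v)
  {
      struct waitqueue_vcpu *wqv;

      wqv = xmalloc(struct waitqueue_vcpu);
      if ( wqv == NULL )
          return -ENOMEM;
      memset(wqv, 0, sizeof(*wqv));

      wqv->stack = alloc_xenheap_page();  /* PAGE_SIZE (4096) > 3096 */
      if ( wqv->stack == NULL )
      {
          xfree(wqv);
          return -ENOMEM;
      }

      v->waitqueue_vcpu = wqv;
      return 0;
  }

The size check in prepare_to_wait() can then compare against
PAGE_SIZE instead of a hand-picked constant.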

 -- Keir

> (XEN) prep 127a 30 0
> (XEN) wake 127a 30
> (XEN) prep 1cf71 30 0
> (XEN) wake 1cf71 30
> (XEN) prep 1cf72 30 0
> (XEN) wake 1cf72 30
> (XEN) prep 1cee9 30 0
> (XEN) wake 1cee9 30
> (XEN) prep 121a 30 0
> (XEN) wake 121a 30
> 
> (The fields are 'gfn (p2m_unshare << 4) in_atomic'.)
> 
> (XEN) prep 1ee61 20 0
> (XEN) max stacksize c18
> (XEN) Xen WARN at wait.c:126
> (XEN) ----[ Xen-4.2.24114-20111111.221356  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff830201f76000   rcx: 0000000000000000
> (XEN) rdx: ffff82c4802b7f18   rsi: 000000000000000a   rdi: ffff82c4802673f0
> (XEN) rbp: ffff82c4802b73a8   rsp: ffff82c4802b7378   r8:  0000000000000000
> (XEN) r9:  ffff82c480221da0   r10: 00000000fffffffa   r11: 0000000000000003
> (XEN) r12: ffff82c4802b7f18   r13: ffff830201f76000   r14: ffff83003ea5c000
> (XEN) r15: 000000000001ee61   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 000000020336d000   cr2: 00007fa88ac42000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802b7378:
> (XEN)    0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
> (XEN)    ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
> (XEN)    ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
> (XEN)    0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
> (XEN)    000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
> (XEN)    0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
> (XEN)    ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
> (XEN)    00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
> (XEN)    ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
> (XEN)    ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
> (XEN)    000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
> (XEN)    ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
> (XEN)    ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
> (XEN)    ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
> (XEN)    ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
> (XEN)    0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
> (XEN)    0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
> (XEN)    ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
> (XEN)    00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
> (XEN)    ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> (XEN)    [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
> (XEN)    [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
> (XEN)    [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
> (XEN)    [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
> (XEN)    [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
> (XEN)    [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
> (XEN)    [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
> (XEN)    [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
> (XEN)    [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
> (XEN)    [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
> (XEN)    [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
> (XEN)    [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
> (XEN)
> (XEN) wake 1ee61 20

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel