
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature



On Sat, Nov 12, Keir Fraser wrote:

> On 11/11/2011 22:56, "Olaf Hering" <olaf@xxxxxxxxx> wrote:
> 
> > Keir,
> > 
> > just to dump my findings to the list:
> > 
> > On Tue, Nov 08, Keir Fraser wrote:
> > 
> >> Tbh I wonder anyway whether stale hypercall context would be likely to 
> >> cause
> >> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
> >> between CPUs as a cause of inconsistencies, or pin the guest under test.
> >> Another problem could be sleeping with locks held, but we do test for that
> >> (in debug builds at least) and I'd expect crash/hang rather than silent
> >> reboot. Another problem could be if the vcpu has its own state in an
> >> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
> >> which then is attempted to be restored during a waitqueue wakeup. That 
> >> could
> >> certainly cause a reboot, but I don't know of an example where this might
> >> happen.
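
The debug-build test mentioned there amounts to asserting that the vcpu is
not in atomic context (i.e. not holding a spinlock) at the point where it
tries to sleep. A tiny sketch of that kind of check; the helper name is
made up and the exact check in xen/common/wait.c may differ:

#include <xen/lib.h>
#include <xen/preempt.h>

/* Illustrative only: the sort of debug-build check meant above.  The
 * helper name is invented; only the ASSERT(!in_atomic()) pattern matters. */
static void assert_vcpu_may_sleep(void)
{
    /* in_atomic() is true while a spinlock is held or preemption is
     * otherwise unsafe; a vcpu must never go onto a waitqueue then. */
    ASSERT(!in_atomic());
}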
> > 
> > The crashes also happen with maxcpus=1 and a single guest cpu.
> > Today I added wait_event to ept_get_entry and this works.
> > 
> > But at some point the codepath below is executed; after that wake_up the
> > host hangs hard. I will trace it further next week; maybe the backtrace
> > gives a clue as to what the cause could be.
> 
> So you run with a single CPU, and with wait_event() in one location, and
> that works for a while (actually doing full waitqueue work: executing wait()
> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
> understood correctly.

Yes, that's what happens with a single cpu in dom0 and domU.
I have added some more debugging. After the backtrace below I see one more
call to check_wakeup_from_wait() for dom0, then the host hangs hard.
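
For anyone following along, the experiment is essentially to make the EPT
lookup path sleep on a waitqueue until the paged-out frame is present
again, instead of failing the lookup. A rough sketch of the idea follows;
the waitqueue object (paging_wq), the predicate (gfn_is_present) and the
resume hook are invented names used for illustration, not the patch under
test, and in practice the waitqueue would live per-domain rather than
being a static:

/*
 * Sketch only: block the current vcpu inside the p2m/EPT lookup until a
 * paged-out gfn has been paged back in.  paging_wq would be set up once
 * with init_waitqueue_head(); gfn_is_present() stands in for re-checking
 * the p2m entry type.
 */
#include <xen/wait.h>
#include <asm/p2m.h>

static struct waitqueue_head paging_wq;

/* Declared here only so the sketch hangs together. */
static bool_t gfn_is_present(struct p2m_domain *p2m, unsigned long gfn);

static void wait_for_paged_in_gfn(struct p2m_domain *p2m, unsigned long gfn)
{
    /*
     * wait_event() snapshots the hypervisor stack, deschedules the vcpu,
     * and resumes here once wake_up(&paging_wq) fires and the condition
     * is true; these are the "prep"/"wake" pairs in the debug output
     * quoted below.
     */
    wait_event(paging_wq, gfn_is_present(p2m, gfn));
}

/* Pager resume path (sketch): called once the page has been written back. */
static void paging_resume_notify(void)
{
    wake_up(&paging_wq);
}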

> > Also, the 3K stack size is still too small; this path uses 3096 bytes.
> 
> I'll allocate a whole page for the stack then.

Thanks.
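
For reference, "a whole page" presumably means giving each vcpu a
PAGE_SIZE buffer for the stack snapshot that prepare_to_wait() takes,
instead of the 3K one that overflows above. A rough sketch of that shape;
the struct layout and function names are guesses in the style of
xen/common/wait.c, not the actual change:

/*
 * Sketch only: per-vcpu waitqueue state with a full xenheap page for the
 * saved hypervisor stack.  Names are guesses, not the real patch.
 */
#include <xen/sched.h>
#include <xen/mm.h>

struct waitqueue_vcpu {
    struct list_head list;    /* entry on a waitqueue_head               */
    struct vcpu     *vcpu;
    void            *esp;     /* stack pointer at the time of wait()     */
    char            *stack;   /* PAGE_SIZE snapshot of the stack         */
};

static int init_waitqueue_vcpu(struct vcpu *v)
{
    struct waitqueue_vcpu *wqv = xzalloc(struct waitqueue_vcpu);

    if ( wqv == NULL )
        return -ENOMEM;

    INIT_LIST_HEAD(&wqv->list);
    wqv->vcpu  = v;
    wqv->stack = alloc_xenheap_page();   /* 4K instead of the 3K buffer */
    if ( wqv->stack == NULL )
    {
        xfree(wqv);
        return -ENOMEM;
    }

    v->waitqueue_vcpu = wqv;             /* assumed field on struct vcpu */
    return 0;
}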


Olaf

> > (XEN) prep 127a 30 0
> > (XEN) wake 127a 30
> > (XEN) prep 1cf71 30 0
> > (XEN) wake 1cf71 30
> > (XEN) prep 1cf72 30 0
> > (XEN) wake 1cf72 30
> > (XEN) prep 1cee9 30 0
> > (XEN) wake 1cee9 30
> > (XEN) prep 121a 30 0
> > (XEN) wake 121a 30
> > 
> > (This means 'gfn (p2m_unshare << 4) in_atomic'.)
> > 
> > (XEN) prep 1ee61 20 0
> > (XEN) max stacksize c18
> > (XEN) Xen WARN at wait.c:126
> > (XEN) ----[ Xen-4.2.24114-20111111.221356  x86_64  debug=y  Tainted:    C
> > ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> > (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000   rbx: ffff830201f76000   rcx: 0000000000000000
> > (XEN) rdx: ffff82c4802b7f18   rsi: 000000000000000a   rdi: ffff82c4802673f0
> > (XEN) rbp: ffff82c4802b73a8   rsp: ffff82c4802b7378   r8:  0000000000000000
> > (XEN) r9:  ffff82c480221da0   r10: 00000000fffffffa   r11: 0000000000000003
> > (XEN) r12: ffff82c4802b7f18   r13: ffff830201f76000   r14: ffff83003ea5c000
> > (XEN) r15: 000000000001ee61   cr0: 000000008005003b   cr4: 00000000000026f0
> > (XEN) cr3: 000000020336d000   cr2: 00007fa88ac42000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4802b7378:
> > (XEN)    0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
> > (XEN)    ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
> > (XEN)    ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
> > (XEN)    0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
> > (XEN)    000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
> > (XEN)    0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
> > (XEN)    ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
> > (XEN)    00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
> > (XEN)    ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
> > (XEN)    ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
> > (XEN)    000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
> > (XEN)    ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
> > (XEN)    ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
> > (XEN)    ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
> > (XEN)    ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
> > (XEN)    0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
> > (XEN)    0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
> > (XEN)    ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
> > (XEN)    00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
> > (XEN)    ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> > (XEN)    [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
> > (XEN)    [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
> > (XEN)    [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
> > (XEN)    [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
> > (XEN)    [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
> > (XEN)    [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
> > (XEN)    [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
> > (XEN)    [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
> > (XEN)    [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
> > (XEN)    [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
> > (XEN)    [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
> > (XEN)    [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
> > (XEN)
> > (XEN) wake 1ee61 20
> > 
> > 
> > 
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

