Xen project Mailing List

Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation

To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>

Date: Mon, 13 Jul 2015 08:51:12 +0000

Accept-language: en-GB, en-US

Cc: "Keir (Xen.org)" <keir@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Mon, 13 Jul 2015 08:51:19 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHQvUN9AKU4Xv/j4EOuAr7dVYbwwp3ZFKmA

Thread-topic: Deadlock in stdvga_mem_accept() with emulation

> -----Original Message-----
> From: Andrew Cooper [mailto:amc96@xxxxxxxxxxxxxxxx] On Behalf Of
> Andrew Cooper
> Sent: 13 July 2015 09:11
> To: Razvan Cojocaru; xen-devel@xxxxxxxxxxxxx
> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
>
> On 13/07/2015 08:48, Razvan Cojocaru wrote:
> > Hello,
> >
> > I'm battling the following hypervisor crash with current staging:
> >
> > (d2) Invoking ROMBIOS ...
> > (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> > (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
> > (XEN) Watchdog timer detects that CPU7 is stuck!
> > (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Not tainted ]----
> > (XEN) CPU: 7
> > (XEN) RIP: e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> > (XEN) RFLAGS: 0000000000000202 CONTEXT: hypervisor (d2v0)
> > (XEN) rax: 000000000000c11d rbx: ffff83041e687970 rcx: 000000000000c11e
> > (XEN) rdx: ffff83041e687970 rsi: 000000000000c11e rdi: ffff83041e687978
> > (XEN) rbp: ffff83040eb37208 rsp: ffff83040eb37200 r8: 0000000000000000
> > (XEN) r9: 0000000000000000 r10: ffff82d08028c3c0 r11: 0000000000000000
> > (XEN) r12: ffff83041e687000 r13: ffff83041e687970 r14: ffff83040eb37278
> > (XEN) r15: 00000000000c253f cr0: 000000008005003b cr4: 00000000001526e0
> > (XEN) cr3: 00000004054a0000 cr2: 0000000000000000
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> > (XEN) Xen stack trace from rsp=ffff83040eb37200:
> > (XEN) ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6 0000000000000282
> > (XEN) 0000000000000008 ffff830403791bf0 ffff83041e687000 ffff83040eb37268
> > (XEN) ffff82d0801cb23a 00000000000c253f ffff8300d85fc000 0000000000000001
> > (XEN) 00000000000000c2 ffff83040eb37298 ffff82d0801cb410 00000000000c253f
> > (XEN) 0000000000000000 0000000100000001 0100000000000000 ffff83040eb37328
> > (XEN) ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000 0000000000000000
> > (XEN) ffff83040eb37360 00000000000000c2 ffff8304054cb000 000000000000053f
> > (XEN) 0000000000000002 0000000000000000 ffff83040eb373f4 00000000000000c2
> > (XEN) ffff83040eb373d8 0000000000000000 0000000000000000 ffff82d08028c620
> > (XEN) 0000000000000000 ffff83040eb37338 ffff82d0801c3e5d ffff83040eb37398
> > (XEN) ffff82d0801cb107 000000010eb37394 ffff830403791bf0 ffff830403791bf0
> > (XEN) ffff83041e687000 ffff83040eb37398 ffff830403791bf0 0000000000000001
> > (XEN) ffff83040eb373d8 0000000000000001 00000000000c253f ffff83040eb373c8
> > (XEN) ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000 0000000000000001
> > (XEN) 0000000000000000 ffff83040eb37428 ffff82d0801bb440 00000000000a0001
> > (XEN) 00000000000c253f 0000000100000001 0111000000000000 ffff83040eb37478
> > (XEN) 0000000000000001 0000000000000000 0000000000000000 0000000000000001
> > (XEN) 0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9 0000000000000001
> > (XEN) 00000000000c253f ffff8300d85fc000 00000000000a0001 0100000000000000
> > (XEN) ffff83040eb37728 ffff82e00819dc60 0000000000000000 ffff83040eb374c8
> > (XEN) Xen call trace:
> > (XEN) [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> > (XEN) [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> > (XEN) [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> > (XEN) [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> > (XEN) [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> > (XEN) [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
> > (XEN) [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
> > (XEN) [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> > (XEN) [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> > (XEN) [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
> > (XEN) [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
> > (XEN) [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> > (XEN) [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> > (XEN) [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> > (XEN) [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> > (XEN) [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> > (XEN) [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
> > (XEN) [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
> > (XEN) [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
> > (XEN) [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 7:
> > (XEN) FATAL TRAP: vector = 2 (nmi)
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
> >
> > At first I thought it was caused by V5 of the vm_event-based
> > introspection series, but I've rolled it back enough to apply V4 on top
> > of it (which has been thoroughly tested on Thursday), and it still
> > happens, so this would at least appear to be unrelated at this point
> > (other than the fact that our use case is maybe somewhat unusual with
> > heavy emulation).
> >
> > I'll keep digging, but since this is a busy time for Xen I thought I'd
> > issue a heads-up here as soon as possible, in case the problem is
> > obvious for somebody and it helps getting it fixed sooner.
>
> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
> deliberate split between stdvga_mem_accept() and
> stdvga_mem_complete()
> about locking and unlocking the stdvga lock.
>
> At a guess, the previous chain of execution accidentally omitted the
> stdvga_mem_complete() call.
>

I can't see a way for that to happen. The only time the accept() call is
made is in hvm_find_io_handler(). That function is only called from
hvm_io_intercept() or hvm_mmio_internal(). In either case the handler's
complete() call is made before returning.

I think the problem here is indicated by the stack:

> > (XEN) [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> > (XEN) [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> > (XEN) [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> > (XEN) [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> > (XEN) [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
> > (XEN) [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
> > (XEN) [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> > (XEN) [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1

What seems to be happening is that the loop in hvm_process_io_intercept()
is testing for mmio space (in this case VRAM), which is then re-entering
the I/O code (via a check for an internal I/O range) and trying to
re-acquire the stdvga lock.

I am not sure why that would be happening, since the check in
hvmemul_do_io_addr() makes sure that the target of the
copy_to/from_guest_phys is most definitely RAM.

  Paul

> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
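
The re-entry described above can be illustrated with a toy model. This is only a sketch and not Xen code: every name in it (toy_lock, toy_accept, toy_complete, toy_copy_checks_mmio, toy_intercept) is hypothetical, and the lock is a plain flag with a try-lock so that the second acquisition is counted rather than spinning forever the way _spin_lock does in the trace. It assumes, as in the thread, that accept() leaves the lock held and complete() drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock; NOT Xen's spinlock_t. */
struct toy_lock { bool held; };

/* Returns false where a real non-recursive spinlock would spin forever. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical device state: the lock plus a counter of detected re-entries. */
struct toy_vga { struct toy_lock lock; int reentries; };

/* Models the accept() side: on success the lock stays held for the caller. */
static bool toy_accept(struct toy_vga *s)
{
    if (!toy_trylock(&s->lock)) {
        s->reentries++;          /* the real path would be stuck here */
        return false;
    }
    return true;
}

/* Models the complete() side: drops the lock taken by accept(). */
static void toy_complete(struct toy_vga *s)
{
    toy_unlock(&s->lock);
}

/* Models a copy helper that first asks "is this address internal MMIO?",
 * which runs accept()/complete() on the same device. */
static bool toy_copy_checks_mmio(struct toy_vga *s)
{
    if (!toy_accept(s))
        return false;
    toy_complete(s);
    return true;
}

/* Models the intercept path: accept(), copy while the lock is held,
 * then complete(). Returns the number of re-entries seen, or -1. */
static int toy_intercept(struct toy_vga *s)
{
    if (!toy_accept(s))
        return -1;
    (void)toy_copy_checks_mmio(s);   /* nested accept() on a held lock */
    toy_complete(s);
    return s->reentries;
}
```

Calling toy_intercept() on a zero-initialised struct toy_vga reports one re-entry, while toy_copy_checks_mmio() called on its own (lock not held) succeeds. That matches the observation in the thread that every accept() is paired with a complete(), yet the nested call still blocks because it arrives while the outer accept() is holding the lock.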
