[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [BUG] Emulation issues



> -----Original Message-----
> From: Roger Pau Monnà [mailto:roger.pau@xxxxxxxxxx]
> Sent: 29 July 2015 14:54
> To: Paul Durrant; xen-devel; Andrew Cooper
> Subject: Re: [BUG] Emulation issues
> 
> El 29/07/15 a les 14.41, Paul Durrant ha escrit:
> >> -----Original Message-----
> >> From: Roger Pau Monnà [mailto:roger.pau@xxxxxxxxxx]
> >> Sent: 29 July 2015 11:37
> >> To: Paul Durrant; xen-devel; Andrew Cooper
> >> Subject: Re: [BUG] Emulation issues
> >>
> >> El 29/07/15 a les 12.27, Paul Durrant ha escrit:
> >>>> -----Original Message-----
> >>>> From: Roger Pau Monnà [mailto:roger.pau@xxxxxxxxxx]
> >>>> Sent: 29 July 2015 11:17
> >>>> To: xen-devel; Andrew Cooper; Paul Durrant
> >>>> Subject: [BUG] Emulation issues
> >>>>
> >>>> Hello,
> >>>>
> >>>> While trying to debug a hotplug scripts issue, I came across what seems
> >>>> to be an emulation bug inside of Xen. The result of this is a bunch of
> >>>> repeated messages on the serial console:
> >>>>
> >>>
> >>> Was there anything of interest before this? You got an 'unhandleable'
> >> emulation which generally should not happen, but I guess there may be a
> >> shutdown race in tearing down the ioreq server list and sending
> emulation
> >> requests which may cause hvm_send_ioreq() to return
> >> X86EMUL_UNHANDLEABLE. It would be good to better understand the
> >> sequence of events.
> >>
> >> I don't think there's anything relevant before the messages I've posted,
> >> here is a more complete log:
> >>
> >> (XEN) irq.c:386: Dom91 callback via changed to Direct Vector 0x93
> >> (XEN) irq.c:386: Dom92 callback via changed to Direct Vector 0x93
> >> (XEN) irq.c:276: Dom91 PCI link 0 changed 5 -> 0
> >> (XEN) irq.c:276: Dom91 PCI link 1 changed 10 -> 0
> >> (XEN) irq.c:276: Dom91 PCI link 2 changed 11 -> 0
> >> (XEN) irq.c:276: Dom91 PCI link 3 changed 5 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 0 changed 5 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 1 changed 10 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 2 changed 11 -> 0
> >> (XEN) irq.c:276: Dom92 PCI link 3 changed 5 -> 0
> >> INIT: Id "T0" respawning too fast: disabled for 5 minutes
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >> (XEN) io.c:165:d83v0 Weird HVM ioemulation status 1.
> >> (XEN) domain_crash called from io.c:166
> >>
> >> If you can provide a debug/trace patch I can run the same workload with
> >> it in order to trace the sequence of events.
> >>
> >
> > Could you try this?
> >
> > diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> > index 30acb78..1bc3cc9 100644
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -145,6 +145,8 @@ static int hvmemul_do_io(
> >              return X86EMUL_UNHANDLEABLE;
> >          goto finish_access;
> >      default:
> > +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
> > +                vio->io_req.state);
> >          return X86EMUL_UNHANDLEABLE;
> >      }
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index ec1d797..38d6d99 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
> ioreq_t *pr
> >          }
> >      }
> >
> > +    gprintk(XENLOG_ERR, "unable to contact device model\n");
> >      return X86EMUL_UNHANDLEABLE;
> >  }
> 
> I've applied your patch and the one from Andrew, so my current diff is:
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 30acb78..1bc3cc9 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -145,6 +145,8 @@ static int hvmemul_do_io(
>              return X86EMUL_UNHANDLEABLE;
>          goto finish_access;
>      default:
> +        gprintk(XENLOG_ERR, "weird emulation state %u\n",
> +                vio->io_req.state);
>          return X86EMUL_UNHANDLEABLE;
>      }
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index ec1d797..38d6d99 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2747,6 +2747,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s,
> ioreq_t *proto_p,
>          }
>      }
> 
> +    gprintk(XENLOG_ERR, "unable to contact device model\n");
>      return X86EMUL_UNHANDLEABLE;
>  }
> 
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index d3b9cae..12d50c2 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -163,7 +163,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
>          break;
>      default:
>          gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
> -        domain_crash(curr->domain);
> +        show_execution_state(&curr->arch.user_regs);
> +        dump_execution_state();
> +        domain_crash_synchronous();
>          break;
>      }
> 
> And got the following panic while doing a `xl shutdown -w -a` of 20 HVM
> guests:
> 
> (XEN) irq.c:386: Dom19 callback via changed to Direct Vector 0x93
> (XEN) irq.c:276: Dom19 PCI link 0 changed 5 -> 0
> (XEN) irq.c:276: Dom19 PCI link 1 changed 10 -> 0
> (XEN) irq.c:276: Dom19 PCI link 2 changed 11 -> 0
> (XEN) irq.c:276: Dom19 PCI link 3 changed 5 -> 0
> (XEN) d10v0 weird emulation state 1
> (XEN) io.c:165:d10v0 Weird HVM ioemulation status 1.
> (XEN) Assertion 'diff < STACK_SIZE' failed at traps.c:91
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d080234b83>] show_registers+0x60/0x32f
> (XEN) RFLAGS: 0000000000010212   CONTEXT: hypervisor (d10v0)
> (XEN) rax: 000000001348fc88   rbx: ffff8300cc668290   rcx: 0000000000000000
> (XEN) rdx: ffff8300dfaf0000   rsi: ffff8300cc668358   rdi: ffff8300dfaf7bb8
> (XEN) rbp: ffff8300dfaf7bd8   rsp: ffff8300dfaf7a98   r8:  ffff83019d270000
> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
> (XEN) r12: ffff8300cc668000   r13: 0000000000000000   r14: ffff82c00026c000
> (XEN) r15: ffff830198bf9000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) cr3: 00000000cc77b000   cr2: ffff880002762df8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff8300dfaf7a98:
> (XEN)    ffff8300dfaf7ac8 ffff82d080144b11 0000000000000046
> ffff8300dfaf7ac8
> (XEN)    0000000000000046 0000000000000092 ffff8300dfaf7ae0
> ffff82d08012cfd3
> (XEN)    ffff82d0802a1bc0 ffff8300dfaf7af8 0000000000000046
> 0000000000002001
> (XEN)    0000000000002001 fffff80002089e28 0000000000000001
> fffffe00003829c0
> (XEN)    000000000000b004 0000000000000000 0000000000000014
> 0000000000000002
> (XEN)    000000000000b004 0000000000002001 000000000000b005
> 000000000000b004
> (XEN)    0000000000002001 000000000000b004
> 0000beef0000beef<G><0>d15v0 weird emulation state 1
> (XEN)  ffffffff8036fa45<G><0>io.c:165:d15v0 Weird HVM ioemulation status
> 1.
> (XEN)
> (XEN)   Assertion 'diff < STACK_SIZE' failed at traps.c:91
> (XEN)  000000bf0000beef----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:
> C ]----
> (XEN)  0000000000000046CPU:    6
> (XEN)  fffffe00003829c0RIP:    e008:[<ffff82d080234b83>] 000000000000beef
> show_registers+0x60/0x32f
> (XEN)
> (XEN) RFLAGS: 0000000000010212    0000000000000000CONTEXT: hypervisor
> 0000000000000000 (d15v0) 0000000000000000
> (XEN) rax: 0000000121dd3c88   rbx: ffff83007b4c4290   rcx: 0000000000000000
> (XEN)  0000000000000000rdx: ffff83019d290000   rsi: ffff83007b4c4358   rdi:
> ffff83019d297bb8
> (XEN)
> (XEN)   rbp: ffff83019d297bd8   rsp: ffff83019d297a98   r8:  ffff83019d270000
> (XEN)  ffff8300cc668290r9:  0000000000000001   r10: 0000000000000001   r11:
> 0000000000000001
> (XEN)  ffff8300cc668000r12: ffff83007b4c4000   r13: 0000000000000000   r14:
> ffff82c000299000
> (XEN)  0000000000000000r15: ffff830198bf9000   cr0: 000000008005003b   cr4:
> 00000000000026e0
> (XEN)  ffff82c00026c000cr3: 000000007b5d7000   cr2: ffff8800026b14d8
> (XEN)
> (XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN)  ffff8300dfaf7bf8Xen stack trace from rsp=ffff83019d297a98:
> (XEN)    ffff82d08018dd4d ffff82d0802685bf 0000000000000001
> ffff830198bf9000 0000000000000002 00007cfe62d68527
> (XEN)    ffff82d08023b132 ffff8300dfaf7c38
> (XEN)    ffff82d0801caff0 ffff830198bf9000 ffff8300dfaf7c38 ffff82d0802685bf
> 0000000000002001 ffff83019d297b70
> (XEN)    0000000000000200 ffff8300cc7da000
> (XEN)    ffff83019d29ecc0 ffff83019d297b98 ffff8300cc668000
> 0000000000000000 ffff8300cc7da250 0000000000000001
> (XEN)    0000000000002001 ffff8300dfaf7db8
> (XEN)    ffff82d0801c5934 0000000000002001 8000000000000000
> fffff80002089e28 ffff8300cc7da000 0000000000000001
> (XEN)    fffffe00003829c0 ffff8300dfaf0000
> (XEN)    ffff8300cc7da250 000000000000b004 ffff8300dfaf7cf8
> 0000000000000000 00000000000cc277 0000000000000014
> (XEN)    0000000000000002 0000000000000000
> (XEN)    0000000000000001 000000000000b004 00000000000feff0
> 0000000000002001 ffff8300ccfec820 000000000000b005
> (XEN)    000000000000b004 ffff8300dfaf7d08
> (XEN)    ffff82d0801f2009 0000000000002001 ffffffffffffffff 000000000000b004
> ffffffffffffffff 0000beef0000beef
> (XEN)    ffffffff8036fa45 00000000000001f0
> (XEN)    000000004003b000 000000bf0000beef ffff8300cc7da000
> 0000000000000046 0000000000000000 fffffe00003829c0
> (XEN)    000000000000beef ffff8300ccfec820
> (XEN)    00000000000cc278 0000000000000000 ffff8300ccfec820
> 0000000000000000 ffff8300cc7da000 0000000000000000
> (XEN)    0000000000000000 ffff8300dfaf7da8
> (XEN)    ffff82d080122c5a ffff83007b4c4290 ffff8300dfaf7db8
> ffff83007b4c4000 ffff8300dfaf7d28 0000000000000000
> (XEN)  ffff82c000299000Xen call trace:
> (XEN)
> (XEN)      [<ffff82d080234b83>] show_registers+0x60/0x32f
> (XEN)  ffff83019d297bf8   [<ffff82d08018dd4d>]
> show_execution_state+0x11/0x20
> (XEN)  ffff82d08018dd4d   [<ffff82d0801caff0>] handle_pio+0x129/0x158
> (XEN)  0000000000000001   [<ffff82d0801c5934>]
> hvm_do_resume+0x258/0x33e
> (XEN)  0000000000000002   [<ffff82d0801e3166>]
> vmx_do_resume+0x12b/0x142

Ok, so this is a handle_pio() that's being called to pick up the results of an 
I/O that was sent to an external emulator. Did you manage to apply my patch 
too? I'd really like to know what part of the emulation handling is actually 
returning X86EMUL_UNHANDLEABLE.

  Paul

> (XEN)
> (XEN)      [<ffff82d080164adc>] context_switch+0xf0c/0xf63
> (XEN)  ffff83019d297c38   [<ffff82d0801299e0>] schedule+0x5b9/0x612
> (XEN)  ffff82d0801caff0   [<ffff82d08012c765>] __do_softirq+0x82/0x8d
> (XEN)  ffff83019d297c38   [<ffff82d08012c7bd>] do_softirq+0x13/0x15
> (XEN)  0000000000002001   [<ffff82d08023ace1>] process_softirqs+0x21/0x30
> (XEN)
> (XEN)
> (XEN)  ffff83007b637000
> (XEN) ****************************************
> (XEN)  ffff83007b60aed0Panic on CPU 0:
> (XEN)  ffff83007b4c4000Assertion 'diff < STACK_SIZE' failed at traps.c:91
> (XEN)  ffff83007b637250****************************************
> (XEN)
> (XEN)
> (XEN)   Reboot in five seconds...
> (XEN)  ffff83019d297db8
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.