Re: NetBSD dom0 PVH: hardware interrupts stalls
On Fri, Nov 20, 2020 at 09:09:51AM +0100, Jan Beulich wrote:
> On 19.11.2020 18:57, Manuel Bouyer wrote:
> > I added an ASSERT() after the printf to get a stack trace, and got:
> > db{0}> call ioapic_dump_raw^M
> > Register dump of ioapic0^M
> > [ 13.0193374] 00 08000000 00170011 08000000(XEN) vioapic.c:141:d0v0 apic_mem_readl:undefined ioregsel 3
> > (XEN) vioapic.c:512:vioapic_irq_positive_edge: vioapic_deliver 2
> > (XEN) Assertion '!print' failed at vioapic.c:512
> > (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82d0402c4164>] vioapic_irq_positive_edge+0x14e/0x150
> > (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v0)
> > (XEN) rax: ffff82d0405c806c rbx: ffff830836650580 rcx: 0000000000000000
> > (XEN) rdx: ffff8300688bffff rsi: 000000000000000a rdi: ffff82d0404b36b8
> > (XEN) rbp: ffff8300688bfde0 rsp: ffff8300688bfdc0 r8: 0000000000000004
> > (XEN) r9: 0000000000000032 r10: 0000000000000000 r11: 00000000fffffffd
> > (XEN) r12: ffff8308366dc000 r13: 0000000000000022 r14: ffff8308366dc31c
> > (XEN) r15: ffff8308366d1d80 cr0: 0000000080050033 cr4: 00000000003526e0
> > (XEN) cr3: 00000008366c9000 cr2: 0000000000000000
> > (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> > (XEN) Xen code around <ffff82d0402c4164> (vioapic_irq_positive_edge+0x14e/0x150):
> > (XEN) 3d 10 be 1d 00 00 74 c2 <0f> 0b 55 48 89 e5 41 57 41 56 41 55 41 54 53 48
> > (XEN) Xen stack trace from rsp=ffff8300688bfdc0:
> > (XEN) 0000000200000086 ffff8308366dc000 0000000000000022 0000000000000000
> > (XEN) ffff8300688bfe08 ffff82d0402bcc33 ffff8308366dc000 0000000000000022
> > (XEN) 0000000000000001 ffff8300688bfe40 ffff82d0402bd18f ffff830835a7eb98
> > (XEN) ffff8308366dc000 ffff830835a7eb40 ffff8300688bfe68 0100100100100100
> > (XEN) ffff8300688bfea0 ffff82d04026f6e1 ffff830835a7eb30 ffff8308366dc0f4
> > (XEN) ffff830835a7eb40 ffff8300688bfe68 ffff8300688bfe68 ffff82d0405cec80
> > (XEN) ffffffffffffffff ffff82d0405cec80 0000000000000000 ffff82d0405d6c80
> > (XEN) ffff8300688bfed8 ffff82d04022b6fa ffff83083663f000 ffff83083663f000
> > (XEN) 0000000000000000 0000000000000000 0000000a7c62165b ffff8300688bfee8
> > (XEN) ffff82d04022b798 ffff8300688bfe08 ffff82d0402a4bcb 0000000000000000
> > (XEN) 0000000000000206 ffff8316da86e61c ffff8316da86e600 ffff938031fd47c0
> > (XEN) 0000000000000003 0000000000000400 ff889e8da08f928a 0000000000000000
> > (XEN) 0000000000000002 0000000000000100 000000000000b86e ffff93803237f010
> > (XEN) 0000000000000000 ffff8316da86e61c 0000beef0000beef ffffffff80555918
> > (XEN) 000000bf0000beef 0000000000000046 ffff938031fd4790 000000000000beef
> > (XEN) 000000000000beef 000000000000beef 000000000000beef 000000000000beef
> > (XEN) 0000e01000000000 ffff83083663f000 0000000000000000 00000000003526e0
> > (XEN) 0000000000000000 0000000000000000 0000060100000001 0000000000000000
> > (XEN) Xen call trace:
> > (XEN) [<ffff82d0402c4164>] R vioapic_irq_positive_edge+0x14e/0x150
> > (XEN) [<ffff82d0402bcc33>] F arch/x86/hvm/irq.c#assert_gsi+0x5e/0x7b
> > (XEN) [<ffff82d0402bd18f>] F hvm_gsi_assert+0x62/0x77
> > (XEN) [<ffff82d04026f6e1>] F drivers/passthrough/io.c#dpci_softirq+0x261/0x29e
> > (XEN) [<ffff82d04022b6fa>] F common/softirq.c#__do_softirq+0x8a/0xbf
> > (XEN) [<ffff82d04022b798>] F do_softirq+0x13/0x15
> > (XEN) [<ffff82d0402a4bcb>] F vmx_asm_do_vmentry+0x2b/0x30
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion '!print' failed at vioapic.c:512
> > (XEN) ****************************************
>
> Right, this was the expected path after what you've sent prior to this.
> Which turned my attention back to the 'i' debug key output you had sent
> the other day. There we have
>
> (XEN) IRQ: 34 vec:51 IO-APIC-level status=010 aff:{0}/{0-7} in-flight=1 d0: 34(-MM)
>
> i.e.
at that point we're waiting for Dom0 to signal it's done handling
> the IRQ. There is, however, a timer associated with this. Yet that's
> actually to prevent the system getting stuck, i.e. the "in-flight"
> state ought to clear 1ms later (when that timer expires), and hence
> ought to be pretty unlikely to catch when non-zero _and_ something's
> actually stuck.

I somehow assumed the interrupt was in-flight because the printing to
the Xen console caused one to be injected, and thus dom0 didn't have
time to Ack it yet.

>
> So for the softirq to get Dom0 out of its stuck state, there has got to
> be yet some other event. Nevertheless it may be worthwhile
> instrumenting irq_guest_eoi_timer_fn() to prove we actually take this
> path, i.e. Xen is trying to "clean up" after Dom0 taking too long to
> service an IRQ. In normal operation this path shouldn't be taken, so I
> wouldn't exclude something got broken in that logic. (Orthogonal to
> this it may also be worth seeing whether increasing the timeout would
> actually help things. This wouldn't be a solution, but another data
> point hinting something's wrong on this code path.)
>
> Roger, I'm also somewhat puzzled by the trailing (-MM): Is PVH using
> event channels for delivering pIRQ-s?

No, it's always using emulated interrupt controllers. I explicitly
disabled HVM PIRQ for PVH.

> I thought that's purely vIO-APIC
> and vMSI? I wonder whether we misleadingly dump info from evtchn 0
> here, in which case only the 2nd of the M-s would be meaningful (and
> would be in line with non-zero in-flight).

Likely - will have to look closer but there's no event channel
associated with a PIRQ on PVH dom0. I will send a patch to fix
dump_irqs.

Maybe we should track interrupt EOI, and see when the interrupt gets
EOI'ed. Will see if I can find some time later to prepare another
debug patch.

Roger.
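
For illustration only, the in-flight accounting Jan describes (an IRQ
stays in flight until the guest EOIs it, or until a clean-up timer
expires 1ms later) can be modelled roughly as below. This is a
standalone toy sketch, not Xen code: every name in it (toy_irq,
toy_irq_inject, toy_irq_guest_eoi, toy_irq_timer_check) is
hypothetical, and only the 1ms figure comes from the thread. Keeping
separate counters for the two ways out of the in-flight state is the
kind of data an EOI-tracking debug patch could report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; all names are hypothetical and none of this is Xen code. */
#define TOY_EOI_TIMEOUT_US 1000 /* the 1ms timeout mentioned in the thread */

struct toy_irq {
    unsigned int in_flight;  /* 1 while the guest has not EOI'ed yet */
    uint64_t deadline_us;    /* time at which the clean-up timer fires */
    unsigned int guest_eois; /* times the guest EOI'ed in time */
    unsigned int timer_eois; /* times the timer had to clean up */
};

/* Deliver the IRQ to the guest and arm the clean-up timer. */
static void toy_irq_inject(struct toy_irq *irq, uint64_t now_us)
{
    assert(irq);
    irq->in_flight = 1;
    irq->deadline_us = now_us + TOY_EOI_TIMEOUT_US;
}

/* Guest signals it is done handling the IRQ (the normal path). */
static void toy_irq_guest_eoi(struct toy_irq *irq)
{
    assert(irq);
    if (irq->in_flight) {
        irq->in_flight = 0;
        irq->guest_eois++;
    }
}

/*
 * Timer path: if the guest took too long, clear the in-flight state.
 * Returns true when the clean-up path was actually taken.
 */
static bool toy_irq_timer_check(struct toy_irq *irq, uint64_t now_us)
{
    assert(irq);
    if (irq->in_flight && now_us >= irq->deadline_us) {
        irq->in_flight = 0;
        irq->timer_eois++;
        return true;
    }
    return false;
}
```

In this model a non-zero timer_eois count would be the evidence that
the clean-up path is being taken, which in normal operation it
shouldn't be.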