Re: NetBSD dom0 PVH: hardware interrupts stalls
On 18.11.2020 15:39, Roger Pau Monné wrote:
> On Wed, Nov 18, 2020 at 01:14:03PM +0100, Manuel Bouyer wrote:
>> I did some more instrumentation from the NetBSD kernel, including dumping
>> the ioapic2 pin2 register.
>>
>> At the time of the command timeout, the register value is 0x0000a067,
>> which, if I understand it properly, means that there's no interrupt
>> pending (bit IOAPIC_REDLO_RIRR, 0x00004000, is not set).
>> From the NetBSD ddb, I can dump this register multiple times, waiting
>> several seconds, etc.; it doesn't change.
>> Now if I call ioapic_dump_raw() from the debugger, which triggers some
>> XEN printf:
>> db{0}> call ioapic_dump_raw^M
>> Register dump of ioapic0^M
>> [ 203.5489060] 00 08000000 00170011 08000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 3
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
>> 00000000^M
>> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
>> 00000000^M
>> [ 203.5489060] 10 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000^M
>> [...]
>> [ 203.5489060] Register dump of ioapic2^M
>> [ 203.5489060] 00 0a000000 00070011 0a000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 3
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
>> 00000000^M
>> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
>> 00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
>> 00000000^M
>> [ 203.5489060] 10 00010000 00000000 00010000 00000000 0000e067 00000000 00010000 00000000^M
>>
>> then the register switches to 0000e067, with the IOAPIC_REDLO_RIRR bit set.
>> From here, if I continue from ddb, the dom0 boots.
>>
>> I can get the same effect by just doing ^A^A^A so my guess is that it's
>> not accessing the ioapic's register which changes the IOAPIC_REDLO_RIRR bit,
>> but the XEN printf. Also, from NetBSD, using a dump function which
>> doesn't access undefined registers - and so doesn't trigger XEN printfs -
>> doesn't change the IOAPIC_REDLO_RIRR bit either.
>
> I'm thinking about further ways to debug this. I see that all active
> IO-APIC pins are routed to vCPU0, but does it make a difference if you
> boot with dom0_max_vcpus=1 on the Xen command line? (thus limiting
> NetBSD dom0 to a single CPU)

I too have been pondering possible approaches.
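As a cross-check on the two register values quoted above, here is a minimal sketch in C that decodes the low 32 bits of a redirection table entry. The bit positions follow the standard IO-APIC (82093AA) layout; the `RTE_*` macro names, `struct rte_lo` and both helper functions are made up for this illustration and are not taken from either NetBSD or Xen.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Low dword of an IO-APIC redirection table entry (82093AA layout).
 * All names below are hypothetical, chosen for this sketch only. */
#define RTE_VECTOR_MASK   0x000000ffu /* bits 0-7: interrupt vector */
#define RTE_DELIVS        0x00001000u /* bit 12: delivery status */
#define RTE_POLARITY_LOW  0x00002000u /* bit 13: active-low polarity */
#define RTE_REMOTE_IRR    0x00004000u /* bit 14: IOAPIC_REDLO_RIRR in the thread */
#define RTE_TRIGGER_LEVEL 0x00008000u /* bit 15: level-triggered */
#define RTE_MASKED        0x00010000u /* bit 16: pin masked */

struct rte_lo {
    uint8_t vector;
    bool delivery_pending;
    bool active_low;
    bool remote_irr;
    bool level;
    bool masked;
};

/* Split a raw low dword into its individual fields. */
struct rte_lo rte_lo_decode(uint32_t v)
{
    struct rte_lo r;

    r.vector = (uint8_t)(v & RTE_VECTOR_MASK);
    r.delivery_pending = (v & RTE_DELIVS) != 0;
    r.active_low = (v & RTE_POLARITY_LOW) != 0;
    r.remote_irr = (v & RTE_REMOTE_IRR) != 0;
    r.level = (v & RTE_TRIGGER_LEVEL) != 0;
    r.masked = (v & RTE_MASKED) != 0;
    return r;
}

/* Print one decoded entry on a single line. */
void rte_lo_print(const char *label, uint32_t v)
{
    struct rte_lo r = rte_lo_decode(v);

    printf("%s: raw=0x%08x vector=0x%02x level=%d active_low=%d "
           "remote_irr=%d masked=%d\n",
           label, (unsigned int)v, (unsigned int)r.vector,
           r.level, r.active_low, r.remote_irr, r.masked);
}
```

Calling `rte_lo_print("stalled", 0x0000a067)` and `rte_lo_print("after dump", 0x0000e067)` decodes both values as an unmasked, level-triggered, active-low pin with vector 0x67; the only field that differs is `remote_irr`, which matches Manuel's reading.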
One thing I thought might help is to accompany all places setting remote_irr (and calling vioapic_deliver()) with a conditional log message, turning on the condition immediately before the first "undefined ioregsel" gets logged. (And turn it off again once the last RTE was read in sequence, just to avoid spamming the console.) From Manuel's description above, there has to be something that sets the bit and causes the delivery _without_ any active action by the guest (i.e. neither EOI nor RTE write) and _without_ any new instance of the IRQ appearing.

I have some vague hope that knowing how we end up making the system make progress again may also help understand how it got stuck.

Jan
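The gating idea above could look roughly like the following standalone C model. It is only a sketch of the control flow, not a patch against Xen: `rirr_trace_enabled`, `rirr_trace()`, `model_ioregsel_read()` and `model_set_remote_irr()` are hypothetical names, and the assumption that registers 0x03 to 0x0f are the undefined ones is taken from the log quoted in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical gate: off by default so normal operation stays quiet. */
bool rirr_trace_enabled;
/* Number of messages emitted so far, kept so the effect can be checked. */
unsigned int rirr_trace_events;

/* Conditional log message to place next to every remote_irr update. */
void rirr_trace(const char *where, unsigned int pin)
{
    if (!rirr_trace_enabled)
        return;
    rirr_trace_events++;
    printf("remote_irr set: %s, pin %u\n", where, pin);
}

/* Model of the register read path. Registers 0x00-0x02 are defined and
 * the redirection table starts at 0x10, so 0x03-0x0f are treated as the
 * undefined range that produces the "undefined ioregsel" messages. */
void model_ioregsel_read(unsigned int ioregsel, unsigned int nr_pins)
{
    unsigned int last_rte_reg = 0x10 + 2 * nr_pins - 1;

    if (ioregsel >= 0x03 && ioregsel < 0x10)
        rirr_trace_enabled = true;   /* first undefined read opens the gate */
    else if (ioregsel == last_rte_reg)
        rirr_trace_enabled = false;  /* last RTE read closes it again */
}

/* Stand-in for any code path that sets remote_irr and delivers the IRQ. */
void model_set_remote_irr(const char *where, unsigned int pin)
{
    rirr_trace(where, pin);
    /* ... the real code would set the bit and deliver here ... */
}
```

With `nr_pins` set to 8 (ioapic2 reports 00070011 in its version register, i.e. a maximum redirection entry of 7), the gate opens on the read of register 3 and closes on the read of register 0x1f, so only the remote_irr updates that happen during the dump get logged, together with the name of the path that performed them.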