Re: Event delivery and "domain blocking" on PVHv2
On Fri, Jun 19, 2020 at 06:41:21PM +0200, Martin Lucina wrote:
> On 2020-06-19 13:21, Roger Pau Monné wrote:
> > On Fri, Jun 19, 2020 at 12:28:50PM +0200, Martin Lucina wrote:
> > > On 2020-06-18 13:46, Roger Pau Monné wrote:
> > > > On Thu, Jun 18, 2020 at 12:13:30PM +0200, Martin Lucina wrote:
> > > > > At this point I don't really have a clear idea of how to progress,
> > > > > comparing my implementation side-by-side with the original PV
> > > > > Mini-OS-based implementation doesn't show up any differences I can see.
> > > > >
> > > > > AFAICT the OCaml code I've also not changed in any material way, and
> > > > > that has been running in production on PV for years, so I'd be inclined
> > > > > to think the problem is in my reimplementation of the C parts, but where...?
> > > >
> > > > A good start would be to print the ISR and IRR lapic registers when
> > > > blocked, to assert there are no pending vectors there.
> > > >
> > > > Can you apply the following patch to your Xen, rebuild and check the
> > > > output of the 'l' debug key?
> > > >
> > > > Also add the output of the 'v' key.
> > >
> > > Had to fight the Xen Debian packages a bit as I wanted to patch the
> > > exact same Xen (there are some failures when building on a system that has
> > > Xen installed due to following symlinks when fixing shebangs).
> > >
> > > Here you go, when stuck during netfront setup, after allocating its
> > > event channel, presumably waiting on Xenstore:
> > >
> > > 'e':
> > >
> > > (XEN) Event channel information for domain 3:
> > > (XEN) Polling vCPUs: {}
> > > (XEN)     port [p/m/s]
> > > (XEN)        1 [1/0/1]: s=3 n=0 x=0 d=0 p=33
> > > (XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
> > > (XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0
> > > (XEN)        4 [0/1/1]: s=2 n=0 x=0 d=0
> > >
> > > 'l':
> > >
> > > (XEN) d3v0 IRR:
> > > ffff8301732dc200b
> > > (XEN) d3v0 ISR:
> > > ffff8301732dc100b
> >
> > Which version of Xen is this? AFAICT it doesn't have the support to
> > print a bitmap.
>
> That in Debian 10 (stable):
>
> ii  xen-hypervisor-4.11-amd64 4.11.3+24-g14b62ab3e5-1~deb10u1.2
> amd64        Xen Hypervisor on AMD64
>
> xen_major              : 4
> xen_minor              : 11
> xen_extra              : .4-pre
> xen_version            : 4.11.4-pre
>
> >
> > Do you think you could also pick commit
> > 8cd9500958d818e3deabdd0d4164ea6fe1623d7c [0] and rebuild? (and print
> > the info again).
>
> Done, here you go:
>
> (XEN) Event channel information for domain 3:
> (XEN) Polling vCPUs: {}
> (XEN)     port [p/m/s]
> (XEN)        1 [1/0/1]: s=3 n=0 x=0 d=0 p=33
> (XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
> (XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0
> (XEN)        4 [0/1/1]: s=3 n=0 x=0 d=0 p=35
>
>
> (XEN) d3v0 IRR:
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> (XEN) d3v0 ISR:
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000

So there's nothing pending on the lapic.

Can you assert that you will always execute evtchn_demux_pending after
you have received an event channel interrupt (i.e., after
solo5__xen_evtchn_vector_handler has run)?

I think this would be simpler if you moved evtchn_demux_pending into
solo5__xen_evtchn_vector_handler, as there would be less asynchronous
processing and thus likely fewer races.

Roger.
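To illustrate the suggestion of demultiplexing directly inside the vector handler, here is a minimal, hypothetical C sketch against a simplified userspace model of the Xen 2-level event channel ABI. The structures (fake_vcpu_info, fake_shared_info) and the functions fake_raise(), handle_port() and evtchn_vector_handler() are assumptions made up for illustration: they are only loosely modelled on the evtchn_upcall_pending/evtchn_pending_sel fields of vcpu_info and the evtchn_pending/evtchn_mask bitmaps of shared_info, and they are not Solo5's actual code. The sketch assumes a single vCPU and GCC or Clang (for __builtin_ctzl).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the Xen 2-level event channel ABI.
 * Field names loosely follow vcpu_info/shared_info; this is NOT the real
 * layout and NOT Solo5 code. Single vCPU only. */
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define NR_WORDS 4
#define NR_PORTS (NR_WORDS * BITS_PER_WORD)

struct fake_vcpu_info {
    atomic_uchar upcall_pending;   /* models evtchn_upcall_pending */
    atomic_ulong pending_sel;      /* models evtchn_pending_sel */
};

struct fake_shared_info {
    atomic_ulong evtchn_pending[NR_WORDS];
    atomic_ulong evtchn_mask[NR_WORDS];
};

static struct fake_vcpu_info vcpu;
static struct fake_shared_info shared;

/* Records which ports were handled; stands in for waking whatever
 * is blocked on the port. */
static bool port_fired[NR_PORTS];

static void handle_port(unsigned int port)
{
    port_fired[port] = true;
}

/* Simulated hypervisor side: mark the port pending and, unless it is
 * masked, set the selector bit and the upcall flag. */
static void fake_raise(unsigned int port)
{
    unsigned int w = (unsigned int)(port / BITS_PER_WORD);
    unsigned long bit = 1UL << (port % BITS_PER_WORD);

    atomic_fetch_or(&shared.evtchn_pending[w], bit);
    if (atomic_load(&shared.evtchn_mask[w]) & bit)
        return;
    atomic_fetch_or(&vcpu.pending_sel, 1UL << w);
    atomic_store(&vcpu.upcall_pending, 1);
}

/* Vector handler that demultiplexes synchronously: every pending,
 * unmasked port is cleared and handled before the handler returns. */
static void evtchn_vector_handler(void)
{
    while (atomic_load(&vcpu.upcall_pending)) {
        /* Clear the flag first so a raise during the scan re-arms the loop. */
        atomic_store(&vcpu.upcall_pending, 0);
        unsigned long sel = atomic_exchange(&vcpu.pending_sel, 0UL);

        while (sel) {
            unsigned int w = (unsigned int)__builtin_ctzl(sel);
            sel &= sel - 1;                 /* drop the lowest selector bit */
            assert(w < NR_WORDS);

            /* Masked ports stay pending; an unmask path (not shown)
             * would have to re-raise them. */
            unsigned long pend = atomic_load(&shared.evtchn_pending[w]) &
                                 ~atomic_load(&shared.evtchn_mask[w]);
            while (pend) {
                unsigned int b = (unsigned int)__builtin_ctzl(pend);
                pend &= pend - 1;
                /* Acknowledge (clear) the port before handling it. */
                atomic_fetch_and(&shared.evtchn_pending[w], ~(1UL << b));
                handle_port((unsigned int)(w * BITS_PER_WORD + b));
            }
        }
    }
}
```

For example, raising ports 3 and 4 and then calling evtchn_vector_handler() handles both and clears their pending bits, while a port masked before it is raised stays pending. Because the scan happens before the handler returns, no delivered upcall is left waiting for some later, asynchronous demux pass, which is the property the question above asks to be asserted.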