Re: MSI-X cleanup(?) issue with passthrough after domU restart
On Tue, Aug 26, 2025 at 08:16:56AM +0200, Jan Beulich wrote:
> On 26.08.2025 03:49, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > I'm hitting an MSI-X issue after rebooting the domU. The symptoms are
> > rather boring: on initial domU start the device (realtek eth card) works
> > fine, but after domU restart, the link doesn't come up (there is no
> > "Link is Up" message anymore). No errors from domU driver or Xen. I
> > tracked it down to MSI-X - if I force INTx (via pci=nomsi on domU
> > cmdline) it works fine. Convincing the driver to poll instead of waiting
> > for an interrupt also workarounds the issue.
> >
> > I noticed also some interrupts are not cleaned up on restart. The list
> > of MSIs in 'Q' debug key output grows:
> >
> > (XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 44 45 46 47 >
> > restart sys-net domU
> > (XEN) 0000:03:00.0 - d24 - node -1 - MSIs < 41 42 43 44 45 46 47 48 >
> > restart sys-net domU
> > (XEN) 0000:03:00.0 - d26 - node -1 - MSIs < 41 42 43 44 45 46 47 48 49 >
> >
> > and 'M' output is:
> >
> > (XEN) MSI-X 41 vec=b1 lowest edge assert log lowest dest=00000001 mask=1/H /1
> > (XEN) MSI-X 42 vec=b9 lowest edge assert log lowest dest=00000004 mask=1/HG/1
> > (XEN) MSI-X 43 vec=c1 lowest edge assert log lowest dest=00000010 mask=1/HG/1
> > (XEN) MSI-X 44 vec=d9 lowest edge assert log lowest dest=00000001 mask=1/HG/1
> > (XEN) MSI-X 45 vec=e1 lowest edge assert log lowest dest=00000001 mask=1/HG/1
> > (XEN) MSI-X 46 vec=e9 lowest edge assert log lowest dest=00000040 mask=1/HG/1
> > (XEN) MSI-X 47 vec=32 lowest edge assert log lowest dest=00000004 mask=1/HG/1
> > (XEN) MSI-X 48 vec=3a lowest edge assert log lowest dest=00000040 mask=1/HG/1
> > (XEN) MSI-X 49 vec=42 lowest edge assert log lowest dest=00000010 mask=1/ G/1
> >
> > And also, after starting and stopping the domU,
> > `xl pci-assignable-remove 03:00.0` makes pciback to complain:
> >
> > [ 1180.919874] pciback 0000:03:00.0: xen_pciback: MSI-X release failed (-16)
> >
> > This is all running on Xen 4.19.3, but I don't see much changes in this
> > area since then.
> >
> > Some more info collected at
> > https://github.com/QubesOS/qubes-issues/issues/9335
> >
> > My question is: what should be responsible for this cleanup on domain
> > destroy? Xen, or maybe device model (which is QEMU in stubdomain here)?
>
> The expectation is that qemu invokes the necessary cleanup, but of course ...
>
> > I see some cleanup (apparently not enough) happening via QEMU when the
> > domU driver is unloaded, but logically correct cleanup shouldn't depend
> > on correct domU operation...
>
> ... Xen may not make itself dependent upon either DomU or QEMU.

AFAICT free_domain_pirqs() called by arch_domain_destroy() should take
care of unbinding and freeing pirqs (but obviously not in this case).
Can you repeat the test with a debug=y hypervisor and post the
resulting serial or dmesg here? Some of the errors on those paths are
printed with dprintk() and won't be visible unless using a Xen debug
build.

> What I find puzzling (assuming I can take the quoted output plus your
> annotations verbatim) is that the device apparently uses multiple
> vectors, and we're leaking exactly one of them. Also, since reboot is
> generally nothing else than shutdown and immediate relaunch, is there a
> leak also after shutdown? I ask because it might help to know which of
> the multiple vectors is leaked (first, last, random).

Can we maybe get the output of `lspci -vv` when the device is attached?

Thanks, Roger.
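
To help answer which entry appears after each restart, the 'Q' debug-key lines quoted above can be compared mechanically. What follows is only a minimal sketch in Python, assuming the per-device line has exactly the format quoted in this thread and that the MSI numbers are printed in decimal. The helper names `parse_q_line` and `new_msis_per_restart` are made up for illustration and are not part of any Xen tooling.

```python
import re

# Assumed line format, taken from the output quoted in this thread:
#   (XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 >
LINE_RE = re.compile(
    r"(?P<sbdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+-\s+"
    r"d(?P<domid>\d+)\s+-\s+node\s+\S+\s+-\s+MSIs\s+<(?P<msis>[^>]*)>"
)


def parse_q_line(line):
    """Return (sbdf, domid, [msi, ...]) for a matching line, else None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    msis = [int(tok) for tok in m.group("msis").split()]
    return m.group("sbdf"), int(m.group("domid")), msis


def new_msis_per_restart(lines, sbdf):
    """For consecutive snapshots of one device, list the MSIs that newly appear.

    Returns a list of (domid, [new MSI numbers]) with one entry per restart.
    """
    snapshots = [p for p in map(parse_q_line, lines) if p and p[0] == sbdf]
    result = []
    for (_, _, prev), (_, domid, cur) in zip(snapshots, snapshots[1:]):
        result.append((domid, sorted(set(cur) - set(prev))))
    return result
```

Fed the three 'Q' lines quoted above (non-matching lines such as "restart sys-net domU" are skipped), `new_msis_per_restart(lines, "0000:03:00.0")` reports 48 for d24 and 49 for d26. That only shows which numbers are new in the list. Matching them against the 'M' output is still needed to tell which vector of the device they correspond to.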