
Re: MSI-X cleanup(?) issue with passthrough after domU restart



On Tue, Aug 26, 2025 at 08:16:56AM +0200, Jan Beulich wrote:
> On 26.08.2025 03:49, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > I'm hitting an MSI-X issue after rebooting the domU. The symptoms are
> > rather boring: on initial domU start the device (realtek eth card) works
> > fine, but after domU restart, the link doesn't come up (there is no
> > "Link is Up" message anymore). No errors from domU driver or Xen. I
> > tracked it down to MSI-X - if I force INTx (via pci=nomsi on domU
> > cmdline) it works fine. Convincing the driver to poll instead of waiting
> > for an interrupt also works around the issue.
> > 
> > I noticed also some interrupts are not cleaned up on restart. The list
> > of MSIs in 'Q' debug key output grows:
> > 
> >     (XEN) 0000:03:00.0 - d22 - node -1  - MSIs < 41 42 43 44 45 46 47 >
> >     restart sys-net domU
> >     (XEN) 0000:03:00.0 - d24 - node -1  - MSIs < 41 42 43 44 45 46 47 48 >
> >     restart sys-net domU
> >     (XEN) 0000:03:00.0 - d26 - node -1  - MSIs < 41 42 43 44 45 46 47 48 49 >
> > 
> > and 'M' output is:
> > 
> >     (XEN)  MSI-X   41 vec=b1 lowest  edge   assert  log lowest dest=00000001 mask=1/H /1
> >     (XEN)  MSI-X   42 vec=b9 lowest  edge   assert  log lowest dest=00000004 mask=1/HG/1
> >     (XEN)  MSI-X   43 vec=c1 lowest  edge   assert  log lowest dest=00000010 mask=1/HG/1
> >     (XEN)  MSI-X   44 vec=d9 lowest  edge   assert  log lowest dest=00000001 mask=1/HG/1
> >     (XEN)  MSI-X   45 vec=e1 lowest  edge   assert  log lowest dest=00000001 mask=1/HG/1
> >     (XEN)  MSI-X   46 vec=e9 lowest  edge   assert  log lowest dest=00000040 mask=1/HG/1
> >     (XEN)  MSI-X   47 vec=32 lowest  edge   assert  log lowest dest=00000004 mask=1/HG/1
> >     (XEN)  MSI-X   48 vec=3a lowest  edge   assert  log lowest dest=00000040 mask=1/HG/1
> >     (XEN)  MSI-X   49 vec=42 lowest  edge   assert  log lowest dest=00000010 mask=1/ G/1
> > 
> > And also, after starting and stopping the domU,
> > `xl pci-assignable-remove 03:00.0` makes pciback complain:
> > 
> >     [ 1180.919874] pciback 0000:03:00.0: xen_pciback: MSI-X release failed (-16)
> > 
> > This is all running on Xen 4.19.3, but I don't see many changes in this
> > area since then.
> > 
> > Some more info collected at
> > https://github.com/QubesOS/qubes-issues/issues/9335
> > 
> > My question is: what should be responsible for this cleanup on domain
> > destroy? Xen, or maybe device model (which is QEMU in stubdomain here)?
> 
> The expectation is that qemu invokes the necessary cleanup, but of course ...
> 
> > I see some cleanup (apparently not enough) happening via QEMU when the
> > domU driver is unloaded, but logically correct cleanup shouldn't depend
> > on correct domU operation...
> 
> ... Xen may not make itself dependent upon either DomU or QEMU.

AFAICT free_domain_pirqs() called by arch_domain_destroy() should take
care of unbinding and freeing pirqs (but obviously not in this case).
Can you repeat the test with a debug=y hypervisor and post the
resulting serial log or dmesg output here?  Some of the errors on those paths are
printed with dprintk() and won't be visible unless using a Xen debug
build.

> What I find puzzling (assuming I can take the quoted output plus your
> annotations verbatim) is that the device apparently uses multiple vectors,
> and we're leaking exactly one of them. Also, since reboot is generally
> nothing else than shutdown and immediate relaunch, is there a leak also
> after shutdown? I ask because it might help to know which of the multiple
> vectors is leaked (first, last, random).
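
To see which entry is added between restarts, the 'Q' lines quoted above
can be compared mechanically. This is only a minimal sketch: the regular
expression is derived from the three lines in this thread, not from the
Xen source, and the helper names (parse_q_lines, new_irqs_per_snapshot)
are made up for illustration.

```python
import re

# Pattern inferred from the quoted 'Q' debug key lines, e.g.
#   (XEN) 0000:03:00.0 - d22 - node -1  - MSIs < 41 42 43 >
# It is an assumption that other Xen versions print the same layout.
Q_LINE = re.compile(
    r"(?P<sbdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+-\s+"
    r"d(?P<domid>\d+)\s+-.*?MSIs\s*<(?P<msis>[\d\s]*)>"
)

def parse_q_lines(text):
    """Return [(sbdf, domid, [number, ...]), ...] in order of appearance."""
    out = []
    for m in Q_LINE.finditer(text):
        numbers = [int(x) for x in m.group("msis").split()]
        out.append((m.group("sbdf"), int(m.group("domid")), numbers))
    return out

def new_irqs_per_snapshot(snapshots):
    """For consecutive snapshots, list the numbers that newly appeared."""
    result = []
    for (_, _, prev), (_, domid, cur) in zip(snapshots, snapshots[1:]):
        result.append((domid, sorted(set(cur) - set(prev))))
    return result

# The three lines from the report, with the wrapped '>' rejoined.
sample = """\
(XEN) 0000:03:00.0 - d22 - node -1  - MSIs < 41 42 43 44 45 46 47 >
(XEN) 0000:03:00.0 - d24 - node -1  - MSIs < 41 42 43 44 45 46 47 48 >
(XEN) 0000:03:00.0 - d26 - node -1  - MSIs < 41 42 43 44 45 46 47 48 49 >
"""

snaps = parse_q_lines(sample)
for domid, added in new_irqs_per_snapshot(snaps):
    print(f"d{domid}: new entries {added}")
```

On the quoted data this reports 48 as new for d24 and 49 as new for d26,
i.e. the list grows at the end. That alone doesn't say which of the
device's vectors the stale entry corresponds to.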

Can we maybe get the output of `lspci -vv` when the device is
attached?
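
In the 'M' output quoted above, the entries differ in the two characters
between the slashes of the mask= field: 41 shows "H ", 49 shows " G", and
the rest show "HG". A rough sketch for grouping entries by that field
follows. The pattern is inferred from the quoted lines only, the helper
name group_by_flags is made up, and my reading of the letters as host and
guest masking is an assumption.

```python
import re
from collections import defaultdict

# Pattern inferred from the quoted 'M' debug key lines (with the wrapped
# "dest=... mask=..." part rejoined onto the same line).
M_LINE = re.compile(
    r"MSI-X\s+(?P<irq>\d+)\s+vec=(?P<vec>[0-9a-f]{2}).*?"
    r"mask=\d/(?P<flags>..)/\d"
)

def group_by_flags(text):
    """Map the two-character flag field to the list of entry numbers."""
    groups = defaultdict(list)
    for m in M_LINE.finditer(text):
        groups[m.group("flags")].append(int(m.group("irq")))
    return dict(groups)

# A subset of the lines from the report.
sample = """\
(XEN)  MSI-X   41 vec=b1 lowest  edge   assert  log lowest dest=00000001 mask=1/H /1
(XEN)  MSI-X   42 vec=b9 lowest  edge   assert  log lowest dest=00000004 mask=1/HG/1
(XEN)  MSI-X   48 vec=3a lowest  edge   assert  log lowest dest=00000040 mask=1/HG/1
(XEN)  MSI-X   49 vec=42 lowest  edge   assert  log lowest dest=00000010 mask=1/ G/1
"""

for flags, entries in group_by_flags(sample).items():
    print(repr(flags), entries)
```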

Thanks, Roger.



 

