Re: MSI-X cleanup(?) issue with passthrough after domU restart
On Tue, Aug 26, 2025 at 10:28:56AM +0200, Roger Pau Monné wrote:
> On Tue, Aug 26, 2025 at 08:16:56AM +0200, Jan Beulich wrote:
> > On 26.08.2025 03:49, Marek Marczykowski-Górecki wrote:
> > > Hi,
> > >
> > > I'm hitting an MSI-X issue after rebooting the domU. The symptoms are
> > > rather boring: on initial domU start the device (realtek eth card) works
> > > fine, but after domU restart, the link doesn't come up (there is no
> > > "Link is Up" message anymore). No errors from domU driver or Xen. I
> > > tracked it down to MSI-X - if I force INTx (via pci=nomsi on domU
> > > cmdline) it works fine. Convincing the driver to poll instead of waiting
> > > for an interrupt also works around the issue.
> > >
> > > I noticed also some interrupts are not cleaned up on restart. The list
> > > of MSIs in 'Q' debug key output grows:
> > >
> > > (XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 44 45 46 47 >
> > >
> > > restart sys-net domU
> > >
> > > (XEN) 0000:03:00.0 - d24 - node -1 - MSIs < 41 42 43 44 45 46 47 48 >
> > >
> > > restart sys-net domU
> > >
> > > (XEN) 0000:03:00.0 - d26 - node -1 - MSIs < 41 42 43 44 45 46 47 48 49 >
> > >
> > > and 'M' output is:
> > >
> > > (XEN) MSI-X 41 vec=b1 lowest edge assert log lowest dest=00000001 mask=1/H /1
> > > (XEN) MSI-X 42 vec=b9 lowest edge assert log lowest dest=00000004 mask=1/HG/1
> > > (XEN) MSI-X 43 vec=c1 lowest edge assert log lowest dest=00000010 mask=1/HG/1
> > > (XEN) MSI-X 44 vec=d9 lowest edge assert log lowest dest=00000001 mask=1/HG/1
> > > (XEN) MSI-X 45 vec=e1 lowest edge assert log lowest dest=00000001 mask=1/HG/1
> > > (XEN) MSI-X 46 vec=e9 lowest edge assert log lowest dest=00000040 mask=1/HG/1
> > > (XEN) MSI-X 47 vec=32 lowest edge assert log lowest dest=00000004 mask=1/HG/1
> > > (XEN) MSI-X 48 vec=3a lowest edge assert log lowest dest=00000040 mask=1/HG/1
> > > (XEN) MSI-X 49 vec=42 lowest edge assert log lowest dest=00000010 mask=1/ G/1
> > >
> > > And also, after starting and stopping the domU,
> > > `xl pci-assignable-remove 03:00.0` makes pciback complain:
> > >
> > > [ 1180.919874] pciback 0000:03:00.0: xen_pciback: MSI-X release failed (-16)
> > >
> > > This is all running on Xen 4.19.3, but I don't see many changes in this
> > > area since then.
> > >
> > > Some more info collected at
> > > https://github.com/QubesOS/qubes-issues/issues/9335
> > >
> > > My question is: what should be responsible for this cleanup on domain
> > > destroy? Xen, or maybe device model (which is QEMU in stubdomain here)?
> >
> > The expectation is that qemu invokes the necessary cleanup, but of course
> > ...
> >
> > > I see some cleanup (apparently not enough) happening via QEMU when the
> > > domU driver is unloaded, but logically correct cleanup shouldn't depend
> > > on correct domU operation...
> >
> > ... Xen may not make itself dependent upon either DomU or QEMU.
>
> AFAICT free_domain_pirqs() called by arch_domain_destroy() should take
> care of unbinding and freeing pirqs (but obviously not in this case).
> Can you repeat the test with a debug=y hypervisor and post the
> resulting serial or dmesg here? Some of the errors on those paths are
> printed with dprintk() and won't be visible unless using a Xen debug
> build.

Sure, will do.

> > What I find puzzling (assuming I can take the quoted output plus your
> > annotations verbatim) is that the device apparently uses multiple vectors,

No, that was not the first domU restart before I started collecting this
output. At fresh boot there is just one vector.

> > and we're leaking exactly one of them. Also, since reboot is generally
> > nothing else than shutdown and immediate relaunch, is there a leak also
> > after shutdown? I ask because it might help to know which of the multiple
> > vectors is leaked (first, last, random).
>
> Can we maybe get the output of `lspci -vv` when the device is
> attached?

Both below on first domU start, when the device still works, but when it
breaks it's identical. Collected in dom0:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at f7c00000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at f0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, IntMsgNum 1
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W TEE-IO-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
                LnkSta: Speed 2.5GT/s, Width x1
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                        10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                        EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                        FRS- TPHComp- ExtTPHComp-
                        AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                        AtomicOpsCtl: ReqEn-
                        IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
                        10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                        Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                        Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                        EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                        Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
                Not readable
        Capabilities: [100 v1] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr- PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr- PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr- PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
        Kernel driver in use: pciback
        Kernel modules: r8169

and the domU view:

00:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
        Physical Slot: 6
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 40
        Region 0: I/O ports at c200 [size=256]
        Region 2: Memory at f2018000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at f2010000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, IntMsgNum 1
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W TEE-IO-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
                LnkSta: Speed 2.5GT/s, Width x1
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                        10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                        EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                        FRS- TPHComp- ExtTPHComp-
                        AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                        AtomicOpsCtl: ReqEn-
                        IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
                        10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                        EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                        Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
                Not readable
        Kernel driver in use: r8169
        Kernel modules: r8169

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment:
signature.asc
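
The growth of the MSI list in the 'Q' debug-key output quoted in this message can be checked mechanically when comparing many restarts. The following is only a rough sketch, not part of the original report: the helper names (`parse_q_line`, `new_msis`) are made up for illustration, the regular expression is fitted to the three sample lines quoted here rather than to any documented format, and it assumes the MSI numbers are printed in decimal.

```python
import re

# Fitted to the sample lines from this thread, e.g.
#   (XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 44 45 46 47 >
# The exact format is an assumption and may differ between Xen versions.
Q_LINE = re.compile(
    r"(?P<sbdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+-\s+"
    r"(?P<dom>d\d+)\s+-\s+node\s+-?\d+\s+-\s+MSIs\s+<(?P<msis>[^>]*)>",
    re.IGNORECASE,
)


def parse_q_line(line):
    """Return (sbdf, domain, [msi, ...]) or None if the line does not match."""
    m = Q_LINE.search(line)
    if m is None:
        return None
    # Assumption: the numbers inside < ... > are decimal.
    msis = [int(tok) for tok in m.group("msis").split()]
    return m.group("sbdf"), m.group("dom"), msis


def new_msis(lines):
    """For each snapshot after the first, list MSIs absent from the previous one."""
    result = []
    prev = None
    for line in lines:
        parsed = parse_q_line(line)
        if parsed is None:
            continue  # skip unrelated console output
        _sbdf, dom, msis = parsed
        if prev is not None:
            result.append((dom, sorted(set(msis) - set(prev))))
        prev = msis
    return result


# The three snapshots quoted in this thread (rejoined onto single lines).
SAMPLE = [
    "(XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 44 45 46 47 >",
    "(XEN) 0000:03:00.0 - d24 - node -1 - MSIs < 41 42 43 44 45 46 47 48 >",
    "(XEN) 0000:03:00.0 - d26 - node -1 - MSIs < 41 42 43 44 45 46 47 48 49 >",
]

if __name__ == "__main__":
    for dom, added in new_msis(SAMPLE):
        print(f"{dom}: new MSIs since previous snapshot: {added}")
```

Run on the quoted snapshots, this reports one additional MSI per restart (48 for d24, 49 for d26), which matches the "one more entry each time" pattern described above. It does not say which of the older entries are stale.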