Re: xen_pciback: error enabling MSI-X / MSI for guest -- WAS: Re: Kernel panic when passing through 2 identical PCI devices
On Monday, 23 June 2025 09:55:46 CEST Jan Beulich wrote:
> On 21.06.2025 16:39, J. Roeleveld wrote:
> > I managed to get past the kernel panic (sort of) by doing the following:
> >
> > 1) Ensure system is fully OFF before booting. A reset/reboot will cause
> > these errors.
> >
> > 2) Fix the BIOS config to ensure the PCI-ports are split correctly. If
> > anyone has a Supermicro board and gets errors about PCI-slots not getting
> > full speed let me know.
> >
> > Not entirely convinced the 2nd was part of the cause, but that's ok.
> >
> > I now, however, get a new error message in the Domain0 dmesg:
> > pciback <pci-address>: xen_map irq failed -28 for <domid> domain
> > pciback <pci-address>: error enabling MSI-X for guest <domid>: err -28!
> >
> > For the NVMe devices, I get these twice, with the 2nd time complaining
> > about MSI (without the -X)
> >
> > I feel there is something missing in my kernel-config and/or domain
> > config.
> > If anyone can point me at what needs to be enabled/disabled or suggestions
> > on what I can try?
>
> The default number of extra IRQs the guest may (have) set up may be too
> small. You may need to make use of Xen's extra_guest_irqs= command line
> option.

I spent the entire weekend searching for possible causes/hints/things to try.
That setting was one I had found some time ago (I think for MSI/MSI-X issues)
and it is currently set to:
extra_guest_irqs=768,1024

I am not sure whether it makes sense to increase this further.
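For reference, Xen's command-line documentation gives the syntax of this option as `extra_guest_irqs=[<domU number>][,<dom0 number>]`, so with the value above 768 applies to each domU and 1024 to dom0. A minimal Python sketch of that split follows; the helper name is hypothetical and not part of any Xen tool, and the syntax is taken from the documented form rather than from Xen's actual parser.

```python
from typing import Optional, Tuple


def parse_extra_guest_irqs(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Split an extra_guest_irqs value into (domU, dom0) counts.

    Syntax assumed from the Xen command-line docs:
    [<domU number>][,<dom0 number>]; either part may be omitted.
    """
    domu_part, _, dom0_part = value.partition(",")
    # None means the part was omitted, i.e. Xen keeps its default for it.
    domu = int(domu_part) if domu_part else None
    dom0 = int(dom0_part) if dom0_part else None
    return domu, dom0


domu, dom0 = parse_extra_guest_irqs("768,1024")
print(f"each domU: {domu} extra IRQs, dom0: {dom0} extra IRQs")
```

The point of the split is that raising the second number never helps a domU: only the first number governs the guest that reports the -28 error.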
# For completeness, the Xen command line is:
dom0_mem=24576M,max:24576M dom0_max_vcpus=4 dom0_vcpus_pin gnttab_max_frames=512 sched=credit console=vga extra_guest_irqs=768,1024 iommu=verbose

# The kernel command line is:
kernel=gentoo-6.12.21.efi dozfs root=ZFS=zhost/host/root by=id elevator=noop logo.nologo triggers=zfs quiet refresh softlevel=prexen nomodeset nfs.callback_tcpport=32764 lockd.nlm_udpport=32768 lockd.nlm_tcpport=32768 xen-pciback.hide=(83:00.0)(84:00.0)(85:00.0)(86:00.0) xen-pciback.passthrough=1

If there is anything I am missing or should be doing differently, please let me know.

As I said, I spent the entire weekend searching with Google and DuckDuckGo (DDG seems to return more relevant results, but it also returns only a few results with a similar error message that are less than 6 years old).

Here is what I have found so far:

== 1) NVMe errors in dmesg:
I noticed that, even when working, the NVMe driver reports issues with "MSI-X" (only showing 1 of the 2 devices; the other has the same messages):
[ 7.742006] nvme nvme0: pci function 0000:84:00.0
[ 7.742158] nvme 0000:84:00.0: Xen PCI mapped GSI56 to IRQ59
[ 7.752907] nvme nvme0: D3 entry latency set to 8 seconds
[ 8.003806] nvme nvme0: allocated 64 MiB host memory buffer.
[ 8.038746] nvme 0000:84:00.0: enable msix get err ffffff8e
[ 8.038756] nvme 0000:84:00.0: Xen PCI frontend error: -114!
[ 8.048849] nvme nvme0: 1/0/0 default/read/poll queues
[ 8.106017] nvme0n1: p1 p2 p3 p4

I have been unable to find out what "enable msix get err ffffff8e" and "Xen PCI frontend error: -114!" actually mean. These messages show up even in a "working" environment. This is with kernel 6.12.21 and Xen 4.18.4_pre1.

== 2) A BIOS setting where the default differs from Supermicro's general recommendation:
- The setting "MMIO High Size" is set to 256GB. The recommendation I see is to set this to 1024GB. Note: the server has 368 GB RAM, so this does make sense. I will change this setting the next time I get a chance to do further testing.
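The numeric codes in the pciback messages and in the section 1 NVMe log can be decoded mechanically. The sketch below assumes that "ffffff8e" is a negative int printed in hex (`%x`), i.e. the same -114 that the next log line prints in decimal, and that the codes are Linux errno values; errno numbering is platform-specific, so the names are only meaningful when run on Linux.

```python
import errno
import os


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit value printed with %x as a signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe(err: int) -> str:
    """Map a negative kernel return code to its errno name and message."""
    code = -err
    name = errno.errorcode.get(code, "unknown")
    return f"{err}: {name} ({os.strerror(code)})"


# -28 from the pciback messages, ffffff8e and -114 from the NVMe guest log.
for raw in (-28, to_signed32(0xFFFFFF8E), -114):
    print(describe(raw))
```

On a Linux host this prints `-28: ENOSPC (No space left on device)` once and `-114: EALREADY (Operation already in progress)` twice, which shows that the two NVMe lines report one and the same error rather than two separate failures.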
== 3) IOMMU seems enabled, apart from 2 items:
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled. <---
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled. <---
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled

The 2 marked lines say "not enabled". If I understand all the different documentation correctly, this is not an issue. Please let me know if I am mistaken.

== 4) "nr_irqs" (this makes me wonder whether "extra_guest_irqs" is actually being used):
In the dmesg on the host I see:
[ 2.328651] NR_IRQS: 8448, nr_irqs: 1024, preallocated irqs: 16
On the VM/domain I see:
[ 3.673555] NR_IRQS: 4352, nr_irqs: 80, preallocated irqs: 0

The number on the host (1024) matches the dom0 value in extra_guest_irqs. The number in the domain (80) does not match the domU value (768). The domain in question is always the second one to be started.
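The comparison in section 4 can be reproduced by pulling `nr_irqs` out of the two dmesg lines and setting it against the configured values. This is only a sketch of the comparison made above: the configured numbers are hard-coded from `extra_guest_irqs=768,1024`, and whether a Linux guest's `nr_irqs` is expected to track that option at all is exactly the open question, so a "differs" result is not by itself proof that the option is ignored.

```python
import re
from typing import Optional

# Matches the boot-time line "NR_IRQS: <n>, nr_irqs: <m>, ..."
NR_IRQS_RE = re.compile(r"NR_IRQS:\s*(\d+),\s*nr_irqs:\s*(\d+)")


def parse_nr_irqs(line: str) -> Optional[int]:
    """Return the runtime nr_irqs value from a dmesg line, or None."""
    m = NR_IRQS_RE.search(line)
    return int(m.group(2)) if m else None


host_line = "[ 2.328651] NR_IRQS: 8448, nr_irqs: 1024, preallocated irqs: 16"
guest_line = "[ 3.673555] NR_IRQS: 4352, nr_irqs: 80, preallocated irqs: 0"

# Hard-coded from extra_guest_irqs=768,1024 (domU first, dom0 second).
configured = {"host (dom0)": 1024, "guest (domU)": 768}
observed = {"host (dom0)": parse_nr_irqs(host_line),
            "guest (domU)": parse_nr_irqs(guest_line)}

for who, want in configured.items():
    got = observed[who]
    status = "matches" if got == want else "differs"
    print(f"{who}: nr_irqs={got}, extra_guest_irqs value={want} -> {status}")
```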
Lists.xenproject.org is hosted with RackSpace, monitoring our