
Re: xen_pciback: error enabling MSI-X / MSI for guest -- WAS: Re: Kernel panic when passing through 2 identical PCI devices



On Monday, 23 June 2025 09:55:46 CEST Jan Beulich wrote:
> On 21.06.2025 16:39, J. Roeleveld wrote:
> > I managed to get past the kernel panic (sort of) by doing the following:
> > 
> > 1) Ensure system is fully OFF before booting. A reset/reboot will cause
> > these errors.
> > 
> > 2) Fix the BIOS config to ensure the PCI-ports are split correctly. If
> > anyone has a Supermicro board and gets errors about PCI-slots not getting
> > full speed let me know.
> > 
> > Not entirely convinced the 2nd was part of the cause, but that's ok.
> > 
> > I now, however, get a new error message in the Domain0 dmesg:
> > pciback <pci-address>: xen_map irq failed -28 for <domid> domain
> > pciback <pci-address>: error enabling MSI-X for guest <domid>: err -28!
> > 
> > For the NVMe devices, I get these twice, with the 2nd time complaining
> > about MSI (without the -X)
> > 
> > I feel there is something missing in my kernel-config and/or domain
> > config.
> > If anyone can point me at what needs to be enabled/disabled or suggestions
> > on what I can try?
> 
> The default number of extra IRQs the guest may (have) set up may be too
> small. You may need to make use of Xen's extra_guest_irqs= command line
> option.

I spent the entire weekend searching for possible causes/hints/things to try.
That setting was one I had found some time ago (I think for MSI/MSI-X issues) 
and it's currently set to:
extra_guest_irqs=768,1024

Not sure if it makes sense to increase this further?

# For completeness, the Xen commandline is:
dom0_mem=24576M,max:24576M dom0_max_vcpus=4 dom0_vcpus_pin 
gnttab_max_frames=512 sched=credit console=vga extra_guest_irqs=768,1024 
iommu=verbose

# The kernel commandline is:
kernel=gentoo-6.12.21.efi dozfs root=ZFS=zhost/host/root by=id elevator=noop 
logo.nologo triggers=zfs quiet refresh softlevel=prexen nomodeset 
nfs.callback_tcpport=32764 lockd.nlm_udpport=32768 lockd.nlm_tcpport=32768 
xen-pciback.hide=(83:00.0)(84:00.0)(85:00.0)(86:00.0)
xen-pciback.passthrough=1

If there is anything I am missing or should be doing differently, please let me 
know.

As said, I spent the entire weekend searching with Google and DuckDuckGo (ddg 
seems to return more relevant results, but also has few results with a similar 
error message that are less than 6 years old). Here is what I have found out 
so far:

== 1) NVMe errors in dmesg:

I noticed that, even when working, the NVMe drivers show issues with "MSI-X"
(only showing 1 of the 2, the other has the same messages):
[    7.742006] nvme nvme0: pci function 0000:84:00.0
[    7.742158] nvme 0000:84:00.0: Xen PCI mapped GSI56 to IRQ59
[    7.752907] nvme nvme0: D3 entry latency set to 8 seconds
[    8.003806] nvme nvme0: allocated 64 MiB host memory buffer.
[    8.038746] nvme 0000:84:00.0: enable msix get err ffffff8e
[    8.038756] nvme 0000:84:00.0: Xen PCI frontend error: -114!
[    8.048849] nvme nvme0: 1/0/0 default/read/poll queues
[    8.106017]  nvme0n1: p1 p2 p3 p4

I have been unable to find what "enable msix get err ffffff8e" and "Xen PCI 
frontend error: -114!" actually mean.

These messages show up on a "working" environment. This is with kernel 6.12.21 
and Xen 4.18.4_pre1
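
One observation that may help with reading these codes: "ffffff8e" looks like the same value as the "-114" on the next line, just printed as unsigned 32-bit hex. A minimal sketch of that conversion follows. The small errno table is taken from the generic Linux errno numbering, and whether pciback/pcifront use exactly those meanings here is an assumption on my part:

```python
# Generic Linux errno numbers; assumed (not confirmed) to apply to the
# pciback/pcifront messages quoted in this mail.
LINUX_ERRNO = {28: "ENOSPC", 114: "EALREADY"}


def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe(code: int) -> str:
    """Format a negative errno-style code together with its symbolic name."""
    name = LINUX_ERRNO.get(-code, "unknown")
    return f"{code} ({name})"


print(describe(to_signed32(0xFFFFFF8E)))  # the hex value from the nvme line
print(describe(-28))                      # the code from the pciback lines
```

If that numbering applies, the -28 in the Domain0 dmesg would be "no space left" and the -114 in the guest would be "operation already in progress", but that only names the codes and does not explain why they are returned.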

== 2) A BIOS setting where default differs from general recommendation by 
Supermicro:
- The setting "MMIO High Size" is set to 256GB. The recommendation I see is to 
set this to 1024GB.
Note: The server has 368 GB RAM, so that does make sense. I will be changing 
this setting during my next chance to do further testing.

== 3) IOMMU seems enabled, apart from 2 items:
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled. <---
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.  <---
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled

The 2 marked lines say "not enabled". If I understand all the different 
documentation correctly, this is not an issue. Please let me know if I am 
mistaken.

== 4) "nr_irqs" (and this is making me wonder if the "extra_guest_irqs" is 
actually used)

In the dmesg on the host I see:
[    2.328651] NR_IRQS: 8448, nr_irqs: 1024, preallocated irqs: 16

On the VM/Domain I see:
[    3.673555] NR_IRQS: 4352, nr_irqs: 80, preallocated irqs: 0

The number on the host (1024) matches the 1024 from extra_guest_irqs.
The number in the Domain (80) matches neither of the two values.

The specific domain is always the 2nd that is started.
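
The comparison above can be sketched as a small script. It assumes, as I did, that the kernel's "nr_irqs" should track the "extra_guest_irqs" values at all, which may not be how the two relate. The function names are only illustrative. Per the Xen command line documentation the first number applies to domUs and the second to dom0:

```python
import re


def parse_extra_guest_irqs(cmdline: str):
    """Return (domU, dom0) from extra_guest_irqs=, with None where absent."""
    m = re.search(r"\bextra_guest_irqs=(\d*)(?:,(\d*))?", cmdline)
    if not m:
        return None, None
    domu = int(m.group(1)) if m.group(1) else None
    dom0 = int(m.group(2)) if m.group(2) else None
    return domu, dom0


def parse_nr_irqs(dmesg_line: str):
    """Return the lowercase nr_irqs value from an 'NR_IRQS: ...' dmesg line."""
    m = re.search(r"\bnr_irqs: (\d+)", dmesg_line)
    return int(m.group(1)) if m else None


xen_cmdline = "sched=credit console=vga extra_guest_irqs=768,1024 iommu=verbose"
host_line = "[    2.328651] NR_IRQS: 8448, nr_irqs: 1024, preallocated irqs: 16"
guest_line = "[    3.673555] NR_IRQS: 4352, nr_irqs: 80, preallocated irqs: 0"

domu_limit, dom0_limit = parse_extra_guest_irqs(xen_cmdline)
print("dom0:", parse_nr_irqs(host_line), "vs configured", dom0_limit)
print("domU:", parse_nr_irqs(guest_line), "vs configured", domu_limit)
```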





 

