Re: [PATCH v3 3/3] PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag
On Mon, Mar 24, 2025 at 08:18:01PM +0100, Roger Pau Monné wrote:
> On Mon, Mar 24, 2025 at 07:58:14PM +0100, Daniel Gomez wrote:
> > On Mon, Mar 24, 2025 at 06:51:54PM +0100, Roger Pau Monné wrote:
> > > On Mon, Mar 24, 2025 at 03:29:46PM +0100, Daniel Gomez wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Mar 21, 2025 at 09:00:09AM +0100, Jürgen Groß wrote:
> > > > > On 20.03.25 22:07, Bjorn Helgaas wrote:
> > > > > > On Wed, Feb 19, 2025 at 10:20:57AM +0100, Roger Pau Monne wrote:
> > > > > > > Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
> > > > > > > MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
> > > > > > > Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
> > > > > > > event channels, to prevent PCI code from attempting to toggle the maskbit,
> > > > > > > as it's Xen that controls the bit.
> > > > > > >
> > > > > > > However, the pci_msi_ignore_mask being global will affect devices that use
> > > > > > > MSI interrupts but are not routing those interrupts over event channels
> > > > > > > (not using the Xen pIRQ chip). One example is devices behind a VMD PCI
> > > > > > > bridge. In that scenario the VMD bridge configures MSI(-X) using the
> > > > > > > normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
> > > > > > > bridge configure the MSI entries using indexes into the VMD bridge MSI
> > > > > > > table. The VMD bridge then demultiplexes such interrupts and delivers to
> > > > > > > the destination device(s). Having pci_msi_ignore_mask set in that scenario
> > > > > > > prevents (un)masking of MSI entries for devices behind the VMD bridge.
> > > > > > >
> > > > > > > Move the signaling of no entry masking into the MSI domain flags, as that
> > > > > > > allows setting it on a per-domain basis. Set it for the Xen MSI domain
> > > > > > > that uses the pIRQ chip, while leaving it unset for the rest of the
> > > > > > > cases.
> > > > > > >
> > > > > > > Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
> > > > > > > with Xen dropping usage the variable is unneeded.
> > > > > > >
> > > > > > > This fixes using devices behind a VMD bridge on Xen PV hardware domains.
> > > > > > >
> > > > > > > Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
> > > > > > > Linux cannot use them. By inhibiting the usage of
> > > > > > > VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
> > > > > > > bodge devices behind a VMD bridge do work fine when use from a Linux Xen
> > > > > > > hardware domain. That's the whole point of the series.
> > > > > > >
> > > > > > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > > > > > Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > > > > > Acked-by: Juergen Gross <jgross@xxxxxxxx>
> > > > > >
> > > > > > Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > > > >
> > > > > > I assume you'll merge this series via the Xen tree. Let me know if
> > > > > > otherwise.
> > > > >
> > > > > I've pushed the series to the linux-next branch of the Xen tree.
> > > > >
> > > > >
> > > > > Juergen
> > > >
> > > > This patch landed in latest next-20250324 tag causing this crash:
> > > >
> > > > [ 0.753426] BUG: kernel NULL pointer dereference, address: 0000000000000002
> > > > [ 0.753921] #PF: supervisor read access in kernel mode
> > > > [ 0.754286] #PF: error_code(0x0000) - not-present page
> > > > [ 0.754656] PGD 0 P4D 0
> > > > [ 0.754842] Oops: Oops: 0000 [#1]
> > > > [ 0.755080] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.14.0-rc7-next-20250324 #1 NONE
> > > > [ 0.755691] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > > > [ 0.756349] RIP: 0010:msix_prepare_msi_desc+0x39/0x80
> > > > [ 0.756390] Code: 20 c7 46 04 01 00 00 00 8b 56 4c 89 d0 0d 01 01 00 00 66 89 46 4c 8b 8f 64 02 00 00 89 4e 50 48 8b 8f 70 06 00 00 48 89 4e 58 <41> f6 40 02 40 75 2a c1 ea 02 bf 80 00 00 00 21 fa 25 7f ff ff ff
> > > > [ 0.756390] RSP: 0000:ffff8881002a76e0 EFLAGS: 00010202
> > > > [ 0.756390] RAX: 0000000000000101 RBX: ffff88810074d000 RCX: ffffc9000002e000
> > > > [ 0.756390] RDX: 0000000000000000 RSI: ffff8881002a7710 RDI: ffff88810074d000
> > > > [ 0.756390] RBP: ffff8881002a7710 R08: 0000000000000000 R09: ffff8881002a76b4
> > > > [ 0.756390] R10: 000000701000c001 R11: ffffffff82a3dc01 R12: 0000000000000000
> > > > [ 0.756390] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000002
> > > > [ 0.756390] FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
> > > > [ 0.756390] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 0.756390] CR2: 0000000000000002 CR3: 0000000002a3d001 CR4: 00000000003706b0
> > > > [ 0.756390] Call Trace:
> > > > [ 0.756390] <TASK>
> > > > [ 0.756390] ? __die_body+0x1b/0x60
> > > > [ 0.756390] ? page_fault_oops+0x2d0/0x310
> > > > [ 0.756390] ? exc_page_fault+0x59/0xc0
> > > > [ 0.756390] ? asm_exc_page_fault+0x22/0x30
> > > > [ 0.756390] ? msix_prepare_msi_desc+0x39/0x80
> > > > [ 0.756390] ? msix_capability_init+0x172/0x2c0
> > > > [ 0.756390] ? __pci_enable_msix_range+0x1a8/0x1d0
> > > > [ 0.756390] ? pci_alloc_irq_vectors_affinity+0x7c/0xf0
> > > > [ 0.756390] ? vp_find_vqs_msix+0x187/0x400
> > > > [ 0.756390] ? vp_find_vqs+0x2f/0x250
> > > > [ 0.756390] ? snprintf+0x3e/0x50
> > > > [ 0.756390] ? vp_modern_find_vqs+0x13/0x60
> > > > [ 0.756390] ? init_vq+0x184/0x1e0
> > > > [ 0.756390] ? vp_get_status+0x20/0x20
> > > > [ 0.756390] ? virtblk_probe+0xeb/0x8d0
> > > > [ 0.756390] ? __kernfs_new_node+0x122/0x160
> > > > [ 0.756390] ? vp_get_status+0x20/0x20
> > > > [ 0.756390] ? virtio_dev_probe+0x171/0x1c0
> > > > [ 0.756390] ? really_probe+0xc2/0x240
> > > > [ 0.756390] ? driver_probe_device+0x1d/0x70
> > > > [ 0.756390] ? __driver_attach+0x96/0xe0
> > > > [ 0.756390] ? driver_attach+0x20/0x20
> > > > [ 0.756390] ? bus_for_each_dev+0x7b/0xb0
> > > > [ 0.756390] ? bus_add_driver+0xe6/0x200
> > > > [ 0.756390] ? driver_register+0x5e/0xf0
> > > > [ 0.756390] ? virtio_blk_init+0x4d/0x90
> > > > [ 0.756390] ? add_boot_memory_block+0x90/0x90
> > > > [ 0.756390] ? do_one_initcall+0xe2/0x250
> > > > [ 0.756390] ? xas_store+0x4b/0x4b0
> > > > [ 0.756390] ? number+0x13b/0x260
> > > > [ 0.756390] ? ida_alloc_range+0x36a/0x3b0
> > > > [ 0.756390] ? parameq+0x13/0x90
> > > > [ 0.756390] ? parse_args+0x10f/0x2a0
> > > > [ 0.756390] ? do_initcall_level+0x83/0xb0
> > > > [ 0.756390] ? do_initcalls+0x43/0x70
> > > > [ 0.756390] ? rest_init+0x80/0x80
> > > > [ 0.756390] ? kernel_init_freeable+0x70/0xb0
> > > > [ 0.756390] ? kernel_init+0x16/0x110
> > > > [ 0.756390] ? ret_from_fork+0x30/0x40
> > > > [ 0.756390] ? rest_init+0x80/0x80
> > > > [ 0.756390] ? ret_from_fork_asm+0x11/0x20
> > > > [ 0.756390] </TASK>
> > > > [ 0.756390] Modules linked in:
> > > > [ 0.756390] CR2: 0000000000000002
> > > > [ 0.756390] ---[ end trace 0000000000000000 ]---
> > > > [ 0.756390] RIP: 0010:msix_prepare_msi_desc+0x39/0x80
> > > > [ 0.756390] Code: 20 c7 46 04 01 00 00 00 8b 56 4c 89 d0 0d 01 01 00 00 66 89 46 4c 8b 8f 64 02 00 00 89 4e 50 48 8b 8f 70 06 00 00 48 89 4e 58 <41> f6 40 02 40 75 2a c1 ea 02 bf 80 00 00 00 21 fa 25 7f ff ff ff
> > > > [ 0.756390] RSP: 0000:ffff8881002a76e0 EFLAGS: 00010202
> > > > [ 0.756390] RAX: 0000000000000101 RBX: ffff88810074d000 RCX: ffffc9000002e000
> > > > [ 0.756390] RDX: 0000000000000000 RSI: ffff8881002a7710 RDI: ffff88810074d000
> > > > [ 0.756390] RBP: ffff8881002a7710 R08: 0000000000000000 R09: ffff8881002a76b4
> > > > [ 0.756390] R10: 000000701000c001 R11: ffffffff82a3dc01 R12: 0000000000000000
> > > > [ 0.756390] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000002
> > > > [ 0.756390] FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
> > > > [ 0.756390] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 0.756390] CR2: 0000000000000002 CR3: 0000000002a3d001 CR4: 00000000003706b0
> > > > [ 0.756390] note: swapper[1] exited with irqs disabled
> > > > [ 0.782774] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> > > > [ 0.783560] Kernel Offset: disabled
> > > > [ 0.783909] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
> > > >
> > > >
> > > > msix_prepare_msi_desc+0x39/0x80:
> > > > msix_prepare_msi_desc at drivers/pci/msi/msi.c:616
> > > >  611    desc->nvec_used = 1;
> > > >  612    desc->pci.msi_attrib.is_msix = 1;
> > > >  613    desc->pci.msi_attrib.is_64 = 1;
> > > >  614    desc->pci.msi_attrib.default_irq = dev->irq;
> > > >  615    desc->pci.mask_base = dev->msix_base;
> > > > >616<   desc->pci.msi_attrib.can_mask = !(info->flags & MSI_FLAG_NO_MASK) &&
> > > >  617                                    !desc->pci.msi_attrib.is_virtual;
> > > >  618
> > > >  619    if (desc->pci.msi_attrib.can_mask) {
> > > >  620            void __iomem *addr = pci_msix_desc_addr(desc);
> > > >  621
> > > >
> > > > Reverting patch 3 fixes the issue.
> > >
> > > Thanks for the report and sorry for the breakage. Do you have a QEMU
> > > command line I can use to try to reproduce this locally?
> > >
> > > Will work on a patch ASAP.
> >
> > Thanks for the quick reply.
> >
> > The issue is that info appears to be uninitialized. So, this worked for me:
>
> Indeed, irq_domain->host_data is NULL, there's no msi_domain_info. As
> this is x86, I was expecting x86 to always use
> x86_init_dev_msi_info(), but that doesn't seem to be the case. I
> would like to better understand this.
>
> > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > index dcbb4f9ac578..b76c7ec33602 100644
> > --- a/drivers/pci/msi/msi.c
> > +++ b/drivers/pci/msi/msi.c
> > @@ -609,8 +609,10 @@ void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc)
> >         desc->pci.msi_attrib.is_64 = 1;
> >         desc->pci.msi_attrib.default_irq = dev->irq;
> >         desc->pci.mask_base = dev->msix_base;
> > -       desc->pci.msi_attrib.can_mask = !(info->flags & MSI_FLAG_NO_MASK) &&
> > -                                       !desc->pci.msi_attrib.is_virtual;
> > +       desc->pci.msi_attrib.can_mask =
> > +               info ? !(info->flags & MSI_FLAG_NO_MASK) &&
> > +                       !desc->pci.msi_attrib.is_virtual :
> > +                       1;
> >
> >         if (desc->pci.msi_attrib.can_mask) {
> >                 void __iomem *addr = pci_msix_desc_addr(desc);
> > @@ -743,7 +745,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> >         /* Disable INTX */
> >         pci_intx_for_msi(dev, 0);
> >
> > -       if (!(info->flags & MSI_FLAG_NO_MASK)) {
> > +       if (info && !(info->flags & MSI_FLAG_NO_MASK)) {
>
> I think this should rather be:
>
>         if (!info || !(info->flags & MSI_FLAG_NO_MASK)) {
>
> So that in case of no info the default action is to mask the entries.
>
> >                 /*
> >                  * Ensure that all table entries are masked to prevent
> >                  * stale entries from firing in a crash kernel.
> >
> > I also noticed d (struct irq_domain) can return NULL if CONFIG_GENERIC_MSI_IRQ
> > is not set and we are not checking that either.
> >
> > I run QEMU with vmctl [1]. This is my command:
> >
> > [1] https://github.com/SamsungDS/vmctl
> >
> > /usr/bin/qemu-system-x86_64 \
> >   -nodefaults \
> >   -display "none" \
> >   -machine "q35,accel=kvm,kernel-irqchip=split" \
> >   -cpu "host" \
> >   -smp "4" \
> >   -m "8G" \
> >   -device "intel-iommu,intremap=on" \
> >   -netdev "user,id=net0,hostfwd=tcp::2222-:22" \
> >   -device "virtio-net-pci,netdev=net0" \
> >   -device "virtio-rng-pci" \
> >   -drive "id=boot,file=file.qcow2,format=qcow2,if=virtio,discard=unmap,media=disk,read-only=no" \
> >   -device "pcie-root-port,id=pcie_root_port0,chassis=1,slot=0" \
> >   -device "nvme,id=nvme0,serial=deadbeef,bus=pcie_root_port0,mdts=7" \
> >   -drive "id=nvm,file=~/nvm.img,format=raw,if=none,discard=unmap,media=disk,read-only=no" \
> >   -device "nvme-ns,id=nvm,drive=nvm,bus=nvme0,nsid=1,logical_block_size=4096,physical_block_size=4096" \
> >   -pidfile "~/vmctl/confdir/run/nvme/pidfile" \
> >   -kernel "~/src/kernel/linux/arch/x86_64/boot/bzImage" \
> >   -append "root=/dev/vda1 console=ttyS0,115200 audit=0" \
> >   -virtfs "local,path=~/linux,security_model=none,readonly=on,mount_tag=kernel_dir" \
> >   -serial "mon:stdio" \
> >   -d "guest_errors" \
> >   -D "~/vmctl/confdir/log/nvme/qemu.log"
>
> Can you narrow down the command line to the minimum required to
> reproduce the issue?

/usr/bin/qemu-system-x86_64 \
  -nodefaults \
  -display "none" \
  -machine "q35,accel=kvm" \
  -cpu "host" \
  -drive "id=boot,file=file.qcow2,format=qcow2,if=virtio,discard=unmap,media=disk,read-only=no" \
  -kernel "~/src/kernel/linux/arch/x86_64/boot/bzImage" \
  -append "root=/dev/vda1 console=ttyS0,115200 audit=0" \
  -serial "mon:stdio"

>
> Can you attach the Kconfig used to build the crashing kernel?

I'm using these fragments [1]:

tinyconfig kvm_guest.config virtio-fs.config systemd.config distro.config \
storage.config localauto.config

[1] https://github.com/dkruces/linux-config-fragments/

>
> Thanks, Roger.