Xen project Mailing List

Re: Kernel panic when passing through 2 identical PCI devices

To: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>

From: "J. Roeleveld" <joost@xxxxxxxxxxxx>

Date: Mon, 02 Jun 2025 16:37:36 +0200

Delivery-date: Mon, 02 Jun 2025 14:37:48 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Monday, 2 June 2025 16:31:11 CEST Jan Beulich wrote:
> On 02.06.2025 16:19, J. Roeleveld wrote:
> > On Monday, 2 June 2025 15:43:37 CEST you wrote:
> >> On 02.06.2025 14:28, J. Roeleveld wrote:
> >>> I have a domain to which I pass through 4 PCI devices:
> >>> 2 NVMe drives
> >>> 83:00.0 Samsung 980 NVMe
> >>> 84:00.0 Samsung 980 NVMe
> >>>
> >>> 2 HBA Controllers
> >>> 86:00.0 LSI SAS3008
> >>> 87:00.0 LSI SAS3008
> >>>
> >>> This works fine with Xen version 4.18.4_pre1.
> >>> However, when trying to update to 4.19, this fails.
> >>
> >> To make it explicit: The domain in question is a PV one.
> >
> > Yes. I tried to convert it to PVH in the past, but PCI-passthrough wasn't
> > working at all. And nothing I found since shows that it should be working
> > now.
>
> >>> Checking the output during boot, I think I found something. But my
> >>> knowledge is insufficient to figure out what is causing what I am seeing
> >>> and how to fix this.
> >>>
> >>> From the below (where I only focus on the 2 NVMe drives), it is similar
> >>> to the successful boot up until it tries to "claiming resource
> >>> 0000:84:00.0/0". At which point sysfs fails because the entry for "84"
> >>> is already present.
> >>
> >> What would be interesting is to know why / how this 2nd registration
> >> happens.
> >
> > Only guess I can make: They are both the same brand/model/size. Only
> > serial number differs
>
> I don't think this matters here at all. The guest isn't at the point yet
> where it would even be able to retrieve these. From the log you provided
> it's the PCI subsystem where the issue is triggered.

This goes beyond my knowledge. Which means I'd rather provide too much
information than too little :)

> >> It's the same (guest) kernel version afaics, so something must
> >> behave differently on the host. Are you sure the sole (host side)
> >> difference is the hypervisor version? I.e. the Dom0 kernel version is the
> >> same in the failing and successful cases?
> >> I ask because there's very little
> >> Xen itself does that would play into pass-through device discovery /
> >> resource setup by a (PV) guest (which doesn't mean Xen can't screw things
> >> up). The more relevant component is the xen-pciback driver in Dom0.
> >
> > I can confirm it's dependent on the Xen version.
> > Kernel version = 6.12.21
> > I get a successful boot with Xen version 4.18.4_pre1.
> > When I use Xen version 4.19.1, the boot fails due to this issue.
> >
> > The kernel and initramfs do not differ between the boots.
>
> And that's the Dom0 kernel, just to clarify? There are two kernels involved
> here, after all.

Yes. Dom0 and the guest have their own kernel images. However, both run the
same version. (I compile kernels from source)

> >> Sadly the log provided does, to me at least, not have enough data to draw
> >> conclusions. Some instrumenting of the guest kernel may be necessary ...
> >
> > The host boots using UEFI:
> >
> > === (xen.cfg in the EFI partition) ===
> > [global]
> > default=xen
> >
> > [xen]
> > options=dom0_mem=24576M,max:24576M dom0_max_vcpus=4 dom0_vcpus_pin
> > gnttab_max_frames=512 sched=credit console=vga extra_guest_irqs=768,1024
> >
> > kernel=gentoo-6.12.21.efi dozfs root=ZFS=zhost/host/root by=id
> > elevator=noop logo.nologo triggers=zfs quiet refresh softlevel=prexen
> > nomodeset
> > nfs.callback_tcpport=32764 lockd.nlm_udpport=32768 lockd.nlm_tcpport=32768
> > xen-pciback.hide=(83:00.0)(84:00.0)(86:00.0)(87:00.0)
> > xen-pciback.passthrough=1
> >
> > ramdisk=initramfs-6.12.21-gentoo-host.img
> > ===
> >
> > Please let me know what other information you need and if there is
> > anything I can try/test to get more information.
> > Does the mailing list allow gzipped text files as attachment? Or how would
> > you prefer the kernel-config of the host and guest?
>
> I don't think these are relevant (for the moment).

Ok.
> > If there are tests to do, please give me several to try as I need to
> > schedule downtime for reboots.
>
> That would be some kernel hacking, as indicated before: Instrument the
> (guest) kernel enough to figure out where the 1st and 2nd sysfs
> registrations come from. This may then give us a clue what's being driven
> the wrong way (by Xen, or maybe by the toolstack).

If you could point me to a guide on how to do this?
I know enough about C/C++ to write my own tools. But the kernel and Xen are
too complex for me to follow and I would not even know where to begin.

--
Joost
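
As a quick sanity check on the configuration quoted in the message, the xen-pciback.hide value can be parsed into bus/device/function tuples to confirm that each passed-through device is listed exactly once. This is only a sketch: the helper names parse_hide_list and duplicates are hypothetical, the parser assumes the "(bb:dd.f)" form used in the quoted xen.cfg, and it does not handle entries with a leading PCI domain. It says nothing about where the second sysfs registration in the guest comes from.

```python
import re
from collections import Counter

# Value copied from the xen.cfg quoted in the message.
HIDE = "(83:00.0)(84:00.0)(86:00.0)(87:00.0)"

# One "(bus:device.function)" entry: hex bus and device, function 0-7.
# Assumption: no "dddd:" PCI domain prefix, as in the quoted config.
_ENTRY = re.compile(r"\(([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\)")


def parse_hide_list(value):
    """Return a list of (bus, device, function) integer tuples."""
    entries = []
    pos = 0
    while pos < len(value):
        m = _ENTRY.match(value, pos)
        if m is None:
            raise ValueError(
                f"unparsable hide list at offset {pos}: {value[pos:]!r}"
            )
        bus, dev, fn = m.groups()
        entries.append((int(bus, 16), int(dev, 16), int(fn)))
        pos = m.end()
    return entries


def duplicates(entries):
    """Return the entries that appear more than once, in first-seen order."""
    counts = Counter(entries)
    return [e for e in dict.fromkeys(entries) if counts[e] > 1]


if __name__ == "__main__":
    devices = parse_hide_list(HIDE)
    for bus, dev, fn in devices:
        print(f"{bus:02x}:{dev:02x}.{fn}")
    print("duplicates:", duplicates(devices))
```

For the hide list in this thread the sketch reports four distinct devices and no duplicates, so the list itself does not name 84:00.0 twice.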
