
Re: Kernel panic when passing through 2 identical PCI devices



On Monday, 2 June 2025 15:43:37 CEST you wrote:
> On 02.06.2025 14:28, J. Roeleveld wrote:
>> I have a domain to which I pass through 4 PCI devices:
>> 2 NVMe drives
>> 83:00.0   Samsung 980 NVMe
>> 84:00.0   Samsung 980 NVMe
>>
>> 2 HBA Controllers
>> 86:00.0   LSI SAS3008
>> 87:00.0   LSI SAS3008
>>
>> This works fine with Xen version 4.18.4_pre1.
>> However, when trying to update to 4.19, this fails.

> To make it explicit: The domain in question is a PV one.

Yes. I tried converting it to PVH in the past, but PCI passthrough wasn't
working at all, and nothing I have found since suggests it works now.

>> Checking the output during boot, I think I found something. But my
>> knowledge is insufficient to figure out what is causing what I am seeing
>> and how to fix this.
>>
>> From the below (where I only focus on the 2 NVMe drives), it is similar to
>> the successful boot up until it tries "claiming resource
>> 0000:84:00.0/0". At that point sysfs fails because the entry for "84" is
>> already present.

> What would be interesting is to know why / how this 2nd registration
> happens.

The only guess I can make: they are the same brand/model/size; only the
serial numbers differ.
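(To rule that possibility in or out, the guest kernel exposes both values under /sys/class/nvme. A minimal sketch; the controller names nvme0/nvme1 are assumptions, and the sysfs root is a parameter only so the helper can be exercised outside the guest:)

```shell
# Sketch: confirm the two drives really only differ in serial number.
# nvme0/nvme1 are assumed names; adjust to how the guest enumerates them.
nvme_id() {
    root="${2:-/sys}"
    printf '%s: model=%s serial=%s\n' "$1" \
        "$(cat "$root/class/nvme/$1/model" 2>/dev/null)" \
        "$(cat "$root/class/nvme/$1/serial" 2>/dev/null)"
}
nvme_id nvme0
nvme_id nvme1
```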

> It's the same (guest) kernel version afaics, so something must
> behave differently on the host. Are you sure the sole (host side)
> difference is the hypervisor version? I.e. the Dom0 kernel version is the
> same in the failing and successful cases? I ask because there's very little
> Xen itself does that would play into pass-through device discovery /
> resource setup by a (PV) guest (which doesn't mean Xen can't screw things
> up). The more relevant component is the xen-pciback driver in Dom0.

I can confirm it's dependent on the Xen version.
Kernel version = 6.12.21
I get a successful boot with Xen version 4.18.4_pre1.
When I use Xen version 4.19.1, the boot fails due to this issue.

The kernel and initramfs do not differ between the two boots.

> Sadly the log provided does, to me at least, not have enough data to draw
> conclusions. Some instrumenting of the guest kernel may be necessary ...

The host boots using UEFI:

=== (xen.cfg in the EFI partition) ===
[global]
default=xen

[xen]
options=dom0_mem=24576M,max:24576M dom0_max_vcpus=4 dom0_vcpus_pin
gnttab_max_frames=512 sched=credit console=vga extra_guest_irqs=768,1024

kernel=gentoo-6.12.21.efi dozfs root=ZFS=zhost/host/root by=id elevator=noop
logo.nologo triggers=zfs quiet refresh softlevel=prexen nomodeset
nfs.callback_tcpport=32764 lockd.nlm_udpport=32768 lockd.nlm_tcpport=32768
xen-pciback.hide=(83:00.0)(84:00.0)(86:00.0)(87:00.0) xen-
pciback.passthrough=1

ramdisk=initramfs-6.12.21-gentoo-host.img
===
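As a cheap Dom0-side sanity check of the xen-pciback.hide= line above, one can verify which driver actually owns each hidden device. A sketch only; the BDFs are the ones from this thread, and the sysfs root is parameterised purely so the helper can be tested elsewhere:

```shell
# Sketch: report which Dom0 driver is bound to each passthrough device.
# If the hide= option took effect, all four should report "pciback".
pci_driver_of() {
    root="${2:-/sys}"
    link="$root/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        basename "$(readlink -f "$link")"
    else
        echo '(no driver)'
    fi
}
for bdf in 0000:83:00.0 0000:84:00.0 0000:86:00.0 0000:87:00.0; do
    echo "$bdf -> $(pci_driver_of "$bdf")"
done
```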

Please let me know what other information you need, and whether there is
anything I can try/test to get more information.
Does the mailing list allow gzipped text files as attachments? Or how would
you prefer the kernel configs of the host and guest?

If there are tests to do, please give me several to try, as I need to
schedule downtime for reboots.

Many thanks in advance,

Joost
