
Re: Kernel panic when passing through 2 identical PCI devices


  • To: "J. Roeleveld" <joost@xxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 2 Jun 2025 15:43:37 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 02 Jun 2025 13:43:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.06.2025 14:28, J. Roeleveld wrote:
> I have a domain to which I pass through 4 PCI devices:
> 2 NVMe drives
> 83:00.0   Samsung 980 NVMe
> 84:00.0   Samsung 980 NVMe
> 
> 2 HBA Controllers
> 86:00.0   LSI SAS3008
> 87:00.0   LSI SAS3008
> 
> This works fine with Xen version 4.18.4_pre1.
> However, when trying to update to 4.19, this fails.

To make it explicit: The domain in question is a PV one.

> Checking the output during boot, I think I found something. But my knowledge 
> is insufficient to figure out what is causing what I am seeing and how to fix 
> this.
>  
> From the below (where I only focus on the 2 NVMe drives), it is similar to the
> successful boot up until it tries to "claiming resource 0000:84:00.0/0".
> At which point sysfs fails because the entry for "84" is already present.

What would be interesting is to know why / how this 2nd registration happens.
It's the same (guest) kernel version afaics, so something must behave
differently on the host. Are you sure the sole (host side) difference is the
hypervisor version? I.e. the Dom0 kernel version is the same in the failing
and successful cases? I ask because there's very little Xen itself does
that would play into pass-through device discovery / resource setup by a
(PV) guest (which doesn't mean Xen can't screw things up). The more relevant
component is the xen-pciback driver in Dom0.

Sadly, the log provided does not, to me at least, contain enough data to draw
conclusions. Some instrumenting of the guest kernel may be necessary ...
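Before any instrumenting, it may help to compare the failing and the working
boot logs mechanically. The following is only a rough sketch: the helper names
are made up for illustration, and it assumes one kernel message per line (the
console log as captured, not the mail-wrapped copy quoted below). It collects
the pcifront resource claims, the busn_res conflicts and the duplicate sysfs
entries:

```python
import re
from typing import Dict, List

# Patterns match the guest kernel messages quoted in this thread.
# Assumption: each message sits on a single line (unwrapped console log).
CLAIM_RE = re.compile(r"pcifront \S+: claiming resource (\S+)/(\d+)")
DUP_RE = re.compile(r"sysfs: cannot create duplicate filename '([^']+)'")
BUSN_RE = re.compile(
    r"pci_bus (\S+): busn_res: can not insert \[bus ([0-9a-f]+-[0-9a-f]+)\]"
    r".*\(conflicts with .* \[bus ([0-9a-f]+-[0-9a-f]+)\]\)"
)


def scan_pcifront_log(text: str) -> Dict[str, List]:
    """Hypothetical helper: collect resource claims, bus-number
    conflicts and duplicate sysfs file names from a guest boot log."""
    report: Dict[str, List] = {"claims": [], "busn_conflicts": [], "duplicates": []}
    for line in text.splitlines():
        if m := CLAIM_RE.search(line):
            # (device BDF, BAR index) as printed by pcifront
            report["claims"].append((m.group(1), int(m.group(2))))
        elif m := BUSN_RE.search(line):
            # (bus, requested bus range, conflicting bus range)
            report["busn_conflicts"].append((m.group(1), m.group(2), m.group(3)))
        elif m := DUP_RE.search(line):
            report["duplicates"].append(m.group(1))
    return report


def duplicated_devices(report: Dict[str, List]) -> List[str]:
    """Hypothetical helper: extract the device BDF from each duplicate
    sysfs path, e.g. '.../0000:84:00.0/resource0' -> '0000:84:00.0'."""
    return [path.rstrip("/").split("/")[-2] for path in report["duplicates"]]
```

Running this over both boot logs and comparing the results would at least
show whether the busn_res conflict is also reported in the 4.18.4_pre1 boot,
which cannot be told from the excerpt alone.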

Jan

> The SAS drives appear to be handled correctly, but I am unable to confirm this
> as the NVMe drives are required for a successful boot.
> 
> For completeness, I have attached the output for the failed boot, a normal
> successful boot (using 4.18.4_pre1) and my xl.conf (which might need
> adjusting).
> 
> === (output for just the NVME devices) ===
> pci_bus 0000:83: root bus resource [io  0x0000-0xffff]
> pci_bus 0000:83: root bus resource [mem 0x00000000-0x3fffffffffff]
> pci_bus 0000:83: root bus resource [bus 00-ff]
> pci 0000:83:00.0: [144d:a809] type 00 class 0x010802 PCIe Endpoint
> pci 0000:83:00.0: BAR 0 [mem 0xfbc00000-0xfbc03fff 64bit]
> pcifront pci-0: claiming resource 0000:83:00.0/0
> pcifront pci-0: Creating PCI Frontend Bus 0000:84
> pcifront pci-0: PCI host bridge to bus 0000:84
> pci_bus 0000:84: root bus resource [io  0x0000-0xffff]
> pci_bus 0000:84: root bus resource [mem 0x00000000-0x3fffffffffff]
> pci_bus 0000:84: busn_res: can not insert [bus 84-ff] under domain [bus 00-ff] (conflicts with (null) [bus 83-ff])
> pci_bus 0000:84: root bus resource [bus 00-ff]
> pci 0000:84:00.0: [144d:a809] type 00 class 0x010802 PCIe Endpoint
> pci 0000:84:00.0: BAR 0 [mem 0xfbb00000-0xfbb03fff 64bit]
> pcifront pci-0: claiming resource 0000:84:00.0/0
> sysfs: cannot create duplicate filename '/devices/pci-0/pci0000:84/0000:84:00.0/resource0'
> CPU: 2 UID: 0 PID: 39 Comm: xenwatch Not tainted 6.12.21-gentoo-generic #1
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x56/0x80
>  sysfs_warn_dup+0x51/0x60
>  sysfs_add_bin_file_mode_ns+0x8a/0xa0
>  sysfs_create_bin_file+0x5e/0x80
>  pci_create_attr+0xfc/0x140
>  pci_create_resource_files+0x30/0x90
>  pci_bus_add_device+0x26/0x80
>  pci_bus_add_devices+0x27/0x60
>  pcifront_rescan_root+0x18a/0x220
>  pcifront_connect+0x117/0x170
>  ? xenbus_read_driver_state+0x32/0x60
>  ? xenbus_otherend_changed+0x49/0xa0
>  ? __pfx_xenwatch_thread+0x10/0x10
>  xenwatch_thread+0xf6/0x130
>  ? __pfx_autoremove_wake_function+0x10/0x10
>  kthread+0xea/0x100
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x1f/0x40
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> ===
> 