
Re: PCI pass-through problem for SN570 NVME SSD



On Fri, Jul 8, 2022 at 10:28 AM G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 8, 2022 at 12:38 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> > > But the 'xl pci-assignable-remove' will lead to xl segmentation fault...
> > >> [  655.041442] xl[975]: segfault at 0 ip 00007f2cccdaf71f sp 
> > >> 00007ffd73a3d4d0 error 4 in libxenlight.so.4.16.0[7f2cccd92000+7c000]
> > >> [  655.041460] Code: 61 06 00 eb 13 66 0f 1f 44 00 00 83 c3 01 39 5c 24 
> > >> 2c 0f 86 1b 01 00 00 48 8b 34 24 89 d8 4d 89 f9 4d 89 f0 4c 89 e9 4c 89 
> > >> e2 <48> 8b 3c c6 31 c0 48 89 ee e8 53 44 fe ff 83 f8 04 75 ce 48 8b 44
> >
> > That'll need debugging. Cc-ing Anthony for awareness, but I'm sure
> > he'll need more data to actually stand a chance of doing something
> > about it.
> >
> > Is there any chance you could be doing some debugging work yourself,
> > at the very least to figure out where this (apparent) NULL deref is
> > happening?
> Yep, I can collect the call-stack for sure.

The call-stack of the segfault is like this:
0x00007ffff7f0971f in name2bdf () from /usr/lib/libxenlight.so.4.16
(gdb) bt
#0  0x00007ffff7f0971f in name2bdf () from /usr/lib/libxenlight.so.4.16
#1  0x00007ffff7f0a75e in libxl_device_pci_assignable_remove () from /usr/lib/libxenlight.so.4.16
#2  0x00005555555725bf in main_pciassignable_remove ()
#3  0x00005555555610ab in main ()
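
In case it helps to cross-check the kernel log against the backtrace: a rough sketch (my own helper, not part of xl or libxl) that parses the quoted "segfault at ..." line and computes how far the faulting instruction pointer lies from the start of the reported mapping. The function name parse_segfault and the regular expression are assumptions for illustration. Note that the bracketed base is the start of the mapping that contains the ip, which is not necessarily the library's load base, so the result is only a hint for locating the instruction.

```python
import re

# Pattern for the x86-64 Linux "segfault at ..." kernel message.
# All numeric fields in that message are hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Hypothetical helper: extract the fault details from one log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),  # 0 here, i.e. a NULL dereference
        "offset": ip - base,               # ip relative to the mapping start
        # x86 page-fault error code bits: 0 = present, 1 = write, 2 = user
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


# The log line from the quoted report, with the mail line-wrap undone.
line = ("[  655.041442] xl[975]: segfault at 0 ip 00007f2cccdaf71f sp "
        "00007ffd73a3d4d0 error 4 in libxenlight.so.4.16.0[7f2cccd92000+7c000]")
info = parse_segfault(line)
print(info["object"], hex(info["offset"]))
```

For the line above this gives an offset of 0x1d71f into the mapping, and error code 4 decodes as a user-mode read of a page that is not present. The low 12 bits (0x71f) match the name2bdf frame address in the gdb session, which is consistent with both pointing at the same instruction.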
The gdb backtrace above is from a release build of libxenlight. Once I
switch to a debug build, the segmentation fault just goes away...
This allows me to move on and test the behavior on 4.16.1 --
unfortunately no change observed at all.
Once I get the SSD assigned to the FreeBSD 12 domU, the domU still
sees the device but fails to operate it.

This time I also built the debug version of the 4.16.1 hypervisor.
But unfortunately it shows the same problem: the host reboots on the
first pci-assignable-add.
I cannot follow the suggestion of attaching a serial console yet.
The motherboard does have a serial port connector, but I do not have a
cable at the moment.
Maybe I can grab one, but it takes some time...

What I was able to do is to dump the 'xl dmesg' output from the dom0
boot with a debug hypervisor (see attached).
It does give a few extra lines, and I hope they are helpful.

Thanks,
G.R.

Attachment: xldmesg_4.16.1_dbgbuild.log
Description: Text Data


 

