
Re: [Xen-devel] segfault in xl create for HVM with PCI passthrough



Hi Ian,
thanks for your quick reply - please see below.
On 28.10.14 at 11:59, Ian Campbell wrote:
On Mon, 2014-10-27 at 22:25 +0100, Atom2 wrote:
Hi guys,
I have used Xen for quite some time and, after a steep learning curve, I
have always been a very happy user! Xen is really a great product.
Unfortunately I am now facing a problem that leaves me at a loss:

Using Gentoo as a rolling distribution, I recently upgraded to Xen
4.3.3 (from 4.3.1) and also upgraded the gcc compiler to 4.8.3 (from
4.7.3). Both packages are the latest stable versions available under Gentoo.

After emerging (that is, re-compiling and re-installing Xen 4.3.3
on my machine) following the toolchain upgrade to the new gcc, I can't
start my two HVM FreeBSD virtual machines anymore. Both use PCI
passthrough devices, and both the motherboard and the processor support
VT-d. Xen PV Gentoo domUs (without passed-through PCI devices) still
start up (but are useless for me at the moment as they depend on the
services provided by the two HVM domUs).

The error when starting manifests itself as follows:
# xl create -c pfsense
Parsing config from 01:pfsense.1
xc: info: VIRTUAL MEMORY ARRANGEMENT:
    Loader:        0000000000100000->00000000001c12a4
    Modules:       0000000000000000->0000000000000000
    TOTAL:         0000000000000000->000000001f800000
    ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
    4KB PAGES: 0x0000000000000200
    2MB PAGES: 0x00000000000000fb
    1GB PAGES: 0x0000000000000000
Segmentation fault
#

The domU is left in a paused state for reasons unknown to me and does not
use any CPU cycles:

Domains are created paused and then unpaused at the end of the creation
process; presumably the unpause didn't happen because xl segfaulted first.
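
In the meantime you can confirm and clean up the half-created domain by
hand, something like this (a sketch only; the exact xl list output will
differ on your system):

# xl list
Name                     ID   Mem VCPUs      State   Time(s)
Domain-0                  0  2048     4     r-----     123.4
pfsense                   2   504     1     --p---       0.0
# xl destroy pfsense

The "p" in the State column marks the paused domain; xl destroy tears it
down so the next create attempt starts from a clean slate.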
I was not aware of that, as this pausing/unpausing happens within a very short period of time and was never visible to me. But that at least explains why the domain is paused ... I have learned something new again.

Please can you run the command under gdb and grab a backtrace. It would
also be useful to run "xl -vvv create pfsense".
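
i.e. roughly along these lines (assuming xl was built with debug symbols;
gdb's --args, run and bt are all standard):

# gdb --args xl -vvv create -c pfsense
(gdb) run
[... usual xl output up to the crash ...]
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt

The bt output at the point of the crash is what we're after.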

First of all, attached please find the output of "xl -vvv create pfsense". I decided to attach a file as most of the output lines are longer than 80 characters and would therefore most likely be folded by e-mail clients. Judging from the last message before the segfault in the attached file, it seems to me that the bridge was set up correctly, as the following commands show:

# brctl show xenbr0
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.00187d1d7274       no              bond0
                                                        vif2.0
                                                        vif2.0-emu
# ifconfig
<snip>
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 118  bytes 11408 (11.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 118  bytes 11408 (11.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vif2.0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether fe:ff:ff:ff:ff:ff  txqueuelen 32  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vif2.0-emu: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::fcff:ffff:feff:ffff  prefixlen 64  scopeid 0x20<link>
        ether fe:ff:ff:ff:ff:ff  txqueuelen 500  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 598 overruns 0  carrier 0  collisions 0

xenbr0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
        inet 192.168.19.2  netmask 255.255.255.0  broadcast 192.168.19.255
        inet6 fe80::218:7dff:fe1d:7274  prefixlen 64  scopeid 0x20<link>
        ether 00:18:7d:1d:72:74  txqueuelen 0  (Ethernet)
        RX packets 58364  bytes 16721913 (15.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13224  bytes 3090681 (2.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


With regards to gdb: I can certainly run the command under gdb after building the executables with debug support - that's no big deal. I would, however, ask for your advice as to what I need to recompile with debug support. Is xen-tools (which includes xl) sufficient, or do you think I also need debug support for gcc, as the library mentioned in /var/log/messages (libgcc_s.so.1) seems to belong to the gcc package? Or is that library a red herring that merely contains the catch-all code which ends up handling the segfault? Please advise. Tx.
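
For the record, I would go about the debug build the usual Gentoo way,
roughly like this (the env file name is my own choice and this is as yet
untested):

/etc/portage/env/debugsyms:
        CFLAGS="${CFLAGS} -ggdb"
        FEATURES="splitdebug"           # or simply "nostrip"

/etc/portage/package.env:
        app-emulation/xen-tools debugsyms

# emerge --oneshot app-emulation/xen-tools

splitdebug keeps the detached symbols under /usr/lib/debug, while nostrip
leaves the symbols in the binaries themselves; either should be enough for
gdb to produce a usable backtrace.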
[...]
pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]

You say in $subject that the failure is with PCI. Is that because you've
tried an HVM domain without passthrough and it is OK, or is it just that
all your HVM domains happen to have passthrough enabled?
I haven't tried HVM domains without PCI passthrough so far (though PV domains without PCI passthrough did not segfault), as all my HVM domains require PCI devices: at least a network card for pfsense - in actual fact more than one is being passed through - and a SATA controller for my second HVM, which is used as a storage VM.

If you think that, after the gdb stuff, it would still be beneficial to go down that route, I am sure I can come up with something.
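
Probably a minimal HVM config along these lines (a sketch only - the name,
disk image path and so on are made up and would need adjusting; note the
absence of any pci = [...] line):

name    = "hvm-test"
builder = "hvm"
memory  = 512
vcpus   = 1
disk    = [ 'file:/var/lib/xen/images/hvm-test.img,hda,w' ]
vif     = [ 'bridge=xenbr0' ]

If that one starts cleanly, the segfault would seem to be tied to the
passthrough path.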

Ian.

Again, many thanks
Atom2

Attachment: output
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

