
Re: [Xen-devel] [BUG] xl devd segmentation fault on xl block-detach

CC Ian

On Wed, May 03, 2017 at 03:04:44AM +0300, Reinis Martinsons wrote:
> Hi,
> I would like to report a problem with storage driver domain. When detaching
> 2 virtual block devices from the same domain provided by the same driver
> domain, this generates a segmentation fault in the driver domain `xl devd`
> process. I observed the same problem both when manually detaching block
> devices from Dom0 and when shutting down guest domains with several block
> devices.
> For ease of demonstration I am sharing my test results on a simple scenario
> where virtual block devices are provided from a storage driver domain (DomD)
> back to Dom0, but I observed identical results for other DomUs.
> Both of my Dom0 and DomD are Arch Linux (kernel 4.10.11-1-ARCH). I built xen
> from Arch Linux User Repository (https://aur.archlinux.org/xen.git) latest
> commit 16894c15a19bfef23550ba09d58e097fe16c4792, which is using Xen 4.8.0
> (commit b03cee73197f4a37bf2941b9367105187355e638). Please see the output of
> `xl info` attached in "xl info (Dom0).txt". When building xen for DomD, I
> enabled debugging symbols (`debug ?= y` in /Config.mk). I enabled
> xendriverdomain.service in DomD. DomD configuration file is attached in
> "DomD.cfg".
> After 2 consecutive `xl block-attach` and `xl block-detach` commands in Dom0
> I am observing the following output:
> [root@arch-test-dom0 ~]# xl block-attach 0 'format=raw,backendtype=phy,backend=arch-zfs-test,vdev=xvda,target=/dev/zvol/test_pool/test1'
> [root@arch-test-dom0 ~]# xl block-attach 0 'format=raw,backendtype=phy,backend=arch-zfs-test,vdev=xvdb,target=/dev/zvol/test_pool/test2'
> [root@arch-test-dom0 ~]# xl block-detach 0 51712
> [root@arch-test-dom0 ~]# xl block-detach 0 51728
> libxl: error: libxl_device.c:1264:device_destroy_be_watch_cb: timed out while waiting for /local/domain/1/backend/vbd/0/51728 to be removed
> libxl: error: libxl.c:2009:device_addrm_aocomplete: unable to remove vbd with id 51728
> libxl_device_disk_remove failed.
> The 2nd `xl block-detach` command is generating segmentation fault in DomD
> `xl devd` process (search_for_guest (libxenlight.so.4.8)) - please see full
> DomD log output attached in "journalctl (domD).txt".
> I am also attaching "xenstored-access.log" and output of `xenstore-ls -fp`
> in "xenstore-ls.txt". In addition, I am attaching output of gdb `backtrace
> full` command on the generated coredump in DomD as "coredumpctl gdb
> (DomD).txt"
> Please let me know if I should provide any other information for debugging
> this problem.
> Kind regards
> Reinis Martinsons

> # After the 2nd `xl block-detach` command:
> [20170502T23:30:38.176Z]  A37.2        rm        /local/domain/0/device/vbd/51728 
> [20170502T23:30:38.177Z]  A37.2        rm        /local/domain/0/device/vbd 
> [20170502T23:30:38.177Z]  A37.2        rm        /local/domain/0/device 
> [20170502T23:30:38.178Z]  A37.2        rm        /libxl/0/device/vbd/51728 
> [20170502T23:30:38.178Z]  A37.2        rm        /libxl/0/device/vbd 
> [20170502T23:30:38.179Z]  A37.2        rm        /libxl/0/device 
> [20170502T23:30:38.179Z]  A37.2        rm        /libxl/0 
> [20170502T23:30:38.180Z]  A37.2        commit    
> [20170502T23:30:38.180Z]  D0           w event   device/vbd/51728 
> [20170502T23:30:38.180Z]  D0           w event   device/vbd FFFFFFFF81AA8180 
> [20170502T23:30:38.180Z]  D0           w event   device FFFFFFFF81AA8180 
> [20170502T23:30:38.181Z]  D0           unwatch   /local/domain/1/backend/vbd/0/51728/state FFFF88017F40CC20 
> [20170502T23:30:38.181Z]  A37          endconn   
> [20170502T23:31:17.867Z]  A38          newconn   
> [20170502T23:31:17.957Z]  A38          endconn   
> Core was generated by `/usr/bin/xl devd'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f49bf42519d in search_for_guest (ddomain=0x7ffc601e7130, domid=0)
>     at libxl.c:3688
> 3688            if (dguest->domid == domid)
> [Current thread is 1 (Thread 0x7f49bfa75fc0 (LWP 1403))]
> (gdb) backtrace full
> #0  0x00007f49bf42519d in search_for_guest (ddomain=0x7ffc601e7130, domid=0)
>     at libxl.c:3688
>         dguest = 0x31352f302f646276

This seems to suggest dguest is used after being freed.

But looking at the code of backend_watch_callback, dguest shouldn't be
on the list.

3927         /* If this was the last device in the domain, remove it from the list */
3928         num_devs = dguest->num_vifs + dguest->num_vbds + 
3929         if (num_devs == 0) {
3930             LIBXL_SLIST_REMOVE(&ddomain->guests, dguest, 
3931                                next);
3932             LOG(DEBUG, "removed domain %u from the list of active guests",
3933                        dguest->domid);
3934             /* Clear any leftovers in libxl/<domid> */
3935             libxl__xs_rm_checked(gc, XBT_NULL,
3936                                  GCSPRINTF("libxl/%u", dguest->domid));
3937             free(dguest);
3938         }
3939     }

Unfortunately there is no log output to confirm it, but the xenstore
log suggests this path is taken. Can you do a quick retest? Modify the
unit file for xl devd so that it runs `xl -vvv devd` to grab more output.

