Re: [Xen-devel] [BUG] xl devd segmentation fault on xl block-detach
CC Ian

On Wed, May 03, 2017 at 03:04:44AM +0300, Reinis Martinsons wrote:
> Hi,
>
> I would like to report a problem with storage driver domain. When detaching
> 2 virtual block devices from the same domain provided by the same driver
> domain, this generates a segmentation fault in the driver domain `xl devd`
> process. I observed the same problem both when manually detaching block
> devices from Dom0 and when shutting down guest domains with several block
> devices.
>
> For ease of demonstration I am sharing my test results on a simple scenario
> where virtual block devices are provided from a storage driver domain (DomD)
> back to Dom0, but I observed identical results for other DomUs.
>
> Both of my Dom0 and DomD are Arch Linux (kernel 4.10.11-1-ARCH). I built xen
> from Arch Linux User Repository (https://aur.archlinux.org/xen.git) latest
> commit 16894c15a19bfef23550ba09d58e097fe16c4792, which is using Xen 4.8.0
> (commit b03cee73197f4a37bf2941b9367105187355e638). Please see the output of
> `xl info` attached in "xl info (Dom0).txt". When building xen for DomD, I
> enabled debugging symbols (`debug ?= y` in /Config.mk). I enabled
> xendriverdomain.service in DomD. DomD configuration file is attached in
> "DomD.cfg".
>
> After 2 consecutive `xl block-attach` and `xl block-detach` commands in Dom0
> I am observing the following output:
>
> [root@arch-test-dom0 ~]# xl block-attach 0
> 'format=raw,backendtype=phy,backend=arch-zfs-test,vdev=xvda,target=/dev/zvol/test_pool/test1'
> [root@arch-test-dom0 ~]# xl block-attach 0
> 'format=raw,backendtype=phy,backend=arch-zfs-test,vdev=xvdb,target=/dev/zvol/test_pool/test2'
> [root@arch-test-dom0 ~]# xl block-detach 0 51712
> [root@arch-test-dom0 ~]# xl block-detach 0 51728
> libxl: error: libxl_device.c:1264:device_destroy_be_watch_cb: timed out
> while waiting for /local/domain/1/backend/vbd/0/51728 to be removed
> libxl: error: libxl.c:2009:device_addrm_aocomplete: unable to remove vbd
> with id 51728
> libxl_device_disk_remove failed.
>
> The 2nd `xl block-detach` command is generating segmentation fault in DomD
> `xl devd` process (search_for_guest (libxenlight.so.4.8)) - please see full
> DomD log output attached in "journalctl (domD).txt".
>
> I am also attaching "xenstored-access.log" and output of `xenstore-ls -fp`
> in "xenstore-ls.txt". In addition, I am attaching output of gdb `backtrace
> full` command on the generated coredump in DomD as "coredumpctl gdb
> (DomD).txt"
>
> Please let me know if I should provide any other information for debugging
> this problem.
>
> Kind regards
>
> Reinis Martinsons

[...]

> # After the 2nd `xl block-detach` command:

[...]
> [20170502T23:30:38.176Z] A37.2 rm
> /local/domain/0/device/vbd/51728
> [20170502T23:30:38.177Z] A37.2 rm /local/domain/0/device/vbd
> [20170502T23:30:38.177Z] A37.2 rm /local/domain/0/device
> [20170502T23:30:38.178Z] A37.2 rm /libxl/0/device/vbd/51728
> [20170502T23:30:38.178Z] A37.2 rm /libxl/0/device/vbd
> [20170502T23:30:38.179Z] A37.2 rm /libxl/0/device
> [20170502T23:30:38.179Z] A37.2 rm /libxl/0
> [20170502T23:30:38.180Z] A37.2 commit
> [20170502T23:30:38.180Z] D0 w event device/vbd/51728
> FFFFFFFF81AA8180
> [20170502T23:30:38.180Z] D0 w event device/vbd FFFFFFFF81AA8180
> [20170502T23:30:38.180Z] D0 w event device FFFFFFFF81AA8180
> [20170502T23:30:38.181Z] D0 unwatch
> /local/domain/1/backend/vbd/0/51728/state FFFF88017F40CC20
> [20170502T23:30:38.181Z] A37 endconn
> [20170502T23:31:17.867Z] A38 newconn
> [20170502T23:31:17.957Z] A38 endconn

[...]

> Core was generated by `/usr/bin/xl devd'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f49bf42519d in search_for_guest (ddomain=0x7ffc601e7130, domid=0)
>     at libxl.c:3688
> 3688        if (dguest->domid == domid)
> [Current thread is 1 (Thread 0x7f49bfa75fc0 (LWP 1403))]
> (gdb) backtrace full
> #0 0x00007f49bf42519d in search_for_guest (ddomain=0x7ffc601e7130, domid=0)
>     at libxl.c:3688
>         dguest = 0x31352f302f646276

This seems to suggest dguest is used after freed. But looking at the code
of backend_watch_callback, dguest shouldn't be on the list.

        /* If this was the last device in the domain, remove it from the list */
        num_devs = dguest->num_vifs + dguest->num_vbds + dguest->num_qdisks;
        if (num_devs == 0) {
            LIBXL_SLIST_REMOVE(&ddomain->guests, dguest, libxl__ddomain_guest,
                               next);
            LOG(DEBUG, "removed domain %u from the list of active guests",
                dguest->domid);
            /* Clear any leftovers in libxl/<domid> */
            libxl__xs_rm_checked(gc, XBT_NULL,
                                 GCSPRINTF("libxl/%u", dguest->domid));
            free(dguest);
        }
    }

There is no logging unfortunately. But the xenstore log suggests this path
is taken.

Can you do a quick retest? Modify the unit file for xl devd to make it
`xl -vvv devd` to grab more output.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel