Re: [Xen-devel] [BUG] xl devd segmentation fault on xl block-detach
On 03.05.2017 13:27, Wei Liu wrote:
> CC Ian
>
> On Wed, May 03, 2017 at 03:04:44AM +0300, Reinis Martinsons wrote:
>> Hi,
>>
>> I would like to report a problem with a storage driver domain. When
>> detaching 2 virtual block devices, provided by the same driver domain,
>> from the same domain, the driver domain's `xl devd` process hits a
>> segmentation fault. I observed the same problem both when manually
>> detaching block devices from Dom0 and when shutting down guest domains
>> with several block devices.
>>
>> For ease of demonstration I am sharing my test results for a simple
>> scenario where virtual block devices are provided from a storage driver
>> domain (DomD) back to Dom0, but I observed identical results for other
>> DomUs.
>>
>> Both my Dom0 and DomD run Arch Linux (kernel 4.10.11-1-ARCH). I built
>> Xen from the Arch Linux User Repository (https://aur.archlinux.org/xen.git),
>> latest commit 16894c15a19bfef23550ba09d58e097fe16c4792, which uses
>> Xen 4.8.0 (commit b03cee73197f4a37bf2941b9367105187355e638). Please see
>> the output of `xl info` attached in "xl info (Dom0).txt". When building
>> Xen for DomD, I enabled debugging symbols (`debug ?= y` in /Config.mk).
>> I enabled xendriverdomain.service in DomD. The DomD configuration file
>> is attached as "DomD.cfg".
>>
>> After 2 consecutive `xl block-attach` and `xl block-detach` commands in
>> Dom0 I observe the following output:
>>
>> [root@arch-test-dom0 ~]# xl block-attach 0 'format=raw,backendtype=phy,backend=arch-zfs-test,vdev=xvda,target=/dev/zvol/test_pool/test1'
>> [root@arch-test-dom0 ~]# xl block-attach 0 'format=raw,backendtype=phy,backend=arch-zfs-test,vdev=xvdb,target=/dev/zvol/test_pool/test2'
>> [root@arch-test-dom0 ~]# xl block-detach 0 51712
>> [root@arch-test-dom0 ~]# xl block-detach 0 51728
>> libxl: error: libxl_device.c:1264:device_destroy_be_watch_cb: timed out while waiting for /local/domain/1/backend/vbd/0/51728 to be removed
>> libxl: error: libxl.c:2009:device_addrm_aocomplete: unable to remove vbd with id 51728
>> libxl_device_disk_remove failed.
>>
>> The 2nd `xl block-detach` command generates a segmentation fault in the
>> DomD `xl devd` process (in search_for_guest, libxenlight.so.4.8) -
>> please see the full DomD log output attached in "journalctl (domD).txt".
>> I am also attaching "xenstored-access.log" and the output of
>> `xenstore-ls -fp` in "xenstore-ls.txt". In addition, I am attaching the
>> output of the gdb `backtrace full` command on the generated coredump in
>> DomD as "coredumpctl gdb (DomD).txt".
>>
>> Please let me know if I should provide any other information for
>> debugging this problem.
>>
>> Kind regards
>> Reinis Martinsons
>> [...]
>> # After the 2nd `xl block-detach` command:
>> [...]
>> [20170502T23:30:38.176Z] A37.2 rm /local/domain/0/device/vbd/51728
>> [20170502T23:30:38.177Z] A37.2 rm /local/domain/0/device/vbd
>> [20170502T23:30:38.177Z] A37.2 rm /local/domain/0/device
>> [20170502T23:30:38.178Z] A37.2 rm /libxl/0/device/vbd/51728
>> [20170502T23:30:38.178Z] A37.2 rm /libxl/0/device/vbd
>> [20170502T23:30:38.179Z] A37.2 rm /libxl/0/device
>> [20170502T23:30:38.179Z] A37.2 rm /libxl/0
>> [20170502T23:30:38.180Z] A37.2 commit
>> [20170502T23:30:38.180Z] D0 w event device/vbd/51728 FFFFFFFF81AA8180
>> [20170502T23:30:38.180Z] D0 w event device/vbd FFFFFFFF81AA8180
>> [20170502T23:30:38.180Z] D0 w event device FFFFFFFF81AA8180
>> [20170502T23:30:38.181Z] D0 unwatch /local/domain/1/backend/vbd/0/51728/state FFFF88017F40CC20
>> [20170502T23:30:38.181Z] A37 endconn
>> [20170502T23:31:17.867Z] A38 newconn
>> [20170502T23:31:17.957Z] A38 endconn
>> [...]
>>
>> Core was generated by `/usr/bin/xl devd'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x00007f49bf42519d in search_for_guest (ddomain=0x7ffc601e7130, domid=0) at libxl.c:3688
>> 3688            if (dguest->domid == domid)
>> [Current thread is 1 (Thread 0x7f49bfa75fc0 (LWP 1403))]
>> (gdb) backtrace full
>> #0  0x00007f49bf42519d in search_for_guest (ddomain=0x7ffc601e7130, domid=0) at libxl.c:3688
>>         dguest = 0x31352f302f646276
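As an aside that is not part of the exchange itself: the dguest value in the backtrace is not a plausible heap address. Reinterpreted as raw bytes on a little-endian x86-64 host it is printable ASCII, which a few lines of C make visible:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        /* dguest value reported by gdb in the DomD coredump */
        uint64_t dguest = 0x31352f302f646276ULL;
        char text[sizeof(dguest) + 1] = { 0 };

        /* On little-endian x86-64 the in-memory byte order is the string order. */
        memcpy(text, &dguest, sizeof(dguest));
        printf("%s\n", text);   /* prints: vbd/0/51 */
        return 0;
    }

It prints "vbd/0/51", the start of the backend/vbd/0/51728 xenstore path seen in the access log above - consistent with the suggestion below that the libxl__ddomain_guest structure was freed and its memory reused (apparently for a path string) before search_for_guest dereferenced it.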
> This seems to suggest dguest is used after being freed. But looking at
> the code of backend_watch_callback, dguest shouldn't be on the list:
>
> 3927     /* If this was the last device in the domain, remove it from the list */
> 3928     num_devs = dguest->num_vifs + dguest->num_vbds + dguest->num_qdisks;
> 3929     if (num_devs == 0) {
> 3930         LIBXL_SLIST_REMOVE(&ddomain->guests, dguest, libxl__ddomain_guest,
> 3931                            next);
> 3932         LOG(DEBUG, "removed domain %u from the list of active guests",
> 3933             dguest->domid);
> 3934         /* Clear any leftovers in libxl/<domid> */
> 3935         libxl__xs_rm_checked(gc, XBT_NULL,
> 3936                              GCSPRINTF("libxl/%u", dguest->domid));
> 3937         free(dguest);
> 3938     }
> 3939 }
>
> There is no logging unfortunately. But the xenstore log suggests this
> path is taken.
>
> Can you do a quick retest? Modify the unit file for xl devd to make it
> `xl -vvv devd` to grab more output.

I modified the xendriverdomain.service unit file to execute `xl -vvv devd`. This produced the following output from journalctl when the service was started:

[root@arch-zfs-test ~]# journalctl -b "_SYSTEMD_UNIT=xendriverdomain.service"
-- Logs begin at Sat 2017-04-15 01:20:58 EEST, end at Wed 2017-05-03 15:32:12 EEST. --
May 03 14:53:46 arch-zfs-test xl[1396]: xencall:buffer: debug: total allocations:7 total releases:7
May 03 14:53:46 arch-zfs-test xl[1396]: xencall:buffer: debug: current allocations:0 maximum allocations:1
May 03 14:53:46 arch-zfs-test xl[1396]: xencall:buffer: debug: cache current size:1
May 03 14:53:46 arch-zfs-test xl[1396]: xencall:buffer: debug: cache hits:6 misses:1 toobig:0

In addition, a full xl devd log was generated - please see the attached "xldevd.log.1" from the respective session.

> Wei.

I also attach the results of the repeated test, similar to before.

Reinis
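The failure mode Wei describes in the quoted analysis above can be sketched outside libxl. The program below is a simplified illustration, not libxl code: struct guest, guest_list and this search_for_guest() are stand-ins for libxl__ddomain_guest, ddomain->guests and libxl's search_for_guest(), the field order is chosen for the demo, and the heap-reuse step is an assumption about typical allocator behaviour. It only shows how a list that still reaches a freed element ends up reading ASCII path bytes where a pointer used to be, matching the coredump above.

    #include <stdlib.h>
    #include <string.h>
    #include <sys/queue.h>          /* BSD-style SLIST, the model for LIBXL_SLIST */

    struct guest {
        SLIST_ENTRY(guest) next;    /* placed first so the recycled string bytes
                                       land where the list walker reads a pointer */
        unsigned int domid;
        int num_vbds;
    };

    SLIST_HEAD(guest_list, guest);

    /* Mirrors the loop that faults at libxl.c:3688. */
    static struct guest *search_for_guest(struct guest_list *guests,
                                          unsigned int domid)
    {
        struct guest *dguest;

        SLIST_FOREACH(dguest, guests, next) {
            if (dguest->domid == domid)     /* faults once dguest is garbage */
                return dguest;
        }
        return NULL;
    }

    int main(void)
    {
        struct guest_list guests = SLIST_HEAD_INITIALIZER(guests);
        struct guest *g = calloc(1, sizeof(*g));

        g->domid = 0;
        g->num_vbds = 1;
        SLIST_INSERT_HEAD(&guests, g, next);

        /* Bug pattern: the element is freed while the list still reaches it.
         * (In libxl it is supposed to be LIBXL_SLIST_REMOVEd first, which is
         * exactly why the observed crash is surprising.) */
        free(g);

        /* A later allocation of similar size is likely (not guaranteed) to
         * recycle the freed chunk; here it becomes a xenstore-style path,
         * as the ASCII-looking dguest value in the coredump suggests. */
        char *path = malloc(sizeof(struct guest));
        strcpy(path, "vbd/0/51728");    /* 12 bytes, fits the 16-byte chunk */

        /* The next walk reads the recycled bytes: dguest becomes a value like
         * 0x31352f302f646276 ("vbd/0/51") and dereferencing it is a
         * use-after-free that normally ends in SIGSEGV. */
        search_for_guest(&guests, 0);

        free(path);
        return 0;
    }

Compiled with plain gcc and run on a typical glibc system, the final list walk dereferences the recycled bytes and crashes much like the devd coredump does; the exact behaviour is undefined and allocator-dependent.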
Attachment: xldevd.log.1
Attachment: journalctl (domD).txt
Attachment: xenstored-access.log
Attachment: xenstore-ls.txt
Attachment: coredumpctl gdb.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel