Re: [Xen-devel] null domains after xl destroy
On Tuesday, 11 April 2017 at 20:03:14, Glenn Enright wrote:
> On 11/04/17 17:59, Juergen Gross wrote:
> > On 11/04/17 07:25, Glenn Enright wrote:
> >> Hi all
> >>
> >> We are seeing an odd issue with domU domains from xl destroy: under
> >> recent 4.9 kernels a (null) domain is left behind.
> >
> > I guess this is the dom0 kernel version?
> >
> >> This has occurred on a variety of hardware, with no obvious
> >> commonality.
> >>
> >> 4.4.55 does not show this behavior.
> >>
> >> On my test machine I have the following packages installed under
> >> centos6, from https://xen.crc.id.au/
> >>
> >> ~]# rpm -qa | grep xen
> >> xen47-licenses-4.7.2-4.el6.x86_64
> >> xen47-4.7.2-4.el6.x86_64
> >> kernel-xen-4.9.21-1.el6xen.x86_64
> >> xen47-ocaml-4.7.2-4.el6.x86_64
> >> xen47-libs-4.7.2-4.el6.x86_64
> >> xen47-libcacard-4.7.2-4.el6.x86_64
> >> xen47-hypervisor-4.7.2-4.el6.x86_64
> >> xen47-runtime-4.7.2-4.el6.x86_64
> >> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
> >>
> >> I've also replicated the issue with 4.9.17 and 4.9.20.
> >>
> >> To replicate, on a cleanly booted dom0 with one PV VM, I run the
> >> following on the VM:
> >>
> >> {
> >>   while true; do
> >>     dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
> >>   done
> >> }
> >>
> >> Then on the dom0 I do this sequence to reliably get a null domain.
> >> This occurs with both oxenstored and xenstored.
> >>
> >> {
> >>   xl sync 1
> >>   xl destroy 1
> >> }
> >>
> >> xl list then renders something like ...
> >>
> >> (null)                  1     4     4     --p--d       9.8
> >
> > Something is referencing the domain, e.g. some of its memory pages
> > are still mapped by dom0.

You can try

# xl debug-keys q

and then

# xl dmesg

to see the output of the previous command. The 'q' key dumps domain (and
guest debug) info. # xl debug-keys h prints all available debug keys for
more information. (A sketch of a clean capture sequence follows at the
end of this mail.)

Dietmar.

> >> From what I can see it appears to be disk related. Affected VMs all
> >> use LVM storage for their boot disk. lvdisplay of the affected LV
> >> shows that the LV is being held open by something.
> >
> > How are the disks configured? Especially the backend type is
> > important.
> >
> >> ~]# lvdisplay test/test.img | grep open
> >>   # open                 1
> >>
> >> I've not been able to determine what that thing is as yet. I tried
> >> lsof, dmsetup, and various LVM tools. Waiting for the disk to be
> >> released does not work.
> >>
> >> ~]# xl list
> >> Name                    ID   Mem VCPUs      State   Time(s)
> >> Domain-0                 0  1512     2     r-----      29.0
> >> (null)                   1     4     4     --p--d       9.8
> >>
> >> xenstore-ls reports nothing for the null domain id that I can see.
> >
> > Any qemu process related to the domain still running?
> >
> > Any dom0 kernel messages related to Xen?
> >
> > Juergen

> Yep, 4.9 dom0 kernel.
>
> Typically we see an xl process still running, but that has already gone
> away in this case. The domU is a PV guest using a phy disk definition;
> the basic startup is like this...
>
> xl -v create -f paramfile extra="console=hvc0 elevator=noop
> xen-blkfront.max=64"
>
> There are no qemu processes or threads anywhere I can see.
>
> I don't see any meaningful messages in the Linux kernel log, and
> nothing at all in the hypervisor log.
> Here is output from the dom0 starting and then stopping a domU using
> the above mechanism:
>
> br0: port 2(vif3.0) entered disabled state
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 entered promiscuous mode
> IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
> xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1
> (x86_64-abi) persistent grants
> xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1
> (x86_64-abi) persistent grants
> vif vif-4-0 vif4.0: Guest Rx ready
> IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered forwarding state
> br0: port 2(vif4.0) entered disabled state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 left promiscuous mode
> br0: port 2(vif4.0) entered disabled state
>
> ... here is xl info ...
>
> host                   : xxxxxxxxxxxx
> release                : 4.9.21-1.el6xen.x86_64
> version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
> machine                : x86_64
> nr_cpus                : 4
> max_cpu_id             : 3
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 2394
> hw_caps                : b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
> virt_caps              :
> total_memory           : 8190
> free_memory            : 6577
> sharing_freed_memory   : 0
> sharing_used_memory    : 0
> outstanding_claims     : 0
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 7
> xen_extra              : .2
> xen_version            : 4.7.2
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          :
> xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2
> dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
> cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
> cc_compile_by          : mockbuild
> cc_compile_domain      : (none)
> cc_compile_date        : Mon Apr 3 12:17:20 AEST 2017
> build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
> xend_config_format     : 4
>
> # cat /proc/cmdline
> ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS
> KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx
> SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet
> pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64
>
> The domU is using an LVM volume on top of an md RAID1 array, on
> directly connected HDDs. Nothing special hardware-wise. The disk line
> for that domU looks functionally like...
>
> disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
>
> I would appreciate any suggestions on how to increase the debug level
> in a relevant way, or where to look to get more useful information on
> what is happening.
>
> To clarify the actual shutdown sequence that causes problems...
>
> # xl sysrq $id s
> # xl destroy $id
>
> Regards, Glenn
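Regarding the debug-keys capture mentioned above: a minimal sketch of a
clean capture sequence, run as root in dom0 (the output file name is
just an example):

# xl dmesg -c > /dev/null
# xl debug-keys q
# xl dmesg > /tmp/xen-debug-q.txt

The first command flushes Xen's message ring, so the file written by the
last command contains only the 'q' dump. That dump should include a
refcnt value for each domain; a nonzero refcnt on the (null) domain
would confirm that something still holds references to it.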
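On the open LV reference: lsof will not show a holder when the reference
is held inside the kernel, as it would be if xen-blkback never released
the device. A sketch of where one might look instead; this assumes the
disk line above maps to the device-mapper name testlv-test.img, which is
illustrative:

# dmsetup info testlv-test.img
# ls /sys/bus/xen-backend/devices/

dmsetup info should report the same open count at the device-mapper
level as lvdisplay, and a vbd-<domid>-<devid> entry still present on the
xen-backend bus after the destroy would point at blkback holding the
block device open.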