Re: [Xen-devel] null domains after xl destroy
On 11/04/17 17:59, Juergen Gross wrote:
> On 11/04/17 07:25, Glenn Enright wrote:
>> Hi all
>>
>> We are seeing an odd issue with domu domains from xl destroy, under
>> recent 4.9 kernels a (null) domain is left behind.
>
> I guess this is the dom0 kernel version?
>
>> This has occurred on a variety of hardware, with no obvious
>> commonality. 4.4.55 does not show this behavior.
>>
>> On my test machine I have the following packages installed under
>> centos6, from https://xen.crc.id.au/
>>
>> ~]# rpm -qa | grep xen
>> xen47-licenses-4.7.2-4.el6.x86_64
>> xen47-4.7.2-4.el6.x86_64
>> kernel-xen-4.9.21-1.el6xen.x86_64
>> xen47-ocaml-4.7.2-4.el6.x86_64
>> xen47-libs-4.7.2-4.el6.x86_64
>> xen47-libcacard-4.7.2-4.el6.x86_64
>> xen47-hypervisor-4.7.2-4.el6.x86_64
>> xen47-runtime-4.7.2-4.el6.x86_64
>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>
>> I've also replicated the issue with 4.9.17 and 4.9.20.
>>
>> To replicate, on a cleanly booted dom0 with one PV VM, I run the
>> following on the VM
>>
>> {
>> while true; do
>>   dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>> done
>> }
>>
>> Then on the dom0 I do this sequence to reliably get a null domain.
>> This occurs with both oxenstored and xenstored.
>>
>> {
>> xl sync 1
>> xl destroy 1
>> }
>>
>> xl list then renders something like
>>
>> ...
>> (null)                              1     4     4     --p--d       9.8     0
>
> Something is referencing the domain, e.g. some of its memory pages
> are still mapped by dom0.
>
>> From what I can see it appears to be disk related. Affected VMs all
>> use lvm storage for their boot disk. lvdisplay of the affected lv
>> shows that the lv is being held open by something.
>
> How are the disks configured? Especially the backend type is
> important.
>
>> ~]# lvdisplay test/test.img | grep open
>>   # open                 1
>>
>> I've not been able to determine what that thing is as yet. I tried
>> lsof, dmsetup and various lv tools. Waiting for the disk to be
>> released does not work.
>>
>> ~]# xl list
>> Name                                ID   Mem VCPUs      State   Time(s)
>> Domain-0                             0  1512     2     r-----      29.0
>> (null)                               1     4     4     --p--d       9.8
>>
>> xenstore-ls reports nothing for the null domain id that I can see.
>
> Any qemu process related to the domain still running? Any dom0 kernel
> messages related to Xen?
>
> Juergen

Yep, 4.9 dom0 kernel.

Typically we see an xl process running, but that has already gone away
in this case. The domU is a PV guest using a phy disk definition; the
basic startup is like this...

xl -v create -f paramfile extra="console=hvc0 elevator=noop xen-blkfront.max=64"

There are no qemu processes or threads anywhere I can see.

I don't see any meaningful messages in the linux kernel log, and nothing
at all in the hypervisor log. Here is the dom0 output from starting and
then stopping a domU using the above mechanism...

br0: port 2(vif3.0) entered disabled state
br0: port 2(vif4.0) entered blocking state
br0: port 2(vif4.0) entered disabled state
device vif4.0 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1 (x86_64-abi) persistent grants
xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1 (x86_64-abi) persistent grants
vif vif-4-0 vif4.0: Guest Rx ready
IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
br0: port 2(vif4.0) entered blocking state
br0: port 2(vif4.0) entered forwarding state
br0: port 2(vif4.0) entered disabled state
br0: port 2(vif4.0) entered disabled state
device vif4.0 left promiscuous mode
br0: port 2(vif4.0) entered disabled state
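For completeness, this is roughly how I have been checking for leftover
backend state after the destroy (only a sketch; domid 1 and the lv name
are from this test box, and the exact sysfs/xenstore paths are from
memory, so adjust to suit):

~]# ls /sys/bus/xen-backend/devices/                 # look for stale vbd-1-* entries for the dead domain
~]# xenstore-ls -f /local/domain/0/backend/vbd/1     # any backend nodes still present for domid 1
~]# dmsetup info -c | grep test.img                  # device-mapper's "Open" count for the lv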
... here is xl info ...

host                   : xxxxxxxxxxxx
release                : 4.9.21-1.el6xen.x86_64
version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2394
hw_caps                : b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
virt_caps              :
total_memory           : 8190
free_memory            : 6577
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 7
xen_extra              : .2
xen_version            : 4.7.2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
cc_compile_by          : mockbuild
cc_compile_domain      : (none)
cc_compile_date        : Mon Apr 3 12:17:20 AEST 2017
build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
xend_config_format     : 4

# cat /proc/cmdline
ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64

The domU is using an lvm on top of an md raid1 array, on directly
connected HDDs. Nothing special hardware wise. The disk line for that
domU looks functionally like...

disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
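When the problem is present, this is the sort of thing I have been
running to try to see who still has the lv open (nothing conclusive so
far; the device path below matches this test box, so adjust to suit):

~]# lvdisplay testlv/test.img | grep open                                    # lvm still shows the volume open
~]# fuser -vam /dev/testlv/test.img                                          # any userspace process with it open?
~]# lsof /dev/testlv/test.img                                                # likewise, via lsof
~]# ls /sys/block/$(basename $(readlink -f /dev/testlv/test.img))/holders/   # anything stacked on top of the dm device?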
I would appreciate any suggestions on how to increase the debug level in
a relevant way, or where to look to get more useful information on what
is happening.

To clarify, the actual shutdown sequence that causes problems is...

# xl sysrq $id s
# xl destroy $id
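In the meantime, the next things I was planning to try myself for more
detail are below (straight from the xl manpage, so only a sketch, not
something I have results from yet):

# xl debug-keys q          # ask the hypervisor to dump domain/vcpu state
# xl dmesg | tail -n 60    # read that dump back from the hypervisor log
# xl list -v               # shows the domain UUID and shutdown reason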
Regards, Glenn

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel