
Re: [Xen-devel] null domains after xl destroy



On Tuesday, 11 April 2017 at 20:03:14, Glenn Enright wrote:
> On 11/04/17 17:59, Juergen Gross wrote:
> > On 11/04/17 07:25, Glenn Enright wrote:
> >> Hi all
> >>
> >> We are seeing an odd issue with domU domains after xl destroy: under
> >> recent 4.9 kernels a (null) domain is left behind.
> >
> > I guess this is the dom0 kernel version?
> >
> >> This has occurred on a variety of hardware, with no obvious commonality.
> >>
> >> 4.4.55 does not show this behavior.
> >>
> >> On my test machine I have the following packages installed under
> >> centos6, from https://xen.crc.id.au/
> >>
> >> ~]# rpm -qa | grep xen
> >> xen47-licenses-4.7.2-4.el6.x86_64
> >> xen47-4.7.2-4.el6.x86_64
> >> kernel-xen-4.9.21-1.el6xen.x86_64
> >> xen47-ocaml-4.7.2-4.el6.x86_64
> >> xen47-libs-4.7.2-4.el6.x86_64
> >> xen47-libcacard-4.7.2-4.el6.x86_64
> >> xen47-hypervisor-4.7.2-4.el6.x86_64
> >> xen47-runtime-4.7.2-4.el6.x86_64
> >> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
> >>
> >> I've also replicated the issue with 4.9.17 and 4.9.20.
> >>
> >> To replicate, on a cleanly booted dom0 with one pv VM, I run the
> >> following on the VM
> >>
> >> {
> >> while true; do
> >>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
> >> done
> >> }
> >>
> >> Then on the dom0 I run this sequence to reliably get a null domain. This
> >> occurs with both oxenstored and xenstored.
> >>
> >> {
> >> xl sysrq 1 s
> >> xl destroy 1
> >> }
> >>
> >> xl list then renders something like ...
> >>
> >> (null)                                       1     4     4     --p--d      9.8     0
> >
> > Something is referencing the domain, e.g. some of its memory pages are
> > still mapped by dom0.

You can try
# xl debug-keys q
and then
# xl dmesg
to see the output of the previous command. The 'q' key dumps domain
(and guest debug) info.
# xl debug-keys h
lists all available debug keys for more information.
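
For example, something along these lines (only a sketch; the grep pattern is
a guess at the dump format and the domain ID is the one from your example):

# xl debug-keys q
# xl dmesg | grep -i -A 30 'domain 1'

The dump should include a reference count and the dying state for the zombie
domain, which indicates whether something in dom0 still holds its pages.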

Dietmar.

> >
> >> From what I can see it appears to be disk related. Affected VMs all use
> >> LVM storage for their boot disk. lvdisplay of the affected LV shows that
> >> the LV is still being held open by something.
> >
> > How are the disks configured? The backend type in particular is important.
> >
> >>
> >> ~]# lvdisplay test/test.img | grep open
> >>   # open                 1
> >>
> >> I've not been able to determine what that is yet. I tried lsof,
> >> dmsetup, and various LVM tools. Waiting for the disk to be released does
> >> not work.
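
One way to dig a bit further (a rough sketch only; it assumes the LV from the
lvdisplay above resolves to a dm-N node in the usual way, and note that a
reference held from kernel space, e.g. by blkback, will not show up in lsof):

{
dev=$(readlink -f /dev/test/test.img)     # follow the LV symlink to /dev/dm-N
ls /sys/block/$(basename $dev)/holders    # devices still stacked on top of the LV
dmsetup info -c -o name,open              # open counts for all dm devices
}

If the open count stays non-zero while the holders directory is empty, the
reference is almost certainly held inside the kernel rather than by a process.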
> >>
> >> ~]# xl list
> >> Name                                        ID   Mem VCPUs      State   Time(s)
> >> Domain-0                                     0  1512     2     r-----      29.0
> >> (null)                                       1     4     4     --p--d       9.8
> >>
> >> xenstore-ls reports nothing for the null domain id that I can see.
> >
> > Any qemu process related to the domain still running?
> >
> > Any dom0 kernel messages related to Xen?
> >
> >
> > Juergen
> >
> 
> Yep, 4.9 dom0 kernel
> 
> Typically we see an xl process still running, but that has already gone
> away in this case. The domU is a PV guest using a phy disk definition; the
> basic startup is like this...
> 
> xl -v create -f paramfile extra="console=hvc0 elevator=noop 
> xen-blkfront.max=64"
> 
> There are no qemu processes or threads anywhere I can see.
> 
> I don't see any meaningful messages in the Linux kernel log, and nothing
> at all in the hypervisor log. Here is output from the dom0 when starting
> and then stopping a domU using the above mechanism:
> 
> br0: port 2(vif3.0) entered disabled state
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 entered promiscuous mode
> IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
> xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1 
> (x86_64-abi) persistent grants
> xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1 
> (x86_64-abi) persistent grants
> vif vif-4-0 vif4.0: Guest Rx ready
> IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered forwarding state
> br0: port 2(vif4.0) entered disabled state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 left promiscuous mode
> br0: port 2(vif4.0) entered disabled state
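
It might be worth checking whether the block backend actually tears down
during the destroy. A rough sketch, using the domain ID 4 from the log above
(the xenstore path assumes the usual /local/domain/0/backend layout):

# xenstore-ls /local/domain/0/backend/vbd/4
# xl destroy 4
# xenstore-ls /local/domain/0/backend/vbd/4
# dmesg | grep -i blkback | tail

If the vbd nodes for the destroyed domain are still there afterwards, blkback
never finished closing the device, which would match the LV staying open.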
> 
> ... here is xl info ...
> 
> host                   : xxxxxxxxxxxx
> release                : 4.9.21-1.el6xen.x86_64
> version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
> machine                : x86_64
> nr_cpus                : 4
> max_cpu_id             : 3
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 2394
> hw_caps                : 
> b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
> virt_caps              :
> total_memory           : 8190
> free_memory            : 6577
> sharing_freed_memory   : 0
> sharing_used_memory    : 0
> outstanding_claims     : 0
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 7
> xen_extra              : .2
> xen_version            : 4.7.2
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          :
> xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 
> dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
> cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
> cc_compile_by          : mockbuild
> cc_compile_domain      : (none)
> cc_compile_date        : Mon Apr  3 12:17:20 AEST 2017
> build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
> xend_config_format     : 4
> 
> 
> # cat /proc/cmdline
> ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS 
> KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx 
> SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet 
> pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64
> 
> The domU is using an LVM volume on top of an md RAID1 array, on directly
> connected HDDs. Nothing special hardware-wise. The disk line for that domU
> looks functionally like...
> 
> disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
> 
> I would appreciate any suggestions on how to increase the debug level in
> a relevant way, or where to look for more useful information on what is
> happening.
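
One more place that might be worth a look (hedged; the sysfs layout below is
from memory and can differ between kernel versions): blkback registers each
backend on the xen-backend bus, so a leftover vbd-<domid>-<devid> entry after
the destroy would mean the backend device never went away and is what keeps
the LV open.

# ls /sys/bus/xen-backend/devices/
# ls /sys/bus/xen-backend/drivers/vbd/

Comparing those listings before and after the destroy should show whether the
reference is held by blkback or by something else in dom0.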
> 
> To clarify the actual shutdown sequence that causes problems...
> 
> # xl sysrq $id s
> # xl destroy $id
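
For what it's worth, wrapping that sequence so the leftover state is captured
immediately might make runs easier to compare (a sketch only; the domain ID
and LV name are the ones from the examples above):

{
xl sysrq 1 s
sleep 5
xl destroy 1
xl list
lvdisplay test/test.img | grep open
}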
> 
> 
> Regards, Glenn
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

