[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-API] xenserver-core tech preview: Hang during pygrub: SOLVED?



The suspicious line in /var/log/debug (see the pastebin below) and that
'blkid' was the culprit keeping the device open looked like juicy clues:

  kernel: vbd vbd-51728: 16 Device in use; refusing to close

Search results:

  https://www.redhat.com/archives/libguestfs/2012-February/msg00023.html

  https://rwmj.wordpress.com/2012/01/19/udev-unexpectedness/#content

blkid is started up by udevd after block device changes.  The theory is
while blkid runs it holds the device open, and after pygrub, xenops
fails to close the device if blkid hasn't finished by then.

The above links suggest to avoid the condition by running 'udevadm
settle' before closing the device.

For a cheap hack, I added 'subprocess.call(("/sbin/udevadm", "settle"))'
at the end of the pygrub script.

Several VMs *in a row* successfully started with no failures, for the
first time since I pulled the host's root filesystem off the OSDs.

In terms of the udev theory, perhaps the extra load on the disk shared
by both OS and OSD was slowing things down enough that the race
condition was rarely triggered.  After putting the OS on a separate
disk, the system was noticeably snappier, triggering this race condition
to where it was difficult to boot two VMs in a row (annoying when a
single failure required the machine to be power-cycled to recover!).

        John




On 10/28/2013 12:23 PM, John Morris wrote:
> On 10/27/2013 11:44 PM, John Morris wrote:
>> This problem crops up occasionally when booting a vm.  I haven't figured
>> out how to repeat it.
>>
>> Host is Scientific Linux 6; technology preview plus snapshot
>> repositories; 64-bit Phenom; ceph 0.67.4 from ceph.com repos.
>>
>> During an 'xe vm-start', pygrub attaches a vm's root vdi in the dom0.
>> The operation never gets as far as detaching them, and the device
>> entries remain in /dev/xvd*.  The 'vm-start' never completes, and other
>> xe commands may or may not work properly afterwards.
>>
>> Attempting to mount a /dev/xvd* entry will hang with a kernel oops, and
>> the mount process cannot be killed.
> 
> Another finding in re. log message:
> 
>   kernel: vbd vbd-51728: 16 Device in use; refusing to close
> 
> The process holding the device open is blkid, started by udevd:
> 
>   blkid -o udev -p /dev/xvdb
> 
>> Pasted below are relevant logs.
>>
>>      John
>>
>> If the below is illegible, see this link:
>>   http://pastebin.ca/2472031
>>
>> /var/log/messages:
>>
>> Oct 27 21:19:58 xen1 kernel: blkfront: xvdb: barrier: enabled
>> Oct 27 21:19:58 xen1 kernel: xvdb: xvdb1 xvdb2
>> Oct 27 21:19:59 xen1 kernel: vbd vbd-51728: 16 Device in use; refusing
>> to close
>> Oct 27 21:19:59 xen1 kernel: qemu-system-i38[2899]: segfault at
>> 7fac042e4000 ip 00007fac0447b129 sp 00007fffe7028630 error 4 in
>> qemu-system-i386[7fac042ed000+309000]
>> Oct 27 21:20:12 xen1 kernel: xapi15: port 1(eth1.81) entered forwarding
>> state
>> Oct 27 21:20:52 xen1 xapi: [ info|xen1.zultron.com|363 UNIX
>> /var/lib/xcp/xapi||cli] xe osd password=null lspools= username=root
>> Oct 27 21:21:42 xen1 xapi: [ info|xen1.zultron.com|365 UNIX
>> /var/lib/xcp/xapi|session.login_with_password D:fd591f08855d|xapi]
>> Session.create trackid=93881075440eba708422d0109c7e4aa7 pool=false
>> uname=root originator=cli is_local_superuser=true auth_user_sid=
>> parent=trackid=9834f5af41c964e225f24279aefe4e49
>> Oct 27 21:21:42 xen1 xapi: [ info|xen1.zultron.com|365 UNIX
>> /var/lib/xcp/xapi|session.logout D:a573674c2654|xapi] Session.destroy
>> trackid=93881075440eba708422d0109c7e4aa7
>> Oct 27 21:22:58 xen1 udevd[585]: worker [5514] unexpectedly returned
>> with status 0x0100
>> Oct 27 21:22:58 xen1 udevd[585]: worker [5514] failed while handling
>> '/devices/vbd-51728/block/xvdb'
>>
>>
>> /var/log/debug:
>>
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Processing disk
>> SR=6afd1b62-1c4f-7898-5511-5cc0969a0da9 VDI=cf5/prov0.root.img
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [scheduler] Scheduler sleep
>> until 1382926825 (another 27 seconds)
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [scheduler] Scheduler sleep
>> until 1382926825 (another 27 seconds)
>> Oct 27 21:19:57 xen27 xapi: [ info|xen27.zultron.com|359||storage_impl]
>> VDI.attach dbg:attach_and_activate dp:xenopsd/task/2
>> sr:6afd1b62-1c4f-7898-5511-5cc0969a0da9 vdi:cf5/prov0.root.img
>> read_write:false
>> Oct 27 21:19:57 xen27 xapi: [debug|xen27.zultron.com|359||storage_impl]
>> dbg:attach_and_activate dp:xenopsd/task/2
>> sr:6afd1b62-1c4f-7898-5511-5cc0969a0da9 vdi:cf5/prov0.root.img
>> superstate:attached  RO
>> Oct 27 21:19:57 xen27 xapi: [ info|xen27.zultron.com|360||storage_impl]
>> VDI.activate dbg:attach_and_activate dp:xenopsd/task/2
>> sr:6afd1b62-1c4f-7898-5511-5cc0969a0da9 vdi:cf5/prov0.root.img
>> Oct 27 21:19:57 xen27 xapi: [debug|xen27.zultron.com|360||storage_impl]
>> dbg:attach_and_activate dp:xenopsd/task/2
>> sr:6afd1b62-1c4f-7898-5511-5cc0969a0da9 vdi:cf5/prov0.root.img
>> superstate:activated RO
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Device.Vbd.add
>> (device_number=Xen(1, 0) | params=rbd:cf5/prov0.root.img | phystype=phys)
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] adding device
>> B0[/local/domain/0/backend/qdisk/0/51728]
>> F0[/local/domain/0/device/vbd/51728]  H[/xapi/0/hotplug/qdisk/51728]
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [hotplug] Hotplug.wait_for_plug:
>> frontend (domid=0 | kind=vbd | devid=51728); backend (domid=0 |
>> kind=qdisk | devid=51728)
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Adding watches for:
>> frontend (domid=0 | kind=vbd | devid=51728); backend (domid=0 |
>> kind=qdisk | devid=51728)
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] xenstore watch
>> /local/domain/0/backend/qdisk/0/51728/kthread-pid
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] xenstore watch
>> /local/domain/0/backend/qdisk/0/51728/tapdisk-pid
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] xenstore watch
>> /local/domain/0/backend/qdisk/0/51728/shutdown-done
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] xenstore watch
>> /local/domain/0/backend/qdisk/0/51728/params
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Watch on backend domid:
>> 0 kind: qdisk -> frontend domid: 0 devid: 51728
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Unknown device kind:
>> 'qdisk'
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Watch on backend domid:
>> 0 kind: qdisk -> frontend domid: 0 devid: 51728
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Unknown device kind:
>> 'qdisk'
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Watch on backend domid:
>> 0 kind: qdisk -> frontend domid: 0 devid: 51728
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Unknown device kind:
>> 'qdisk'
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Watch on backend domid:
>> 0 kind: qdisk -> frontend domid: 0 devid: 51728
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Unknown device kind:
>> 'qdisk'
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [scheduler] Scheduler sleep
>> until 1382926825 (another 27 seconds)
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [hotplug] Synchronised ok with
>> hotplug script: frontend (domid=0 | kind=vbd | devid=51728); backend
>> (domid=0 | kind=qdisk | devid=51728)
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [hotplug] Checking to see
>> whether /local/domain/0/backend/qdisk/0/51728/hotplug-status
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Device.Vbd successfully
>> added; device_is_online = true
>> Oct 27 21:19:57 xen27 xenopsd-xenlight: [xenops] Waiting for /dev/xvdb
>> to appear
>> Oct 27 21:19:58 xen27 kernel: blkfront: xvdb: barrier: enabled
>> Oct 27 21:19:58 xen27 xenopsd-xenlight: [bootloader] Bootloader
>> commandline: /usr/bin/pygrub -q /dev/xvdb
>> Oct 27 21:19:58 xen27 kernel: xvdb: xvdb1 xvdb2
>> Oct 27 21:19:59 xen27 xenopsd-xenlight: [xenops] xenstore-write
>> /local/domain/0/backend/qdisk/0/51728/online = 0
>> Oct 27 21:19:59 xen27 xenopsd-xenlight: [xenops] Device.del_device
>> setting backend to Closing
>> Oct 27 21:19:59 xen27 xenopsd-xenlight: [xenops]
>> Device.Generic.clean_shutdown_wait frontend (domid=0 | kind=vbd |
>> devid=51728); backend (domid=0 | kind=qdisk | devid=51728)
>> Oct 27 21:19:59 xen27 kernel: vbd vbd-51728: 16 Device in use; refusing
>> to close
>> Oct 27 21:19:59 xen27 xenopsd-xenlight: [xenops] waiting for backend to
>> close
>> Oct 27 21:19:59 xen27 kernel: qemu-system-i38[2899]: segfault at
>> 7fac042e4000 ip 00007fac0447b129 sp 00007fffe7028630 error 4 in
>> qemu-system-i386[7fac042ed000+309000]
>> Oct 27 21:20:01 xen27 CROND[10982]: (root) CMD (/usr/lib64/sa/sa1 1 1)
>>
> 
> _______________________________________________
> Xen-api mailing list
> Xen-api@xxxxxxxxxxxxx
> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
> 

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.