Re: [Xen-devel] HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI
On 31 May 2013, at 12:36, Diana Crisan <dcrisan@xxxxxxxxxxxx> wrote:
>
>
> On 31 May 2013, at 11:54, George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
>
>> On 31/05/13 09:34, Diana Crisan wrote:
>>> George,
>>> On 30/05/13 17:06, George Dunlap wrote:
>>>> On 05/30/2013 04:55 PM, Diana Crisan wrote:
>>>>> On 30/05/13 16:26, George Dunlap wrote:
>>>>>> On Tue, May 28, 2013 at 4:06 PM, Diana Crisan <dcrisan@xxxxxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> On 26/05/13 09:38, Ian Campbell wrote:
>>>>>>>> On Sat, 2013-05-25 at 11:18 +0100, Alex Bligh wrote:
>>>>>>>>> George,
>>>>>>>>>
>>>>>>>>> --On 24 May 2013 17:16:07 +0100 George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>>> FWIW it's reproducible on every host h/w platform we've tried
>>>>>>>>>>> (a total of 2).
>>>>>>>>>> Do you see the same effects if you do a local-host migrate?
>>>>>>>>> I hadn't even realised that was possible. That would have made testing
>>>>>>>>> live migrate easier!
>>>>>>>> That's basically the whole reason it is supported ;-)
>>>>>>>>
>>>>>>>>> How do you avoid the name clash in xen-store?
>>>>>>>> Most toolstacks receive the incoming migration into a domain named
>>>>>>>> FOO-incoming or some such and then rename to FOO upon completion. Some
>>>>>>>> also rename the outgoing domain "FOO-migratedaway" towards the end so
>>>>>>>> that the bits of the final teardown which can safely happen after the
>>>>>>>> target has started can be done so.
>>>>>>>>
>>>>>>>> Ian.
>>>>>>> I am unsure what I am doing wrong, but I cannot seem to be able to do a
>>>>>>> localhost migrate.
>>>>>>>
>>>>>>> I created a domU using "xl create xl.conf" and once it fully booted I
>>>>>>> issued an "xl migrate 11 localhost". This fails and gives the output below.
>>>>>>>
>>>>>>> Would you please advise on how to get this working?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Diana
>>>>>>>
>>>>>>>
>>>>>>> root@ubuntu:~# xl migrate 11 localhost
>>>>>>> root@localhost's password:
>>>>>>> migration target: Ready to receive domain.
>>>>>>> Saving to migration stream new xl format (info 0x0/0x0/2344)
>>>>>>> Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/2344)
>>>>>>> Savefile contains xl domain config
>>>>>>> xc: progress: Reloading memory pages: 53248/1048575 5%
>>>>>>> xc: progress: Reloading memory pages: 105472/1048575 10%
>>>>>>> libxl: error: libxl_dm.c:1280:device_model_spawn_outcome: domain 12 device model: spawn failed (rc=-3)
>>>>>>> libxl: error: libxl_create.c:1091:domcreate_devmodel_started: device model did not start: -3
>>>>>>> libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
>>>>>>> migration target: Domain creation failed (code -3).
>>>>>>> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
>>>>>>> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [10934] exited with error status 3
>>>>>>> Migration failed, resuming at sender.
>>>>>>> xc: error: Cannot resume uncooperative HVM guests: Internal error
>>>>>>> libxl: error: libxl.c:404:libxl__domain_resume: xc_domain_resume failed for domain 11: Success
>>>>>> Aha -- I managed to reproduce this one as well.
>>>>>>
>>>>>> Your problem is the "vncunused=0" -- that's instructing qemu "You must
>>>>>> use this exact port for the vnc server". But when you do the migrate,
>>>>>> that port is still in use by the "from" domain; so the qemu for the
>>>>>> "to" domain can't get it, and fails.
>>>>>>
>>>>>> Obviously this should fail a lot more gracefully, but that's a bit of
>>>>>> a lower-priority bug I think.
>>>>>>
>>>>>> -George
>>>>> Yes, I managed to get to the bottom of it too and got vms migrating on
>>>>> localhost on our end.
>>>>>
>>>>> I can confirm I did get the clock stuck problem while doing a localhost
>>>>> migrate.
>>>>
>>>> Does the script I posted earlier "work" for you (i.e., does it fail after
>>>> some number of migrations)?
>>>
>>> I left your script running throughout the night and it seems that it does
>>> not always catch the problem. I see the following:
>>>
>>> 1. vm has the clock stuck
>>> 2. script is still running as it seems the vm is still ping-able.
>>> 3. migration fails on the basis that the vm does not ack the suspend
>>> request (see below).
>>
>> So I wrote a script to run "date", sleep for 2 seconds, and run "date" a
>> second time -- and eventually the *sleep* hung.
>>
>> The VM is still responsive, and I can log in; if I type "date" manually
>> successive times then I get an advancing clock, but if I type "sleep 1" it
>> just hangs.
>>
>> If you run "dmesg" in the guest, do you see the following line?
>>
>> CE: Reprogramming failure. Giving up
>
> I do. It is preceded by:
> CE: xen increased min_delta_ns to 4000000 nsec
>

It seems that it is always getting stuck when the min_delta_ns is set to 4mil nsec. Could this be it? Overflow perhaps?

>> -George
>
> --
> Diana
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
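On the overflow question raised above, a quick arithmetic check may help. The sketch below is not taken from Xen or the Linux kernel: the helper names fits_signed and tick_length_ns are hypothetical, and a guest kernel built with HZ=250 is an assumption that nothing in this thread confirms. It only shows two things: 4000000 nsec is far below what a signed 32-bit integer can hold, so that value on its own does not look like an overflow, and 4000000 nsec is exactly the length of one timer tick at HZ=250, which might point to a one-tick cap being reached rather than a wrapped counter.

```python
# Arithmetic sanity check only; no Xen or kernel interfaces are used.
NSEC_PER_SEC = 1_000_000_000

# Value reported in the guest's dmesg line quoted in this thread.
MIN_DELTA_NS = 4_000_000


def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable as a signed integer of this width."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


def tick_length_ns(hz: int) -> int:
    """Length of one timer tick in nanoseconds for a given tick rate."""
    return NSEC_PER_SEC // hz


if __name__ == "__main__":
    print("fits in signed 32-bit:", fits_signed(MIN_DELTA_NS, 32))
    # HZ=250 is an assumed guest kernel setting, not stated in the thread.
    print("one tick at assumed HZ=250:", tick_length_ns(250), "nsec")
```

If the guest kernel uses a different tick rate, the second comparison does not apply; checking the guest's kernel configuration would settle that.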