Re: [Xen-devel] Commit moratorium to staging
On 11/03/2017 06:35 PM, Juergen Gross wrote:
> On 03/11/17 19:29, Roger Pau Monné wrote:
>> On Fri, Nov 03, 2017 at 05:57:52PM +0000, George Dunlap wrote:
>>> On 11/03/2017 02:52 PM, George Dunlap wrote:
>>>> On 11/03/2017 02:14 PM, Roger Pau Monné wrote:
>>>>> On Thu, Nov 02, 2017 at 09:55:11AM +0000, Paul Durrant wrote:
>>>>>> Hmm. I wonder whether the guest is actually healthy after the migrate.
>>>>>> One could imagine a situation where the storage device model (IDE in our
>>>>>> case I guess) gets stuck in some way but recovers after a timeout in the
>>>>>> guest storage stack. Thus, if you happen to try shut down while it is
>>>>>> still stuck Windows starts trying to shut down but can't. Try after the
>>>>>> timeout though and it can.
>>>>>> In the past we did make attempts to support Windows without PV drivers
>>>>>> in XenServer but xenrt would never reliably pass VM lifecycle tests
>>>>>> using emulated devices. That was with qemu trad, but I wonder whether
>>>>>> upstream qemu is actually any better particularly if using older device
>>>>>> models such as IDE and RTL8139 (which are probably largely unmodified
>>>>>> from trad).
>>>>>
>>>>> Since I've been looking into this for a couple of days, and found no
>>>>> solution I'm going to write what I've found so far:
>>>>>
>>>>> - The issue only affects Windows guests.
>>>>> - It only manifests itself when doing live migration, non-live
>>>>>   migration or save/resume work fine.
>>>>> - It affects all x86 hardware, the amount of migrations in order to
>>>>>   trigger it seems to depend on the hardware, but doing 20 migrations
>>>>>   reliably triggers it on all the hardware I've tested.
>>>>
>>>> Not good.
>>>>
>>>> You said that Windows reported that the login process failed somehow?
>>>>
>>>> Is it possible something bad is happening, like sending spurious page
>>>> faults to the guest in logdirty mode?
>>>>
>>>> I wonder if we could reproduce something like it on Linux -- set a build
>>>> going and start localhost migrating; a spurious page fault is likely to
>>>> cause the build to fail.
>>>
>>> Well, with a looping xen-build going on in the guest, I've done 40 local
>>> migrates with no problems yet.
>>>
>>> But Roger -- is this on emulated devices only, no PV drivers?
>>>
>>> That might be something worth looking at.
>>
>> Yes, windows doesn't have PV devices. But save/restore and non-live
>> migration seems fine, so it doesn't look to be related to devices, but
>> rather to log-dirty or some other aspect of live-migration.
>
> log-dirty for read-I/Os of emulated devices?

FWIW I booted a Linux guest with "xen_nopv" on the command-line, gave it
256 MiB of RAM, and then ran a Xen build on it in a loop (see command
below). Then I started migrating it in a loop.

After an hour or two it had done 146 local migrations, and 46 builds of
Xen (swapping onto emulated disk is pretty slow), without any issues.

Build command:

# while make -j 3 xen ; do git clean -ffdx ; done

I'm shutting down the VM and I'll leave it running overnight.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
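
The message gives the in-guest build loop but not the host-side migration loop. The following is only a sketch of what such a loop might look like, not the command actually used in the thread: it assumes the xl toolstack and its "xl migrate <domain> <host>" subcommand with localhost as the target. The function name migrate_loop, the domain name "guest" and the MIGRATE_CMD override are hypothetical and appear here purely for illustration.

```shell
# Sketch only: repeatedly migrate a domain to localhost and stop on the
# first failure. MIGRATE_CMD is a hypothetical override (not an xl feature)
# so the loop can be dry-run without a Xen host; it defaults to "xl migrate".
migrate_loop() {
    domain=$1      # domain name or ID to migrate (hypothetical: "guest")
    count=$2       # number of back-to-back local migrations to attempt
    i=0
    while [ "$i" -lt "$count" ]; do
        # Unquoted on purpose so "xl migrate" splits into command + subcommand.
        ${MIGRATE_CMD:-xl migrate} "$domain" localhost || {
            echo "migration $((i + 1)) failed" >&2
            return 1
        }
        i=$((i + 1))
    done
    echo "$i migrations completed"
}
```

Run from dom0 as, for example, "migrate_loop guest 20" (the count Roger reports as reliably triggering the Windows failure) while the build loop runs inside the guest. Stopping on the first failed migration keeps the guest in the state where the problem appeared, which seems more useful for debugging than continuing.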