
Re: [Xen-devel] Commit moratorium to staging



On 03/11/17 19:29, Roger Pau Monné wrote:
> On Fri, Nov 03, 2017 at 05:57:52PM +0000, George Dunlap wrote:
>> On 11/03/2017 02:52 PM, George Dunlap wrote:
>>> On 11/03/2017 02:14 PM, Roger Pau Monné wrote:
>>>> On Thu, Nov 02, 2017 at 09:55:11AM +0000, Paul Durrant wrote:
>>>>> Hmm. I wonder whether the guest is actually healthy after the migrate. 
>>>>> One could imagine a situation where the storage device model (IDE in our 
>>>>> case I guess) gets stuck in some way but recovers after a timeout in the 
>>>>> guest storage stack. Thus, if you happen to try to shut down while it is
>>>>> still stuck, Windows starts trying to shut down but can't. Try after the
>>>>> timeout, though, and it can.
>>>>> In the past we did make attempts to support Windows without PV drivers in 
>>>>> XenServer but xenrt would never reliably pass VM lifecycle tests using 
>>>>> emulated devices. That was with qemu trad, but I wonder whether upstream
>>>>> qemu is actually any better, particularly if using older device models
>>>>> such as IDE and RTL8139 (which are probably largely unmodified from trad).
>>>>
>>>> Since I've been looking into this for a couple of days and found no
>>>> solution, I'm going to write down what I've found so far:
>>>>
>>>>  - The issue only affects Windows guests.
>>>>  - It only manifests itself when doing live migration; non-live
>>>>    migration and save/resume work fine.
>>>>  - It affects all x86 hardware. The number of migrations needed to
>>>>    trigger it seems to depend on the hardware, but doing 20 migrations
>>>>    reliably triggers it on all the hardware I've tested.
>>>
>>> Not good.
>>>
>>> You said that Windows reported that the login process failed somehow?
>>>
>>> Is it possible something bad is happening, like sending spurious page
>>> faults to the guest in logdirty mode?
>>>
>>> I wonder if we could reproduce something like it on Linux -- set a build
>>> going and start localhost migrating; a spurious page fault is likely to
>>> cause the build to fail.
>>
>> Well, with a looping xen-build going on in the guest, I've done 40 local
>> migrates with no problems yet.
>>
>> But Roger -- is this on emulated devices only, no PV drivers?
>>
>> That might be something worth looking at.
> 
> Yes, Windows doesn't have PV devices. But save/restore and non-live
> migration seem fine, so it doesn't look to be related to devices, but
> rather to log-dirty or some other aspect of live migration.

Could it be log-dirty for read I/Os of emulated devices?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

