
Re: [Xen-devel] Commit moratorium to staging



> -----Original Message-----
> From: Roger Pau Monne
> Sent: 02 November 2017 09:42
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Lars Kurth
> <lars.kurth@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; Julien Grall
> <julien.grall@xxxxxxxxxx>; committers@xxxxxxxxxxxxxx;
> xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] Commit moratorium to staging
> 
> On Thu, Nov 02, 2017 at 09:20:10AM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 02 November 2017 09:15
> > > To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> > > Cc: Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Lars Kurth
> > > <lars.kurth@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; Julien Grall
> > > <julien.grall@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>;
> > > committers@xxxxxxxxxxxxxx;
> > > xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > > Subject: Re: [Xen-devel] Commit moratorium to staging
> > >
> > > On Wed, Nov 01, 2017 at 04:17:10PM +0000, Roger Pau Monné wrote:
> > > > On Wed, Nov 01, 2017 at 02:07:48PM +0000, Ian Jackson wrote:
> > > > > * Affected hosts differ from unaffected hosts according to cpuid.
> > > > >   Roger has repro'd the bug on an unaffected host by masking out
> > > > >   certain cpuid bits.  There are 6 implicated bits and he is working
> > > > >   to narrow that down.
> > > >
> > > > I'm currently trying to narrow this down and make sure the above is
> > > > accurate.
> > >
> > > So I was wrong about this; I guess I ran the tests on the wrong
> > > host. Even when masking the different cpuid bits in the guest, the
> > > tests still succeed.
> > >
> > > AFAICT the tests fail or succeed reliably depending on the host
> > > hardware. I don't really have many ideas about what to do next, but I
> > > think it would be useful to create a manual osstest flight that runs
> > > the win16 job in all the different hosts in the colo. I would also
> > > capture the normal information that Xen collects after each test (xl
> > > info, /proc/cpuid, serial logs...).
> > >
> > > Is there anything else not captured by ts-logs-capture that would be
> > > interesting in order to help debug the issue?
> >
> > Does the shutdown reliably complete prior to migrate and then only fail
> > intermittently after a localhost migrate?
> 
> AFAICT yes, but it can also be added to the test in order to be sure.
> 
> > It might be useful to know what cpuid info is seen by the guest before and
> > after migrate.
> 
> Is there any way to get that from Windows automatically? If not, I
> could test that with a Debian guest. In fact it might even be a good
> thing for a Linux-based guest to be added to the regular migration tests
> in order to make sure cpuid bits don't change across migrations.
> 

I found this for windows:

https://www.cpuid.com/downloads/cpu-z/cpu-z_1.81-en.exe

It can generate a text or HTML report as well as run interactively, but
you may get more mileage from using a Debian HVM guest. It may also be
useful if we can get a scan of the available MSRs and their contents before
and after migrate.
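
For the Debian guest case, comparing the two cpuid captures could look
roughly like the sketch below. It is only an illustration: it assumes the
dumps are the raw text printed by the Linux `cpuid` tool for a single CPU
(something like `cpuid -1 -r`), one line per leaf with eax/ebx/ecx/edx in
hex. The function names are made up for this example, and the regular
expression needs adjusting if your dump format differs.

```python
import re

# Assumed line format (raw single-CPU output of the Linux `cpuid` tool):
#    0x00000001 0x00: eax=0x000306e4 ebx=0x00200800 ecx=0x7fbee3ff edx=0xbfebfbff
LEAF_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+):\s+"
    r"eax=(0x[0-9a-fA-F]+)\s+ebx=(0x[0-9a-fA-F]+)\s+"
    r"ecx=(0x[0-9a-fA-F]+)\s+edx=(0x[0-9a-fA-F]+)"
)
REGS = ("eax", "ebx", "ecx", "edx")


def parse_dump(text):
    """Map (leaf, subleaf, register) -> value for every line that matches."""
    values = {}
    for line in text.splitlines():
        m = LEAF_RE.match(line)
        if not m:
            continue  # skip headers such as "CPU 0:" and blank lines
        leaf, subleaf = int(m.group(1), 16), int(m.group(2), 16)
        for reg, raw in zip(REGS, m.groups()[2:]):
            values[(leaf, subleaf, reg)] = int(raw, 16)
    return values


def diff_dumps(before, after):
    """Return (leaf, subleaf, reg, old, new, changed_bits) per differing register.

    old or new is None when a leaf is present in only one of the dumps.
    """
    a, b = parse_dump(before), parse_dump(after)
    changes = []
    for key in sorted(set(a) | set(b)):
        old, new = a.get(key), b.get(key)
        if old == new:
            continue
        xor = (old or 0) ^ (new or 0)
        bits = [i for i in range(32) if (xor >> i) & 1]
        changes.append((*key, old, new, bits))
    return changes
```

Feeding it the text captured before and after the localhost migrate gives
a list of registers that changed and which bit positions flipped; an empty
list means the guest saw identical cpuid on both sides.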

> > Another datapoint... does the shutdown fail if you insert a delay of a
> > couple of minutes between the migrate and the shutdown?
> 
> Sometimes, after a variable number of calls to xl shutdown ... the
> guest usually ends up shutting down.
> 

Hmm. I wonder whether the guest is actually healthy after the migrate. One
could imagine a situation where the storage device model (IDE in our case, I
guess) gets stuck in some way but recovers after a timeout in the guest
storage stack. Thus, if you happen to request a shutdown while it is still
stuck, Windows starts shutting down but can't complete it. Try after the
timeout, though, and it can.

In the past we did make attempts to support Windows without PV drivers in
XenServer, but xenrt would never reliably pass VM lifecycle tests using
emulated devices. That was with qemu trad, but I wonder whether upstream qemu
is actually any better, particularly when using older device models such as
IDE and RTL8139 (which are probably largely unmodified from trad).
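
One way to check the timeout theory would be to vary the gap between the
migrate and the first shutdown request and see whether failures cluster at
the short delays. A rough sketch of that loop follows; `boot_and_migrate`
and `try_shutdown` are hypothetical hooks standing in for whatever the test
harness already does (they are not existing osstest or xl functions), and
`try_shutdown` is assumed to return True when the guest actually went down.

```python
import time


def probe_shutdown_delays(delays, boot_and_migrate, try_shutdown,
                          sleep=time.sleep):
    """For each delay in seconds: boot and migrate, wait, attempt one shutdown.

    Both callables are supplied by the caller (hypothetical hooks).
    Returns a list of (delay, shutdown_succeeded) pairs.
    """
    results = []
    for delay in delays:
        boot_and_migrate()   # fresh guest, then localhost migrate
        sleep(delay)         # give any stuck device model time to recover
        results.append((delay, bool(try_shutdown())))
    return results
```

If the guest only shuts down cleanly once the delay exceeds some threshold,
that would point at a device model stall rather than at cpuid.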

  Paul

> Roger.


 

