
Re: [Xen-devel] [qemu-upstream-unstable test] 21375: regressions - FAIL



On Mon, 2013-11-18 at 17:18 +0000, Anthony PERARD wrote:
> On Wed, Nov 06, 2013 at 05:22:29PM +0000, Anthony PERARD wrote:
> > On Fri, Nov 01, 2013 at 03:46:36PM +0000, Anthony PERARD wrote:
> > > On Fri, Nov 01, 2013 at 12:06:51PM +0000, Ian Campbell wrote:
> > > > On Fri, 2013-11-01 at 11:58 +0000, Anthony PERARD wrote:
> > > > > On Fri, Nov 01, 2013 at 10:43:16AM +0000, Ian Campbell wrote:
> > > > > > On Fri, 2013-11-01 at 10:38 +0000, xen.org wrote:
> > > > > > > flight 21375 qemu-upstream-unstable real [real]
> > > > > > > http://www.chiark.greenend.org.uk/~xensrcts/logs/21375/
> > > > > > > 
> > > > > > > Regressions :-(
> > > > > > > 
> > > > > > > Tests which did not succeed and are blocking,
> > > > > > > including tests which could not be run:
> > > > > > >  test-amd64-i386-qemuu-rhel6hvm-intel  7 redhat-install  fail REGR. vs. 20054
> > > > > > 
> > > > > > Anthony, have you made any progress on this? It's been failing for
> > > > > > ages now...
> > > > > 
> > > > > Yes, it looks like the bug is triggered during a VESA resolution
> > > > > change. I have tried the vgabios blob that we use for
> > > > > qemu-traditional and it works fine. But with the vgabios blob
> > > > > provided by QEMU, it does not work... I'm still not sure what the
> > > > > bug is, but I'm getting closer to it.
> > > > 
> > > > Yay!
> > > > 
> > > > > Also, this happens only on an Intel machine; on an AMD machine,
> > > > > everything works like a charm.
> > > > > 
> > > > > More detail, if anyone wants to know:
> > > > > It looks like syslinux is making an int 10h call to set the video
> > > > > mode, and that call never returns:
> > > > > Int 0x10, with AX=0x4F02
> > > > 
> > > > This looks like it might be handled by SeaBIOS vgasrc/vbe.c:vbe_104f00 ?
> > > > There seem to be a few changes in upstream seabios since the version
> > > > referenced in xen.git:Config.mk. Many of them are cleanups/code motion
> > > > but a few look worth investigating. 
> > > 
> > > I've been able to get things working by applying a vgabios patch
> > > that is in the Xen tree: a0e7ccf6864c196906d58b54cd0996b4dbc1b022
> > > This patch makes clearing the framebuffer much faster.
> > > 
> > > But it does not really help me understand why the guest freezes. A
> > > couple more printfs might.
> > 
> > I finally managed to get a better understanding of the issue.
> > 
> > So, the vgabios blob provided by QEMU has a routine to clear the video
> > RAM that takes a few seconds to run. That gives QEMU enough time to try
> > to refresh its display, which means there will be a call to
> > xc_hvm_track_dirty_vram(). If the function is called while the vgabios
> > routine is running, then the guest is lost.
> > 
> > The issue appears only on an Intel machine, with an HVM guest using EPT.
> > Having the guest use shadow works fine. So I'm going to investigate
> > the track_dirty code in Xen.
> > 
> > The vgabios routine is called by syslinux with an Int 0x10. I tried to
> > get some debug prints after the call, either from the guest serial or
> > by using the Xen debug ioport, but nothing ever appears, and gdbsx only
> > gave me a weird IP which does not appear to point to any useful code
> > (it's all zeros).
> 
> Another update:
> 
> We had the idea of trying this on an earlier version of Xen, and it turns
> out that Xen 4.3 works fine. One bisect later, a commit turned up:
> 
> commit 86781624f8df1d50eb4185cfc2ddce926798f7aa
> x86_emulate: PUSH <mem> must read source operand just once
> ... for the case of accessing MMIO.
> 
> So after this commit, syslinux stops working correctly with the latest
> version of QEMU. This happens if QEMU is calling track_dirty_vram.
> 
> I have also used xentrace/xenalyze to try to grab more information about
> the issue. It did not really help, but it tells me that the guest is
> stuck on a specific instruction (it results in a vmexit EPT_VIOLATION
> over and over in xentrace). This is where the guest is stuck:
> 
>    0xa126:  mov    %eax,%cr0
>    0xa129:  ljmp   $0xf2e,$0xa12e
>    0xa130:  mov    $0x26,%dl
>    0xa132:  or     %bh,(%eax)
>    0xa134:  movzww %sp,%sp
>    0xa138:  mov    %edx,%ds
>    0xa13a:  mov    %edx,%es
>    0xa13c:  mov    %edx,%fs
>    0xa13e:  mov    %edx,%gs
>    0xa140:  jmp    *%ebx
>    0xa142:  pushf  
> => 0xa143:  lcall  *%cs:(%si)
>    0xa147:  mov    $0x0,%ch

OOI what is the encoding of the bad instruction?

> 
> Before trying an earlier version of Xen, I tried to understand what went
> wrong on the Xen side. It turns out that, in the track_dirty_vram
> hypercall, a call to hap_enable_log_dirty() is all that is needed to
> break the guest.
> 
> Jan, any idea what the issue is?
> 
> Regards,
> 
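
For reference, the Int 0x10 call with AX=0x4F02 quoted above is the VBE
"Set VBE Mode" function: AH=0x4F selects VBE and AL=0x02 selects the
function. The mode number is passed in BX, where bit 14 requests a linear
framebuffer and bit 15 asks the BIOS not to clear display memory. The
sketch below decodes those registers. It is only an illustration: the
helper name and the BX values are made up, since the thread does not say
what BX syslinux actually passed.

```python
# Illustrative decoder for the registers of a VBE "Set VBE Mode" call.
# decode_vbe_set_mode is a hypothetical helper name; the BX values used
# with it are examples only and are not taken from the failing guest.

VBE_AH = 0x4F           # AH value identifying a VBE call
VBE_FN_SET_MODE = 0x02  # AL value for "Set VBE Mode"


def decode_vbe_set_mode(ax, bx):
    """Split AX/BX of an Int 0x10 VBE set-mode call into named fields."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    if ah != VBE_AH or al != VBE_FN_SET_MODE:
        raise ValueError("not a VBE Set Mode call: AX=%#06x" % ax)
    return {
        "mode": bx & 0x01FF,                           # bits 0-8: mode number
        "linear_framebuffer": bool(bx & 0x4000),       # bit 14
        "preserve_display_memory": bool(bx & 0x8000),  # bit 15
    }


if __name__ == "__main__":
    # Example BX: mode 0x118 with the linear framebuffer bit set.
    print(decode_vbe_set_mode(0x4F02, 0x4118))
```

With bit 15 clear, as in the example, the BIOS is asked to clear display
memory as part of the mode set, which is where the slow clearing routine
discussed above would come into play.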


