Xen project Mailing List

Re: [Xen-devel] [qemu-upstream-unstable test] 21375: regressions - FAIL

To: Anthony PERARD <anthony.perard@xxxxxxxxxx>

From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

Date: Tue, 19 Nov 2013 11:07:22 +0000

Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "xen.org" <ian.jackson@xxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

Delivery-date: Tue, 19 Nov 2013 11:07:32 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, 2013-11-18 at 17:18 +0000, Anthony PERARD wrote:
> On Wed, Nov 06, 2013 at 05:22:29PM +0000, Anthony PERARD wrote:
> > On Fri, Nov 01, 2013 at 03:46:36PM +0000, Anthony PERARD wrote:
> > > On Fri, Nov 01, 2013 at 12:06:51PM +0000, Ian Campbell wrote:
> > > > On Fri, 2013-11-01 at 11:58 +0000, Anthony PERARD wrote:
> > > > > On Fri, Nov 01, 2013 at 10:43:16AM +0000, Ian Campbell wrote:
> > > > > > On Fri, 2013-11-01 at 10:38 +0000, xen.org wrote:
> > > > > > > flight 21375 qemu-upstream-unstable real [real]
> > > > > > > http://www.chiark.greenend.org.uk/~xensrcts/logs/21375/
> > > > > > >
> > > > > > > Regressions :-(
> > > > > > >
> > > > > > > Tests which did not succeed and are blocking,
> > > > > > > including tests which could not be run:
> > > > > > > test-amd64-i386-qemuu-rhel6hvm-intel 7 redhat-install fail REGR. vs. 20054
> > > > > >
> > > > > > Anythony, have you made any progress on this? It's been failing
> > > > > > for ages now...
> > > > >
> > > > > Yes, looks like the bug it trigger during a vesa resolution change. I
> > > > > have try to use the vgabios blob that we use for qemu-traditionnal and
> > > > > it works fine. But with the vgabios blob provided by qemu, it does not
> > > > > work... I'm still not sure of what the bug is, but I'm getting closer
> > > > > to it.
> > > >
> > > > Yay!
> > > >
> > > > > Also, this happen only on an Intel machine, on an AMD machine,
> > > > > everything works like a charm.
> > > > >
> > > > > More detail, if anyone want to know:
> > > > > It's look like syslinux is doing a int 10h call that never return
> > > > > to set video mode:
> > > > > Int 0x10, with AX=0x4F02
> > > >
> > > > This looks like it might be handled by SeaBIOS vgasrc/vbe.c:vbe_104f00 ?
> > > > There seem to be a few changes in upstream seabios since the version
> > > > referenced in xen.git:Config.mk. Many of them are cleanups/code motion
> > > > but a few look worth investigating.
> > > >
> > > I've been able to get the things working by applying a patch to vgabios
> > > that is in xen tree: a0e7ccf6864c196906d58b54cd0996b4dbc1b022
> > > This patch allow to clear the framebuffer much faster.
> > >
> > > But it those not really help be to understand why the guest freeze. A
> > > couple more printf might.
> >
> > I finally managed to have a better understanding of the issue.
> >
> > So, the vgabios blob provided by QEMU have a routine to clear the video
> > ram that take few seconds to run. That give enough time to QEMU to try
> > to refresh is display, and this mean they will be a call to
> > xc_hvm_track_dirty_vram(). If the function is called while the vgabios
> > routine is running, then the guest is lost.
> >
> > The issue appear only with an Intel machine on an HVM guest using EPT.
> > Having the guest using shadow works fine. So I'm going to investigate
> > the track_dirty code in Xen.
> >
> > The vgabios routine is called by syslinux with an Int 0x10, I tryied to
> > get some debug print after the call, either from the guest serial or
> > by using the Xen debug ioport, nothing ever appear, and gdbsx only gave
> > me some weird IP which does not appear to point to any usefull code
> > (it's all zeros).
>
> An other update,
>
> we had the idee of trying this on earlier versin of Xen, and it turns
> out that Xen 4.3 works fine. One bisect later, and a commit turns out.
>
> commit 86781624f8df1d50eb4185cfc2ddce926798f7aa
> x86_emulate: PUSH <mem> must read source operand just once
> ... for the case of accessing MMIO.
>
> So after this commit, syslinux stop working correctly with the last
> version of QEMU. This happen if QEMU is calling track_dirty_vram.
>
> I also have use xentrace/xenalyze to try to grab more information about
> the issue, it did not really help, but it's tell me that the guest is
> stock on a specific instruction (it result in vmexit EPT_VIOLATION over
> and over on xentrace). And that were the guest is stock:
>
> 0xa126: mov %eax,%cr0
> 0xa129: ljmp $0xf2e,$0xa12e
> 0xa130: mov $0x26,%dl
> 0xa132: or %bh,(%eax)
> 0xa134: movzww %sp,%sp
> 0xa138: mov %edx,%ds
> 0xa13a: mov %edx,%es
> 0xa13c: mov %edx,%fs
> 0xa13e: mov %edx,%gs
> 0xa140: jmp *%ebx
> 0xa142: pushf
> => 0xa143: lcall *%cs:(%si)
> 0xa147: mov $0x0,%ch

OOI what is the encoding of the bad instruction?

>
> Before trying on earlier version of Xen, I try to understand what when
> wrong on the Xen side, it turn out that, in the track_dirty_vram
> hypercall, a call to hap_enable_log_dirty() is all that needed to break
> the guest.
>
> Jan, any idee of what the issue is?
>
> Regards,
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
