
Re: [Xen-devel] [xen-unstable test] 13677: regressions - FAIL



On Wed, 2012-09-12 at 09:14 +0100, Jan Beulich wrote:
> >>> On 11.09.12 at 19:28, xen.org <ian.jackson@xxxxxxxxxxxxx> wrote:
> > flight 13677 xen-unstable real [real]
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/13677/ 
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-i386-rhel6hvm-amd  8 guest-stop                fail REGR. vs. 13668
> >  test-amd64-i386-qemuu-rhel6hvm-amd  8 guest-stop          fail REGR. vs. 13668
> >  test-amd64-i386-qemuu-rhel6hvm-intel  8 guest-stop        fail REGR. vs. 13668
> >  test-amd64-i386-rhel6hvm-intel  8 guest-stop              fail REGR. vs. 13668
> 
> While it seems very likely that if any, one of the two changes
> of mine under test here have caused this, looking through the
> logs I can't spot anything that would tell me what's wrong.
> According to var-log-xen-qemu-dm-redhat.guest.osstest.log,
> the guest went down (but it didn't even enter "dying" mode
> yet according to the diagnostic output in the serial log). Nor
> can I see any close relation between the behavior and the
> changesets under test...

Looking at test-amd64-i386-rhel6hvm-amd in flight 13675, which succeeded,
the guest log ends:
        Halting system...
        type=1128 audit(1347348133.094:15071): user pid=1432 uid=0 auid=4294967295 ses=4294967295 msg='init: exe="/sbin/reboot" hostname=? addr=? terminal=console res=success'
        md: stopping all md devices.
        xenbus_dev_shutdown: device/vkbd/0: Initialising != Connected, skipping
        xenbus_dev_shutdown: device/vbd/5632: Closed != Connected, skipping
        ACPI: Preparing to enter system sleep state S5
        Disabling non-boot CPUs ...
        Broke affinity for irq 4
        Broke affinity for irq 12
        SMP alternatives: switching to UP code
        Power down.
        shutdown requested in cpu_handle_ioreq
        Issued domain 2 poweroff
        
Whereas in the failing case it cuts off after "stopping all md devices".
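
One way to script this comparison, as a minimal sketch: take the last line
of the failing guest log, find it in the passing log, and list whatever the
passing run logged afterwards. The function name `missing_tail` and the
sample strings are hypothetical; the sample lines are abbreviated from the
13675 log quoted above, and the failing log is reduced to the two lines
needed to show the cut-off. Lines carrying timestamps (such as the audit
record) will not match exactly across runs, so this only helps when the
last failing line is timestamp-free.

```python
from typing import List, Optional


def missing_tail(passing_log: str, failing_log: str) -> Optional[List[str]]:
    """Return the lines the passing log has after the failing log's last line.

    Returns None if the failing log's last line never appears in the
    passing log (no common point to compare from).
    """
    # Normalise: drop indentation and blank lines.
    passing = [l.strip() for l in passing_log.splitlines() if l.strip()]
    failing = [l.strip() for l in failing_log.splitlines() if l.strip()]
    if not failing:
        return passing
    last = failing[-1]
    # Search backwards so a repeated line matches its latest occurrence.
    for i in range(len(passing) - 1, -1, -1):
        if passing[i] == last:
            return passing[i + 1:]
    return None


# Abbreviated sample data (assumption: not the full logs).
PASSING = """
        Halting system...
        md: stopping all md devices.
        xenbus_dev_shutdown: device/vkbd/0: Initialising != Connected, skipping
        ACPI: Preparing to enter system sleep state S5
        Power down.
"""
FAILING = """
        Halting system...
        md: stopping all md devices.
"""

if __name__ == "__main__":
    for line in missing_tail(PASSING, FAILING) or []:
        print("never reached:", line)
```

On the real logs this would show which shutdown steps (xenbus device
shutdown, the ACPI S5 transition, CPU offlining) the failing guest never
got to log.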

13675 failed another sequence, let's assume for unrelated reasons. The
delta in the commits is just:

25844:0a9a4549e6b9 powernow: Update P-state directly when _PSD's CoordType is DOMAIN_COORD_TYPE_HW_ALL
25843:51090fe1ab97 x86/HVM: assorted RTC emulation adjustments

The first is a host level thing which I doubt would so consistently
affect HVM guests (and anyway, Intel tests are also failing). Which
pretty much leaves 25843:51090fe1ab97 or some weird heisenbug.

Is it outside the realms of possibility that the guest has managed to
limp along with the RTC being broken in some subtle way and only
eventually trips up when we come to shut down?

Looking back at 13675, is it possible that:

25842:a1f73e989c24 x86/hvm: don't give vector callback higher priority than NMI/MCE

is exposing a race in the guest or something? I very much doubt any NMI
or MCE are being injected at all though.

Ian.


