Re: [Xen-devel] Xen 4.5 Development Update (RC4)

On 16/12/14 20:49, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 16, 2014 at 05:43:08PM +0000, Andrew Cooper wrote:
>> On 16/12/14 16:13, konrad.wilk@xxxxxxxxxx wrote:
>>> Xen 4.5-rc4 was out on Monday (Dec 15th). This is the last RC and then
>>> we have the General Release on Jan 7th!
>>> Details for the test-day are at
>>> http://wiki.xen.org/wiki/Xen_4.5_RC4_test_instructions
>>> In terms of bugs, we have:
>> From the XenServer testing.
> Thank you for doing this testing!
>> * Fail to reliably boot on IBM Flex x222 blades, apparent regression
>> from 4.4
>> I have declared this a latent BIOS bug, and not a regression from 4.4. 
>> Across regular reboots, the exact positions of the ACPI tables, and the
>> e820 layout is unstable.  The first consistent difference between 4.4
>> and 4.5 is that 4.4 reports 1 MBR signature while 4.5 reports 0.  This
>> is because the int $0x13, ah=2 call is returning differently.  I can get
>> the call to return differently (and correctly for 4.5) by simply making
>> the boot trampoline larger (with my debugging routines but not being
>> called).
> This sounds very familiar, but I can't place where I saw mention of
> a similar issue.
>> * VM fail to resume on upgrade from Xen < 4.5
>> This is the issue I am currently looking into.  Currently, all the
>> "upgrade from older XenServer" tests are failing due to VMs crashing on
>> resume.  I have not yet identified whether this is a XenServer issue or
> Ugh.

I have got to the bottom of this, and it turns out to be a bug in the
legacy -> migration v2 conversion, which has only surfaced now because
Xen-4.5 is stricter than Xen-4.4.

In the legacy format, HVM_PARAM_PAE_ENABLED is not part of the stream.  It
is sent out-of-band and passed to xc_domain_restore(), which does a
set_param() independently of the stream contents.  Migration v2 saves and
restores it properly, but the legacy -> v2 conversion neglected to merge in
the out-of-band information.  No VMs blew up until now because no earlier
version of Xen correctly audited updates to cr4 against the domain cpuid
policy.  Xen-4.5 now does, causing #GP faults on cr4 writes for guests
which had PAE enabled before migration.
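
As a rough illustration of the failure mode, here is a toy model in Python.
It is not Xen code: policy_edx() and cr4_write_ok() are hypothetical names,
and it assumes the cpuid policy hides the PAE feature bit whenever the HVM
param is unset.  Only the bit positions (CR4 bit 5, CPUID leaf 1 EDX bit 6)
are architectural.

```python
# Toy model only; not Xen code. Bit positions are architectural x86 facts.
CR4_PAE = 1 << 5         # CR4.PAE
CPUID1_EDX_PAE = 1 << 6  # CPUID leaf 1, EDX: PAE feature bit


def policy_edx(pae_enabled):
    """Hypothetical policy: advertise PAE only if the HVM param is set."""
    return CPUID1_EDX_PAE if pae_enabled else 0


def cr4_write_ok(new_cr4, edx, strict):
    """Return False where a strict hypervisor would inject #GP."""
    if not strict:
        return True  # lax behaviour: cr4 is not audited against the policy
    if (new_cr4 & CR4_PAE) and not (edx & CPUID1_EDX_PAE):
        return False
    return True


# The converted stream lost the out-of-band param, so the policy hides PAE,
# while the resumed guest still runs with CR4.PAE set.
edx_after_conversion = policy_edx(pae_enabled=False)
lax_result = cr4_write_ok(CR4_PAE, edx_after_conversion, strict=False)
strict_result = cr4_write_ok(CR4_PAE, edx_after_conversion, strict=True)
```

With the param dropped, the lax check lets the write through and the strict
check rejects it, which matches the pattern of older Xen resuming fine and
Xen-4.5 faulting.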

I shall be fixing this in the migration v2 series, and also looking for
any other obvious out-of-band information which needs injecting into a
converted stream.
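
The shape of the fix can be sketched roughly as below.  This is an
assumption-laden Python sketch, not the actual conversion code: the
(type, payload) tuple representation of records and the helper name
inject_out_of_band() are made up for illustration, and letting in-stream
values win over out-of-band ones is just one possible choice.

```python
def inject_out_of_band(records, oob_params):
    """Return a new record list with out-of-band HVM params merged in.

    records:    list of (record_type, payload) tuples (hypothetical layout)
    oob_params: dict of param name -> value supplied outside the stream
    """
    out = []
    merged = False
    for rec_type, payload in records:
        if rec_type == "HVM_PARAMS" and not merged:
            combined = dict(payload)
            for name, value in oob_params.items():
                # Values already in the stream take precedence.
                combined.setdefault(name, value)
            out.append((rec_type, combined))
            merged = True
        else:
            out.append((rec_type, payload))
    if not merged and oob_params:
        # No params record in the stream at all: add one.
        out.append(("HVM_PARAMS", dict(oob_params)))
    return out


legacy_records = [
    ("PAGE_DATA", b"..."),
    ("HVM_PARAMS", {"HVM_PARAM_STORE_PFN": 0xFEFFC}),
]
converted = inject_out_of_band(legacy_records, {"HVM_PARAM_PAE_ENABLED": 1})
```

Any other out-of-band value found later could go through the same merge step
by adding it to the dict.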

With this fixed (^W hacked around for now), I have identified and solved
all discrepancies which XenServer testing has noticed between Xen-4.4 and
Xen-4.5 so far.

There will be another full nightly test happening tonight (based on c/s
7e88c23 "libxl: Tell qemu to use raw format when using a tapdisk"), and
some stress and scale tests if time allows.

