
Re: [Xen-devel] xen-4.7 regression when saving a pv guest



On Fri, Aug 26, 2016 at 02:55:06PM +0200, Stefan Bader wrote:
> On 26.08.2016 13:53, Juergen Gross wrote:
> > On 26/08/16 12:52, Stefan Bader wrote:
> >> On 25.08.2016 19:31, Juergen Gross wrote:
> >>> On 25/08/16 17:48, Stefan Bader wrote:
> >>>> When I try to save a PV guest with 4G of memory using xen-4.7 I get the
> >>>> following error:
> >>>>
> >>>> II: Guest memory 4096 MB
> >>>> II: Saving guest state to file...
> >>>> Saving to /tmp/pvguest.save new xl format (info 0x3/0x0/1131)
> >>>> xc: info: Saving domain 23, type x86 PV
> >>>> xc: error: Bad mfn in p2m_frame_list[0]: Internal error
> >>>
> >>> So the first mfn of the memory containing the p2m information is bogus.
> >>> Weird.
> >>
> >> Hm, not sure how bogus. From below the first mfn is 0x4eb1c8 and points to
> >> pfn=0xff7c8 which is above the current max of 0xbffff. But then the dmesg
> >> inside the guest said: "last_pfn = 0x100000" which would be larger than the
> >> pfn causing the error.
> >>
> >>>
> >>>> xc: error: mfn 0x4eb1c8, max 0x820000: Internal error
> >>>> xc: error:   m2p[0x4eb1c8] = 0xff7c8, max_pfn 0xbffff: Internal error
> >>>> xc: error: Save failed (34 = Numerical result out of range): Internal error
> >>>> libxl: error: libxl_stream_write.c:355:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: Numerical result out of range
> >>>> Failed to save domain, resuming domain
> >>>> xc: error: Dom 23 not suspended: (shutdown 0, reason 255): Internal error
> >>>> libxl: error: libxl_dom_suspend.c:460:libxl__domain_resume: xc_domain_resume failed for domain 23: Invalid argument
> >>>> EE: Guest not off after save!
> >>>> FAIL
> >>>>
> >>>> From dmesg inside the guest:
> >>>> [    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
> >>>>
> >>>> Somehow I am slightly suspicious about
> >>>>
> >>>> commit 91e204d37f44913913776d0a89279721694f8b32
> >>>>   libxc: try to find last used pfn when migrating
> >>>>
> >>>> since that seems to potentially lower ctx->x86_pv.max_pfn which is checked
> >>>> against in mfn_in_pseudophysmap(). Is that a known problem?
> >>>> With xen-4.6 and the same dom0/guest kernel version combination this does
> >>>> work.
> >>>
> >>> Can you please share some more information? Especially:
> >>>
> >>> - guest kernel version?
> >> Hm, apparently 4.4 and 4.6 with stable updates. I just tried a much older
> >> guest kernel (3.2) environment and that works. So it is the combination of
> >> switching from xen-4.6 to 4.7 and guest kernels running 4.4 and later. And
> >> while the exact mfn/pfn which gets dumped varies a little, the offending
> >> mapping always points to 0xffxxx which would be below last_pfn.
> > 
> > Aah, okay. The problem seems to be specific to the linear p2m list
> > handling.
> > 
> > Trying on my system... Yep, seeing your problem, too.
> > 
> > Weird that nobody else stumbled over it.
> > Ian, don't we have any test in OSSTEST which should catch this problem?
> > A 4GB 64-bit pv-domain with Linux kernel 4.3 or newer can't be saved
> > currently.
> > 
> > Following upstream patch fixes it for me:
> 
> Ah! :) Thanks. I applied the below locally, too. And save works with a 4.6
> guest kernel.
> 

I'm going to translate this into a Tested-by tag in the proper patch [0].

Wei.

[0] <1472212735-27445-1-git-send-email-jgross@xxxxxxxx>
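
For anyone following the numbers in the quoted log, here is a minimal sketch
of the two range checks that the error lines report. It is not the libxc
code: the helper names `range_checks` and `pfn_count_to_mib` are made up for
illustration, it assumes 4 KiB pages, and it treats the guest's
`last_pfn - 1` as the highest pfn the guest can use. Only the hex values are
taken from the messages above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB x86 pages


def range_checks(mfn, max_mfn, pfn, max_pfn):
    """Hypothetical helper: the two bounds printed in the xc error lines."""
    return {
        "mfn_in_range": mfn <= max_mfn,
        "pfn_in_range": pfn <= max_pfn,
    }


def pfn_count_to_mib(pfns):
    """Convert a number of page frames to MiB under the page-size assumption."""
    return pfns * PAGE_SIZE // (1024 * 1024)


# Values copied from the quoted log and guest dmesg.
mfn, max_mfn = 0x4EB1C8, 0x820000
pfn = 0xFF7C8             # m2p[0x4eb1c8]
saved_max_pfn = 0xBFFFF   # max_pfn reported by xc
guest_last_pfn = 0x100000 # last_pfn from the guest's e820 line

with_saved_limit = range_checks(mfn, max_mfn, pfn, saved_max_pfn)
# Assumption: the highest usable pfn is last_pfn - 1.
with_guest_limit = range_checks(mfn, max_mfn, pfn, guest_last_pfn - 1)

print(with_saved_limit)
print(with_guest_limit)
print(pfn_count_to_mib(saved_max_pfn + 1), "MiB covered by the reported max_pfn")
print(pfn_count_to_mib(guest_last_pfn), "MiB covered by the guest's last_pfn")
```

Under these assumptions the mfn is within bounds in both cases and only the
pfn bound differs: a max_pfn of 0xbffff covers 3072 MiB, while the guest's
last_pfn covers the full 4096 MiB it was given. That is consistent with the
suspicion in the thread that max_pfn ended up too low for this guest.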


 

