Re: [Xen-devel] xen-4.7 regression when saving a pv guest
On 25.08.2016 19:31, Juergen Gross wrote:
> On 25/08/16 17:48, Stefan Bader wrote:
>> When I try to save a PV guest with 4G of memory using xen-4.7, I get the
>> following error:
>>
>> II: Guest memory 4096 MB
>> II: Saving guest state to file...
>> Saving to /tmp/pvguest.save new xl format (info 0x3/0x0/1131)
>> xc: info: Saving domain 23, type x86 PV
>> xc: error: Bad mfn in p2m_frame_list[0]: Internal error
>
> So the first mfn of the memory containing the p2m information is bogus.
> Weird.

Hm, not sure how bogus. From below, the first mfn is 0x4eb1c8 and points to
pfn=0xff7c8, which is above the current max of 0xbffff. But dmesg inside the
guest said "last_pfn = 0x100000", which would be larger than the pfn causing
the error.

>> xc: error: mfn 0x4eb1c8, max 0x820000: Internal error
>> xc: error: m2p[0x4eb1c8] = 0xff7c8, max_pfn 0xbffff: Internal error
>> xc: error: Save failed (34 = Numerical result out of range): Internal error
>> libxl: error: libxl_stream_write.c:355:libxl__xc_domain_save_done: saving
>> domain: domain did not respond to suspend request: Numerical result out of
>> range
>> Failed to save domain, resuming domain
>> xc: error: Dom 23 not suspended: (shutdown 0, reason 255): Internal error
>> libxl: error: libxl_dom_suspend.c:460:libxl__domain_resume: xc_domain_resume
>> failed for domain 23: Invalid argument
>> EE: Guest not off after save!
>> FAIL
>>
>> From dmesg inside the guest:
>> [    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
>>
>> Somehow I am slightly suspicious about
>>
>> commit 91e204d37f44913913776d0a89279721694f8b32
>>     libxc: try to find last used pfn when migrating
>>
>> since that seems to potentially lower ctx->x86_pv.max_pfn, which is checked
>> against in mfn_in_pseudophysmap(). Is that a known problem?
>> With xen-4.6 and the same dom0/guest kernel version combination this does
>> work.
>
> Can you please share some more information? Especially:
>
> - guest kernel version?
Hm, apparently 4.4 and 4.6 with stable updates. I just tried a much older
guest kernel (3.2) environment and that works. So it is the combination of
switching from xen-4.6 to 4.7 and guest kernels running 4.4 and later. And
while the exact mfn/pfn which gets dumped varies a little, the offending
mapping always points to 0xffxxx, which would be below last_pfn.

  Guest kernel | Xen 4.6 | Xen 4.7
  3.13.x       | ok      | ok
  4.2.x        | ok      | ok
  4.4.15       | ok      | fail
  4.6.7        | ok      | fail

I will try 4.7 and 4.8 based guest kernels with xen-4.7 in a bit, too.

> - any patches in kernel not being upstream, especially in Xen-specific
>   boot path?

None I know of. With affected kernels this happens with both direct kernel
load and pvgrub.

> - dmesg from guest with E820 map?

From 4.4.x kernel:

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
[    0.000000] e820: cannot find a gap in the 32bit address range
[    0.000000] e820: PCI devices with unassigned 32bit BARs may break!
[    0.000000] e820: [mem 0x100100000-0x1004fffff] available for PCI devices

Old 3.13 kernel (I see nothing different here):

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
[    0.000000] e820: cannot find a gap in the 32bit address range
[    0.000000] e820: PCI devices with unassigned 32bit BARs may break!
[    0.000000] e820: [mem 0x100100000-0x1004fffff] available for PCI devices

> - guest configuration?
Rather simple (some of it is for historic reasons; I also tried an externally
supplied kernel and initrd):

name = "testpv"
kernel = "/root/boot/pv-grub-hd0--x86_64.gz"
memory = 4096
vcpus = 4
disk = [ 'file:/root/img/testpv.img,xvda1,w' ]
vif = [ 'mac=xx:xx:xx:xx:xx:xx, bridge=br0' ]
on_crash = "coredump-destroy"

> The same error would occur when trying to live migrate the guest. And
> this has been tested a lot since above commit, so I suspect something
> is very special in your case.
>
>
> Juergen

Attachment: signature.asc