
Re: [Xen-devel] Bug on shadow page mode



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Tuesday, April 02, 2013 3:38 PM
> To: Hao, Xudong
> Cc: xen-devel (xen-devel@xxxxxxxxxxxxx)
> Subject: Re: Bug on shadow page mode
> 
> >>> On 29.03.13 at 07:39, "Hao, Xudong" <xudong.hao@xxxxxxxxx> wrote:
> > There is a bug when booting 3 guests in no-EPT (shadow paging) mode: when
> > Xen handles a guest page fault, it walks the guest page tables, fails to get
> > the l4e from the top-level table, and then triggers a fatal page fault that
> > panics Xen.
> >
> > I looked into this issue and found that the bug was introduced by this patch:
> >     changeset 26523:fd997a96d448
> >     x86: debugging code for testing 16Tb support on smaller memory systems
> >
> > What is the reason for this patch's modification (#ifdef NDEBUG) in
> > xen/arch/x86/domain_page.c?
> 
> The point is to make sure the domain page mapping code actually
> gets tested. The shortcut is a performance optimization.
> 
So on a system with less memory, the original code would always take the
shortcut. Now, if NDEBUG is not defined (i.e. in a debug build), the early
return of mfn_to_virt(mfn) is skipped and the code below it runs instead.
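
To make the effect of the #ifdef NDEBUG guard concrete, here is a simplified C sketch of the control flow described above. It is not the actual domain_page.c code: choose_map_path(), enum map_path and SKETCH_DIRECTMAP_LIMIT_MFN are hypothetical names, and modelling the shortcut as "the frame lies inside a 5 TiB direct map" is an assumption based on the 5Tb figure in Jan's reply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not the real Xen definitions. */
#define SKETCH_PAGE_SHIFT 12
/* Assumption: the shortcut covers a 5 TiB direct map, counted in 4 KiB frames. */
#define SKETCH_DIRECTMAP_LIMIT_MFN ((5ULL << 40) >> SKETCH_PAGE_SHIFT)

/* 5 TiB / 4 KiB = 5 * 2^28 frames. */
static_assert(SKETCH_DIRECTMAP_LIMIT_MFN == 0x50000000ULL,
              "5 TiB of 4 KiB frames");

enum map_path {
    PATH_DIRECT_MAP, /* shortcut: return mfn_to_virt(mfn) */
    PATH_MAPCACHE    /* general path: set up a mapcache mapping */
};

/* Which path a simplified map_domain_page()-style helper would take. */
enum map_path choose_map_path(uint64_t mfn)
{
#ifdef NDEBUG
    /* Release builds: frames inside the direct map take the shortcut. */
    if (mfn < SKETCH_DIRECTMAP_LIMIT_MFN)
        return PATH_DIRECT_MAP;
#endif
    /* Debug builds (and frames beyond the direct map) use the mapcache,
     * so the mapping code is exercised even on small machines. */
    (void)mfn;
    return PATH_MAPCACHE;
}
```

In a debug build (NDEBUG undefined), choose_map_path() returns PATH_MAPCACHE for every frame, so the mapcache path is reached even on a machine whose memory is entirely covered by the direct map.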

In our case, Xen panics under this condition. On a system with 2G of physical
memory, with dom0 set to 512M, we run 3 RHEL6u3 guests in shadow paging mode,
each with 400MB of memory. When the 3rd guest boots, Xen panics while walking
the shadow page table.
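
The figures in this setup also show why a release build would never leave the shortcut on this host. A minimal sketch, assuming 4 KiB frames, RAM contiguous from address 0 (ignoring any PCI hole) and the 5 TiB limit from Jan's reply; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the report, in MiB. */
#define HOST_MIB  2048ULL
#define DOM0_MIB   512ULL
#define GUEST_MIB  400ULL
#define NR_GUESTS    3ULL

/* Assumptions: 4 KiB frames, 5 TiB direct map (the figure in Jan's reply). */
#define FRAME_SHIFT     12
#define DIRECTMAP_BYTES (5ULL << 40)

/* dom0 plus three guests must fit in host RAM: 512 + 3 * 400 = 1712 MiB. */
static_assert(DOM0_MIB + NR_GUESTS * GUEST_MIB <= HOST_MIB,
              "dom0 and guests exceed host memory");

uint64_t allocated_mib(void)
{
    return DOM0_MIB + NR_GUESTS * GUEST_MIB;
}

/* Highest frame number, assuming RAM is contiguous from address 0. */
uint64_t max_host_mfn(void)
{
    return ((HOST_MIB << 20) >> FRAME_SHIFT) - 1;
}

/* Nonzero if every host frame lies below the assumed direct-map limit. */
int all_frames_in_directmap(void)
{
    return max_host_mfn() < (DIRECTMAP_BYTES >> FRAME_SHIFT);
}
```

Here allocated_mib() is 1712, max_host_mfn() is 0x7ffff and all_frames_in_directmap() is nonzero: 5 TiB is 2560 times the 2G of host memory, so only the debug build reaches the mapcache code on this machine.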

> > But removing the macro restriction solves the fatal page fault in shadow
> > paging mode. Can it simply be removed? Or could you look into it, since you
> > are very familiar with this code?
> 
> While it could be removed, this would not be a fix for the problem you're
> seeing; you'd only later see someone else run into it on a system
> with more than 5Tb.
> 
> Hence we need you to provide technical details of the crash you
> observed (register and stack dump as well as call trace, plus any
> details on the specific guest you're running that distinguish it
> from other guests that don't exhibit this problem).
> 

(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    4
(XEN) RIP:    e008:[<ffff82c4c01e637f>] guest_walk_tables_4_levels+0x135/0x6a6
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: ffff83007f087c98   rcx: 0000000000000005
(XEN) rdx: 0000000000000002   rsi: 0000000000800000   rdi: ffff83007f087ce0
(XEN) rbp: ffff83007f087ab8   rsp: ffff83007f087a38   r8:  0000000000000004
(XEN) r9:  000000000002b650   r10: 0000000000000022   r11: 0000000000000206
(XEN) r12: ffff830047374000   r13: 0000003a0f388718   r14: ffff82c4c025f90c
(XEN) r15: ffff82c406a00000   cr0: 000000008005003b   cr4: 00000000000426f0
(XEN) cr3: 000000004767e000   cr2: ffff82c406a00000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff83007f087a38:
(XEN)    000000007f087a88 ffff8300477df000 800000003e604025 000000007f087b10
(XEN)    ffff83007f087ac8 ffff8300477c4820 0000000528dd40f8 0080000000000004
(XEN)    ffff830028dd4ff8 ffff830000000000 ffff83007f087b48 ffff830047374000
(XEN)    ffff8300477df000 ffff83007f087f18 0000000000000000 000000000000000e
(XEN)    ffff83007f087d18 ffff82c4c020d8cc ffff82c406a00000 ffff83007f087b18
(XEN)    ffff83007f080000 ffff82c4c0311cc8 0000000000000c40 ffff83007f080000
(XEN)    ffff82c4c0311cc8 00000000000003c8 0000000000000740 0000000000000000
(XEN)    0000000000000001 0000000000028dd4 ffff8300477dfb58 0000000000000108
(XEN)    ffff8300477dfb58 ffff83007f080000 ffff82c4c0311cc8 ffff82c4c0311cd8
(XEN)    0000000003a0f388 ffff82c4c011759c ffff8300477dfae8 ffff830047374980
(XEN)    ffff8300477df000 000000000000652f 0000003a0f388718 000000000002b650
(XEN)    ffff83007f087c18 ffff82c4c01ef4d7 ffff83007f087be0 ffff83007f080000
(XEN)    ffff83007f087c18 0000000100000100 ffff83007f087c18 ffff830047374000
(XEN)    0000000000006a00 000000370002b650 000000000002b650 ffff830047374000
(XEN)    ffff830047374980 0000000000000000 ffff83007f087c18 ffff82c4c0125e3d
(XEN)    ffff83007f087c78 ffff82c4c021017d 0c00000000000000 ffff8300477df000
(XEN)    ffff82e0008e8640 ffff830047374000 0000000013650000 0000000013650000
(XEN)    0000000000000000 0000000000000000 ffff83007f087c78 ffff82c4c01ae218
(XEN)    ffff83007f087cc8 ffff82c4c01b4abc 0000000028dd4027 ffff82c4c0207f86
(XEN)    0000003a0f388718 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4c01e637f>] guest_walk_tables_4_levels+0x135/0x6a6
(XEN)    [<ffff82c4c020d8cc>] sh_page_fault__guest_4+0x505/0x2015
(XEN)    [<ffff82c4c01d2135>] vmx_vmexit_handler+0x86c/0x1748
(XEN)    
(XEN) Pagetable walk from ffff82c406a00000:
(XEN)  L4[0x105] = 000000007f26e063 ffffffffffffffff
(XEN)  L3[0x110] = 000000005ce30063 ffffffffffffffff
(XEN)  L2[0x035] = 0000000014aab063 ffffffffffffffff 
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 4:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff82c406a00000
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG
> Jan
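
As a cross-check on the log, the indices in the "Pagetable walk" lines follow directly from the faulting address: with x86-64 4-level paging and 4 KiB pages, each level uses 9 bits of the linear address above the 12-bit page offset. The C sketch below (pt_index(), decode_pf_error() and print_walk() are hypothetical helper names) recomputes them and decodes the low bits of error_code=0000, which indicates a supervisor-mode read of a page that is not present, consistent with L1[0x000] being zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86-64 4-level paging, 4 KiB pages: 9 index bits per level. */
#define PT_INDEX_BITS 9
#define PT_INDEX_MASK ((1u << PT_INDEX_BITS) - 1)

/* level 4 = L4 (bits 47:39), 3 = L3 (38:30), 2 = L2 (29:21), 1 = L1 (20:12). */
unsigned pt_index(uint64_t va, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned)(va >> (12 + PT_INDEX_BITS * (level - 1))) & PT_INDEX_MASK;
}

/* Low three bits of the x86 page-fault error code (higher bits omitted). */
struct pf_error {
    int present; /* bit 0: 1 = protection violation, 0 = page not present */
    int write;   /* bit 1: 1 = write access, 0 = read access */
    int user;    /* bit 2: 1 = user mode, 0 = supervisor mode */
};

struct pf_error decode_pf_error(uint32_t ec)
{
    struct pf_error e = { (ec >> 0) & 1, (ec >> 1) & 1, (ec >> 2) & 1 };
    return e;
}

/* Print the walk indices in the same style as the Xen log. */
void print_walk(uint64_t va)
{
    for (int level = 4; level >= 1; level--)
        printf("L%d[0x%03x]\n", level, pt_index(va, level));
}
```

Calling print_walk(0xffff82c406a00000ULL) prints L4[0x105], L3[0x110], L2[0x035] and L1[0x000], matching the walk in the log.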


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

