
Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)



On 05/11/2013 22:46, Jeff_Zimmerman@xxxxxxxxxx wrote:
Asit,
I've attached two files: one is the output of dmesg | grep microcode, the second is the first processor entry from /proc/cpuinfo.
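For anyone else gathering the same information: the revision number can be pulled straight out of the kernel's microcode line. A minimal sketch — the dmesg line below is a made-up sample, and the sig/pf/revision values on a real box will differ; on a live system you would feed it the actual output of dmesg | grep microcode:

```shell
# Hypothetical sample line; on a real system use: dmesg | grep microcode
line="microcode: CPU0 sig=0x206d7, pf=0x1, revision=0x710"

# Extract just the revision field
rev=$(printf '%s\n' "$line" | sed -n 's/.*revision=\(0x[0-9a-fA-F]*\).*/\1/p')
printf 'microcode revision: %s\n' "$rev"
```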
Jeff

On Nov 5, 2013, at 2:29 PM, "Mallick, Asit K" <asit.k.mallick@xxxxxxxxx>
 wrote:

> Jeff,
> Could you check whether you have the latest microcode updates installed on this system? Or, could you send me the microcode revision and I can check.
>
> Thanks,
> Asit
>
>
> From: "Jeff_Zimmerman@xxxxxxxxxx" <Jeff_Zimmerman@xxxxxxxxxx>
> Date: Tuesday, November 5, 2013 2:55 PM
> To: "lars.kurth@xxxxxxx" <lars.kurth@xxxxxxx>
> Cc: "lars.kurth.xen@xxxxxxxxx" <lars.kurth.xen@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "JBeulich@xxxxxxxx" <JBeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
>
> Lars,
> I understand the mailing list limits attachment size to 512K. Where can I post the Xen binary and symbols file?
> Jeff
>
> On Nov 5, 2013, at 7:46 AM, Lars Kurth <lars.kurth@xxxxxxx> wrote:
>
> Jan, Andrew, Ian,
>
> pulling in Jeff, who raised the question. Snippets from miscellaneous replies are attached. Jeff, please look through these (in particular Jan's answer) and answer any further questions on this thread.
>
> On 05/11/2013 09:53, Ian Campbell wrote:
>> TBH I think for this kind of thing (i.e. a bug not a user question) the most appropriate thing to
>> do would be to redirect them to xen-devel themselves (with a reminder that they do not need
>> to subscribe to post).
> Agreed. Another option is for me to start the thread and pull the original reporter into it, if it is a bug. I was not sure this was a real bug at first, but it seems it is.
>
> On 04/11/2013 20:00, Andrew Cooper wrote:
>> Which version of Xen were these images saved on?
> [Jeff] We were careful to regenerate all the images after upgrading to 4.3.1. We also saw the same problem on 4.3.0.
>
>> Are you expecting to be using nested-virt? (It is still very definitely experimental)
> [Jeff] Not using nested-virt.
>
> On 05/11/2013 10:04, Jan Beulich wrote:
>
> On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@xxxxxxxxx> wrote:
>
>
> See
> http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
> ---
> I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
> Dom0 is CentOS 6.3 based, with Linux kernel 3.10.16.
> In my configuration, all of the Windows HVMs are running after having been
> restored from xl save images.
> VMs are destroyed or restored on demand. After some time, Xen
> experiences a fatal page fault while restoring one of the Windows HVMs.
> This does not happen very often, perhaps once in a 16- to 48-hour period.
> The stack trace from Xen follows. Thanks in advance for any help.
>
> (XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
> (XEN) CPU: 52
> (XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
>
>
> Zapping addresses (here and below in the stack trace) is never
> helpful when someone asks for help with a crash. Also, in order
> to not just guess, the matching xen-syms or xen.efi should be
> made available or pointed to.
>
>
>
> (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
> (XEN) rax: 000ffffffffff000 rbx: ffff8300bb163760 rcx: 0000000000000000
> (XEN) rdx: ffff810000000000 rsi: 0000000000000000 rdi: 0000000000000000
> (XEN) rbp: ffff8300bb163000 rsp: ffff8310333e7cd8 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: ffff8310333e7f18 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000426f0
> (XEN) cr3: 000000211bee5000 cr2: ffff810000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff8310333e7cd8:
> (XEN) 0000000000000001 ffff82c4c01de869 ffff82c4c0182c70 ffff8300bb163000
> (XEN) 0000000000000014 ffff8310333e7f18 0000000000000000 ffff82c4c01d7548
> (XEN) ffff8300bb163490 ffff8300bb163000 ffff82c4c01c65b8 ffff8310333e7e60
> (XEN) ffff82c4c01badef ffff8300bb163000 0000000000000003 ffff833144d8e000
> (XEN) ffff82c4c01b4885 ffff8300bb163000 ffff8300bb163000 ffff8300bdff1000
> (XEN) 0000000000000001 ffff82c4c02f2880 ffff82c4c02f2880 ffff82c4c0308440
> (XEN) ffff82c4c01d0ea8 ffff8300bb163000 ffff82c4c015ad6c ffff82c4c02f2880
> (XEN) ffff82c4c02cf800 00000000ffffffff ffff8310333f5060 ffff82c4c02f2880
> (XEN) 0000000000000282 0010000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffff82c4c02f2880 ffff8300bdff1000 ffff8300bb163000
> (XEN) 000031a10f2b16ca 0000000000000001 ffff82c4c02f2880 ffff82c4c0308440
> (XEN) ffff82c4c0124444 0000000000000034 ffff8310333f5060 0000000001c9c380
> (XEN) 00000000c0155965 ffff82c4c01c6146 0000000001c9c380 ffffffffffffff00
> (XEN) ffff82c4c0128fa8 ffff8300bb163000 ffff8327d50e9000 ffff82c4c01bc490
> (XEN) 0000000000000000 ffff82c4c01dd254 0000000080549ae0 ffff82c4c01cfc3c
> (XEN) ffff8300bb163000 ffff82c4c01d6128 ffff82c4c0125db9 ffff82c4c0125db9
> (XEN) ffff8310333e0000 ffff8300bb163000 000000000012ffc0 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4c01deaa3
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 000000000012ffc0 000000007ffdf000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN) [] domain_page_map_to_mfn+0x86/0xc0
> (XEN) [] nvmx_handle_vmlaunch+0x49/0x160
> (XEN) [] __update_vcpu_system_time+0x240/0x310
> (XEN) [] vmx_vmexit_handler+0xb58/0x18c0
> (XEN) [] pt_restore_timer+0xa8/0xc0
> (XEN) [] hvm_io_assist+0xef/0x120
> (XEN) [] hvm_do_resume+0x195/0x1c0
> (XEN) [] vmx_do_resume+0x148/0x210
> (XEN) [] context_switch+0x1bc/0xfc0
> (XEN) [] schedule+0x254/0x5f0
> (XEN) [] pt_update_irq+0x256/0x2b0
> (XEN) [] timer_softirq_action+0x168/0x210
> (XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
> (XEN) [] nvmx_switch_guest+0x54/0x1560
> (XEN) [] vmx_intr_assist+0x6c/0x490
> (XEN) [] vmx_vmenter_helper+0x88/0x160
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] vmx_asm_do_vmentry+0/0xed
> (XEN)
> (XEN) Pagetable walk from ffff810000000000:
> (XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
> (XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
>
>
> This makes me suspect that domain_page_map_to_mfn() gets a
> NULL pointer passed here. As said above, this is only guesswork
> at this point, and as Ian already pointed out, directing the
> reporter to xen-devel would seem to be the right thing to do
> here anyway.
>
> Jan
>
>
>


As Jan said, the above censoring almost completely defeats the purpose of trying to help you.

However, while you say you are not expecting to use nested-virt, the stack trace clearly shows that you are, so something is up.
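One quick thing worth checking is whether the guest config enables the nested-virt knob without anyone realising. A minimal sketch against a made-up xl config snippet — the contents are illustrative, not Jeff's real config; on the affected host you would grep the actual config file for the VM:

```shell
# Illustrative xl guest config snippet (not the reporter's real one)
cfg='builder = "hvm"
memory = 1024
nestedhvm = 1'

# Look for the nested-virt option
printf '%s\n' "$cfg" | grep -i '^nestedhvm'
```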

Which toolstack are you using for the VMs? What is the configuration of the affected VM?

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

