Re: [Xen-devel] Arbitrary reboot with xen 3.4.x



Pasi Kärkkäinen wrote:
> On Thu, Nov 19, 2009 at 07:06:56PM +0100, Guillaume Rousse wrote:
> > Hello.
> >
> > I have a dom0 working perfectly under Xen 3.3.x, with about 15 HVM domUs. After migrating to Xen 3.4.1 with the same dom0 kernel (2.6.27.37), everything seems fine at first and I can launch the various hosts, but 5 to 10 minutes later the host abruptly reboots. I can't find any trace in the logs. I have a second host with the same configuration and setup, and the result is similar. The problem seems to be linked to domU activity: without any domU, or without any domU doing actual work, there is no reboot. I had to roll back to Xen 3.3.0.
>
> Did you try the new Xen 3.4.2?

I just did this morning. Without any changelog, it's a bit 'upgrade and pray'...

> > It looks like either a hardware issue (although it doesn't appear with 3.3.0) or a crash in the hypervisor that syslog is unable to catch when it happens. How can I try to get a trace?
>
> You should set up a serial console, so you can capture and
> log the full console (Xen + dom0 kernel) output on another computer.

Indeed.

Here is the output. The first domU crash, caused by a memory ballooning issue, is not fatal. The second crash, however, is. I don't know whether that is because of an incorrect state left behind by the initial crash, or because of the additional domUs launched in the interim.
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) Domain 1 (vcpu#0) crashed on cpu#3: 

(XEN) ----[ Xen-3.4.1  x86_64  debug=n  Not tainted ]---- 

(XEN) CPU:    3 

(XEN) RIP:    0010:[<ffffffff811ed7ed>] 

(XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest 

(XEN) rax: 00000000007028b8   rbx: 0000000000001000   rcx: 0000000000000200
(XEN) rdx: 0000000000000000   rsi: 00000000007028b8   rdi: ffff8800123a0000
(XEN) rbp: ffff88001a119b68   rsp: ffff88001a119b50   r8:  ffffea00003fcb00
(XEN) r9:  000000000001050f   r10: 0000000000000000   r11: 0000000000000001
(XEN) r12: 0000000000001000   r13: 0000000000000000   r14: ffff88001796aea8
(XEN) r15: 0000000000001000   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 000000001a079000   cr2: 00007fc176c772e8 

(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) Domain 2 reported crashed by domain 0 on cpu#0: 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 

(XEN) domain_crash called from p2m.c:1091 

(XEN) ----[ Xen-3.4.1  x86_64  debug=n  Not tainted ]---- 

(XEN) CPU:    0 

(XEN) RIP:    e008:[<ffff828c801aab29>] hash_foreach+0x59/0xe0 

(XEN) RFLAGS: 0000000000010296   CONTEXT: hypervisor 

(XEN) rax: 0000000000000000   rbx: ffff8284000c1780   rcx: 00000000000060bc
(XEN) rdx: ffff83041f98c000   rsi: 0000000000000336   rdi: ffff8300be7c0000
(XEN) rbp: 0000000000000336   rsp: ffff828c80257848   r8:  0000000000200c00
(XEN) r9:  0000000000000001   r10: ffff83041f98c000   r11: ffff828c801b10e0
(XEN) r12: 0000000000000001   r13: 0000000000000000   r14: 00000000000060bc
(XEN) r15: ffff828c80205f80   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000021759000   cr2: 0000000000000000 

(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008 

(XEN) Xen stack trace from rsp=ffff828c80257848: 

(XEN)    0000000000000000 ffff8300be7c0000 ffff83041f98c000 ffff8284000c1780
(XEN)    ffff8300be7c0000 00000000000060bc 0000000000000000 00000000000144bc
(XEN)    ffff8300be7c0000 ffff828c801aae4d ffff828c80257960 00000000000060bc
(XEN)    ffff828c80257960 ffff83041f98c000 ffff83041f98c000 ffff828c801b13bf
(XEN)    00000000000144bc 0000000000200c00 ffff83041f4ed5e0 ffff83041f98d130
(XEN)    ffff828c80284d24 ffff83041f4ed5e0 ffff828c80257960 ffff828c80257968
(XEN)    ffff83041f98c000 00000000000144bc 0000000000000000 ffff828c801a96d4
(XEN)    0000000000000200 2000000000000000 ffff828c80257a80 000000061f98c000
(XEN)    0000000000000200 007fffffffffffff 0000000000000000 ffff83041f4ed000
(XEN)    000000000041f4ed 0000000000000001 0000000000000001 0000000000000200
(XEN)    00000000000144bc ffff83041f98c000 0000000000000006 ffff828c801a5991
(XEN)    ffff828c80257abc 0000000000000001 ffff828c80257ba8 007fffffffffffff
(XEN)    ffff828c802579f0 ffff83041f98c000 ffff828c80257a80 ffff828c801a6efb
(XEN)    0000000400000000 0000000000000000 ffff8300060bc000 ffff8300060bb000
(XEN)    ffff8300060ba000 ffff8300060b9000 ffff8300060b8000 ffff8300060b7000
(XEN)    ffff8300060b6000 ffff8300060b5000 ffff8300060b4000 ffff8300060b3000
(XEN)    ffff8300060b2000 ffff8300060b1000 ffff8300060b0000 ffff8300060af000
(XEN)    ffff8300060ae000 ffff828c801f16dc 0000000000000082 0000000100000001
(XEN)    0000000100000001 0000000100000001 0000000100000001 0000000100000001
(XEN)    0000000100000001 0000000100000001 0000000100000001 0000000000000286
(XEN) Xen call trace: 

(XEN)    [<ffff828c801aab29>] hash_foreach+0x59/0xe0 

(XEN)    [<ffff828c801aae4d>] sh_remove_all_mappings+0x8d/0x200 

(XEN)    [<ffff828c801b13bf>] shadow_write_p2m_entry+0x2df/0x330 

(XEN)    [<ffff828c801a96d4>] p2m_set_entry+0x344/0x430 

(XEN)    [<ffff828c801a5991>] set_p2m_entry+0x71/0xa0 

(XEN)    [<ffff828c801a6efb>] p2m_pod_zero_check+0x1db/0x310 

(XEN)    [<ffff828c801a8a20>] p2m_pod_demand_populate+0x830/0xa40 

(XEN)    [<ffff828c801a90b4>] p2m_gfn_to_mfn+0x224/0x260 

(XEN)    [<ffff828c80151fd5>] mod_l1_entry+0x6e5/0x7b0 

(XEN)    [<ffff828c80153067>] do_mmu_update+0x937/0x16e0 

(XEN)    [<ffff828c8014df0b>] get_page_type+0xb/0x20 

(XEN)    [<ffff828c801112b4>] do_multicall+0x164/0x370 

(XEN)    [<ffff828c801c8169>] syscall_enter+0xa9/0xae 

(XEN) 

(XEN) Pagetable walk from 0000000000000000: 

(XEN)  L4[0x000] = 000000001cb48067 00000000003d6ca9 

(XEN)  L3[0x000] = 000000000c58b067 00000000003e72ec 

(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff 

(XEN) 

(XEN) **************************************** 

(XEN) Panic on CPU 0: 

(XEN) FATAL PAGE FAULT 

(XEN) [error_code=0000] 

(XEN) Faulting linear address: 0000000000000000 

(XEN) **************************************** 

(XEN) 

(XEN) Reboot in five seconds...
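
A trace like the one above is easier to compare between runs once it is condensed. The following is a minimal sketch, assuming the serial console output has been saved as text on the logging machine; the `summarize` helper and its regular expressions are illustrative only and are not part of Xen. It counts the `domain_crash` messages, records which CPU panicked, and lists the function names from the hypervisor call trace.

```python
import re

# Illustrative patterns for the "(XEN)" console lines shown in this report.
CRASH_RE = re.compile(r"\(XEN\) domain_crash called from (\S+)")
PANIC_RE = re.compile(r"\(XEN\) Panic on CPU (\d+):")
# Call-trace frames look like: (XEN)    [<addr>] function+0xoff/0xsize
FRAME_RE = re.compile(
    r"\(XEN\)\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize(log_text):
    """Condense a captured Xen console log into a small dictionary."""
    crashes = CRASH_RE.findall(log_text)
    panic = PANIC_RE.search(log_text)
    return {
        "domain_crashes": len(crashes),           # number of domain_crash lines
        "crash_sites": sorted(set(crashes)),      # e.g. source file and line
        "panic_cpu": int(panic.group(1)) if panic else None,
        "call_trace": FRAME_RE.findall(log_text), # innermost frame first
    }


# Shortened excerpt of the log above, used here only as sample input.
sample = """\
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) Xen call trace:
(XEN)    [<ffff828c801aab29>] hash_foreach+0x59/0xe0
(XEN)    [<ffff828c801aae4d>] sh_remove_all_mappings+0x8d/0x200
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
"""

print(summarize(sample))
```

Running the same helper on logs from Xen 3.4.1 and 3.4.2 would show at a glance whether the guest crashes are still present and whether a hypervisor panic follows them.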


My domUs all have this configuration:
memory = 256
maxmem = 512

Or different values, but always with the same ratio between memory and maxmem. That seems to be quite useless for HVM domUs anyway, since as far as I know memory ballooning is not supported there without PV drivers (which I can't manage to build).
With identical values, the issue doesn't appear.
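
Since xm guest configuration files use Python syntax, the affected setups can be spotted with a few lines of Python. This is only a sketch under two assumptions: that the populate-on-demand path seen in the log is taken when maxmem is larger than memory, and that maxmem falls back to memory when it is not set. The helper names `load_config` and `uses_populate_on_demand` are hypothetical, and the value 512 in the second config is just an example of identical values.

```python
def load_config(text):
    """Evaluate guest config text (plain Python assignments) into a dict.

    Only use this on config files you trust, since the text is executed.
    """
    namespace = {}
    exec(text, {}, namespace)
    return namespace


def uses_populate_on_demand(config):
    """Return True if maxmem exceeds memory (assumed trigger for PoD)."""
    memory = config["memory"]
    maxmem = config.get("maxmem", memory)  # assumption: defaults to memory
    return maxmem > memory


# Configuration from this report: maxmem is twice memory.
reported = load_config("memory = 256\nmaxmem = 512\n")

# Identical values, for which the issue did not appear.
workaround = load_config("memory = 512\nmaxmem = 512\n")

print(uses_populate_on_demand(reported))
print(uses_populate_on_demand(workaround))
```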

With Xen 3.4.2, the domUs still crash, but at least dom0 does not reboot. So it's just less bad :)
--
BOFH excuse #426:

internet is needed to catch the etherbunny
