[Xen-devel] page faults on machines with > 4TB memory



Hi

While working on boot-time bugs on a large Oracle x4-8 server, we hit a
problem with booting Xen on machines with > 4TB of memory.
The page fault initially occurred while loading Xen PM info into the hypervisor
(see the attached serial log 4.4.3_no_mem_override.log).
Tracing the issue down shows that the page fault occurs in the timer.c code
while getting the heap size.

Here is the original call trace:
rocessor: Uploading Xen processor PM info 
@ (XEN) ----[ Xen-4.4.3-preOVM  x86_64  debug=n  Tainted:    C ]---- 
@ (XEN) CPU:    0 
@ (XEN) RIP:    e008:[<ffff82d08022e747>] add_entry+0x27/0x120 
@ (XEN) RFLAGS: 0000000000010082   CONTEXT: hypervisor 
@ (XEN) rax: ffff8a2d080513a20   rbx: ffff83808e802300   rcx: 00000000000000e8
@ (XEN) rdx: 00000000000000e8   rsi: 00000000000000e8   rdi: ffff83808e802300
@ (XEN) rbp: ffff82d080513a20   rsp: ffff82d0804d7c70   r8:  ffff8840ffdb5010
@ (XEN) r9:  0000000000000017   r10: ffff83808e802180   r11: 0200200200200200
@ (XEN) r12: ffff82d080533080   r13: 0000000000000296   r14: 0100100100100100
@ (XEN) r15: 00000000000000e8   cr0: 0000000080050033   cr4: 00000000001526f0
@ (XEN) cr3: 00000100818b2000   cr2: ffff8840ffdb5010 
@ (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008 
@ (XEN) Xen stack trace from rsp=ffff82d0804d7c70: 
@ (XEN)    ffff83808e802300 ffff82d080513a20 ffff82d08022f59b ffff82d080533080
@ (XEN)    ffff82d080532f50 00000000000000e8 ffff83808e802328 0000000000000000
@ (XEN)    ffff82d080513a20 ffff83808e8022c0 ffff82d080533200 00000000000000e8
@ (XEN)    00000000000000f0 ffff82d0805331c0 ffff82d0802458e2 0000000000000000
@ (XEN)    00000000000000e8 ffff83808e802334 ffff8384be7979b0 ffff82d0804d7d78
@ (XEN)    0000000000000000 ffff8384be77c700 ffff82d0804d7d78 ffff82d080513a20
@ (XEN)    ffff82d080246207 00000000000000e8 00000000000000e8 ffff8384be7979b0
@ (XEN)    ffff82d08024518a ffff82d080533080 0000000000000070 ffff82d080533da8
@ (XEN)    00000001000000e8 ffff8384be797a00 000000e800000001 002ab980002abd68
@ (XEN)    0000271000124f80 002abd6800124f80 00000000002ab980 ffff82d0803753e0
@ (XEN)    0000000000010101 0000000000000001 ffff82d0804d7e18 ffff881fb4afbc88
@ (XEN)    ffff82d0804d0000 ffff881fb28a4400 ffff82d0804fca80 ffffffff819b7080
@ (XEN)    ffff82d080266c16 ffff83808fb46ba8 ffff82d080208a82 ffff83006bddd190
@ (XEN)    0000000000000292 0300000100000036 00000001000000f6 000000000000000f
@ (XEN)    0000007f000c0082 0000000000000000 0000007f000c0082 0000000000000000
@ (XEN)    000000000000000a ffff881fb28a4400 0000000000000005 0000000000000000
@ (XEN)    0000000000000000 00000000000000fe 0000000000000001 0000000000000001
@ (XEN)    0000000000000000 0000000000000000 ffff82d08031f521 0000000000000000
@ (XEN)    0000000000000246 ffffffff810010ea 0000000000000000 ffffffff810010ea
@ (XEN)    000000000000e030 0000000000000246 ffff83006bddd000 ffff881fb4afbd48
@ (XEN) Xen call trace: 
@ (XEN)    [<ffff82d08022e747>] add_entry+0x27/0x120 
@ (XEN)    [<ffff82d08022f59b>] set_timer+0x10b/0x220 
@ (XEN)    [<ffff82d0802458e2>] cpufreq_governor_dbs+0x1e2/0x2f0 
@ (XEN)    [<ffff82d080246207>] __cpufreq_set_policy+0x87/0x120 
@ (XEN)    [<ffff82d08024518a>] cpufreq_add_cpu+0x24a/0x4f0 
@ (XEN)    [<ffff82d080266c16>] do_platform_op+0x9c6/0x1650 
@ (XEN)    [<ffff82d080208a82>] evtchn_check_pollers+0x22/0xb0 
@ (XEN)    [<ffff82d08031f521>] do_iret+0xc1/0x1a0 
@ (XEN)    [<ffff82d0803243a9>] syscall_enter+0xa9/0xae 
@ (XEN) 
@ (XEN) Pagetable walk from ffff8840ffdb5010: 
@ (XEN)  L4[0x110] = 00000100818b3067 00000000000018b3 
@ (XEN)  L3[0x103] = 0000000000000000 ffffffffffffffff 
@ (XEN) 
@ (XEN) ****************************************

   0xffff82d08022e720 <add_entry>:      movzwl 0x28(%rdi),%edx
   0xffff82d08022e724 <add_entry+4>:    push   %rbp
   0xffff82d08022e725 <add_entry+5>:    lea    0x2e52f4(%rip),%rax        # 0xffff82d080513a20 <__per_cpu_offset>
   0xffff82d08022e72c <add_entry+12>:   lea    0x30494d(%rip),%r10        # 0xffff82d080533080 <per_cpu__timers>
   0xffff82d08022e733 <add_entry+19>:   push   %rbx
   0xffff82d08022e734 <add_entry+20>:   add    (%rax,%rdx,8),%r10
   0xffff82d08022e738 <add_entry+24>:   movl   $0x0,0x8(%rdi)
   0xffff82d08022e73f <add_entry+31>:   movb   $0x3,0x2a(%rdi)
   0xffff82d08022e743 <add_entry+35>:   mov    0x8(%r10),%r8
   0xffff82d08022e747 <add_entry+39>:   movzwl (%r8),%ecx

The faulting instruction at add_entry+39 corresponds to
int sz = GET_HEAP_SIZE(heap);
in add_to_heap(), which is inlined into add_entry() in timer.c.
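
The pointer arithmetic that the disassembly performs before the fault can be
sketched in plain C. This is only an illustration, not Xen code: the helper
names percpu_addr and cpu_offset_from_regs are hypothetical, and the constants
mentioned afterwards are copied from the register dump above.

```c
#include <stdint.h>

/*
 * Hypothetical model of "add (%rax,%rdx,8),%r10":
 * %r10 starts as the address of the per-CPU variable template,
 * %rax points at the per-CPU offset table and %rdx holds the CPU number.
 */
static uint64_t percpu_addr(uint64_t var_base,
                            const uint64_t *per_cpu_offset,
                            unsigned int cpu)
{
    return var_base + per_cpu_offset[cpu];
}

/*
 * Recover the offset that was added for a CPU from two register values:
 * the computed per-CPU pointer (r10) and the variable's base address (r12).
 */
static uint64_t cpu_offset_from_regs(uint64_t percpu_ptr, uint64_t var_base)
{
    return percpu_ptr - var_base;
}
```

With r10 = ffff83808e802180 and r12 = ffff82d080533080 (per_cpu__timers),
cpu_offset_from_regs() gives 0xb00e2cf100 for CPU 0xe8. The heap pointer is
then loaded from r10 + 8, and its value (r8 = cr2 = ffff8840ffdb5010) is the
address that faults.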

static int add_entry(struct timer *t)                                           
{                                                                               
ffff82d08022cad3:   53                      push   %rbx                         
    struct timers *timers = &per_cpu(timers, t->cpu);                           
ffff82d08022cad4:   4c 03 14 d0             add    (%rax,%rdx,8),%r10           
    int rc;                                                                     
                                                                                
    ASSERT(t->status == TIMER_STATUS_invalid);                                  
                                                                                
    /* Try to add to heap. t->heap_offset indicates whether we succeed. */      
    t->heap_offset = 0;                                                         
ffff82d08022cad8:   c7 47 08 00 00 00 00    movl   $0x0,0x8(%rdi)               
    t->status = TIMER_STATUS_in_heap;                                           
ffff82d08022cadf:   c6 47 2a 03             movb   $0x3,0x2a(%rdi)              
    rc = add_to_heap(timers->heap, t);                                          
ffff82d08022cae3:   4d 8b 42 08             mov    0x8(%r10),%r8                
                                                                                
                                                                                
/* Add new entry @t to @heap. Return TRUE if new top of heap. */                
static int add_to_heap(struct timer **heap, struct timer *t)                    
{                                                                               
    int sz = GET_HEAP_SIZE(heap);                                               
ffff82d08022cae7:   41 0f b7 08             movzwl (%r8),%ecx                   
                                                                                
    /* Fail if the heap is full. */                                             
    if ( unlikely(sz == GET_HEAP_LIMIT(heap)) )    
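
The movzwl (%r8),%ecx at the faulting address is a 16-bit load through the
heap pointer itself, which suggests that GET_HEAP_SIZE reads a small header
kept in the first heap slot. The sketch below models that under the assumption
that the size and the limit are two consecutive 16-bit fields; the helper
names heap_size and heap_limit are hypothetical and the code is not copied
from timer.c.

```c
#include <stdint.h>
#include <string.h>

struct timer;  /* opaque for this sketch */

/* Assumed layout: bytes 0-1 of the first heap slot hold the current size. */
static int heap_size(struct timer *const *heap)
{
    uint16_t v;
    memcpy(&v, heap, sizeof(v));
    return v;
}

/* Assumed layout: bytes 2-3 of the first heap slot hold the heap limit. */
static int heap_limit(struct timer *const *heap)
{
    uint16_t v;
    memcpy(&v, (const unsigned char *)heap + sizeof(uint16_t), sizeof(v));
    return v;
}
```

Under this model, a bad heap pointer faults on the very first access, before
any timer entry is touched, which matches a fault on the size read.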

But checking the values of nr_cpumask_bits, nr_cpu_ids and NR_CPUS did not
provide any clue as to why it fails here.

After disabling Xen cpufreq in Linux, the page fault no longer appeared, but
creating a new guest caused another fatal page fault:

CPU:    0 
@ (XEN) RIP:    e008:[<ffff82d08025d59b>] __find_first_bit+0xb/0x30 
@ (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor 
@ (XEN) rax: 0000000000000000   rbx: 00000000ffdb53c0   rcx: 0000000000000004 
@ (XEN) rdx: ffff82d080513a20   rsi: 00000000000000f0   rdi: ffff8840ffdb53c0 
@ (XEN) rbp: 00000000000000e9   rsp: ffff82d0804d7d88   r8:  0000000000000000 
@ (XEN) r9:  0000000000000000   r10: 0000000000000017   r11: 0000000000000000 
@ (XEN) r12: ffff8381875ee3e0   r13: ffff82d0804d7e98   r14: 00000000000000e9 
@ (XEN) r15: 00000000000000f0   cr0: 0000000080050033   cr4: 00000000001526f0 
@ (XEN) cr3: 0000008174093000   cr2: ffff8840ffdb53c0 
@ (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008 
@ (XEN) Xen stack trace from rsp=ffff82d0804d7d88: 
@ (XEN)    00000000000000e7 ffff82d080206030 000000cf7d47d0a2 00000000000000e9 
@ (XEN)    00000000000000f0 0000000000000002 ffff83808fb6ffd0 ffff82d080533db8 
@ (XEN)    0000000000000000 ffff82d080532f50 ffff82d0804d0000 ffff82d080533db8 
@ (XEN)    00007fa8c83e5004 ffff82d0804d7e08 ffff82d080533db8 ffff83818b4e5000 
@ (XEN)    000000090000000f 00007fa8c8390001 00007fa800000002 00007fa8ae7f8eb8 
@ (XEN)    0000000000000002 00007fa898004170 000000000159c320 00000034ccc6cffe 
@ (XEN)    00007fa8c83e5000 0000000000000000 000000000159c320 fffffc73ffffffff 
@ (XEN)    00000034ccf6e920 00000034ccf6e920 00000034ccf6e920 00000034ccc94298 
@ (XEN)    00007fa898004170 00000034ccc94220 ffffffffffffffff ffffffffffffffff 
@ (XEN)    ffffffffffffffff 000000ffffffffff 00000034ca0e08c7 0000000000000100 
@ (XEN)    00000034ca0e08c7 0000000000000033 0000000000000246 ffff83006bddd000 
@ (XEN)    ffff8808456f1e98 00007fa8ae7f8d90 ffff88084ad1d900 0000000000000001 
@ (XEN)    00007fa8ae7f8d90 ffff82d0803243a9 00000000ffffffff 0000000001d0085c 
@ (XEN)    00007fa8c84549c0 00007fa898004170 ffff8808456f1e98 00007fa8ae7f8d90 
@ (XEN)    0000000000000282 00000000019c9998 0000000000000003 0000000001d00a49 
@ (XEN)    0000000000000024 ffffffff8100148a 00007fa898004170 00007fa8ae7f8ed0 
@ (XEN)    00007fa8c83e5004 0001010000000000 ffffffff8100148a 000000000000e033 
@ (XEN)    0000000000000282 ffff8808456f1e40 000000000000e02b 0000000000000000 
@ (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000 
@ (XEN)    ffff83006bddd000 0000000000000000 0000000000000000 
@ (XEN) Xen call trace: 
@ (XEN)    [<ffff82d08025d59b>] __find_first_bit+0xb/0x30 
@ (XEN)    [<ffff82d080206030>] do_domctl+0x12b0/0x13d0 
@ (XEN)    [<ffff82d0803243a9>] syscall_enter+0xa9/0xae 
@ (XEN) 
@ (XEN) Pagetable walk from ffff8840ffdb53c0: 
@ (XEN)  L4[0x110] = 00000080818b3067 00000000000018b3
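
The indices printed in the two pagetable walks can be reproduced from the
faulting addresses (cr2). The following is a minimal sketch assuming standard
x86-64 4-level paging with 4 KiB pages; pt_index and page_base are
hypothetical helper names, not Xen functions.

```c
#include <stdint.h>

/*
 * Page-table index of a virtual address at a given level
 * (1 = L1 ... 4 = L4): 12 offset bits, then 9 index bits per level.
 */
static unsigned int pt_index(uint64_t vaddr, unsigned int level)
{
    return (unsigned int)((vaddr >> (12 + 9 * (level - 1))) & 0x1ff);
}

/* Base address of the 4 KiB page that contains vaddr. */
static uint64_t page_base(uint64_t vaddr)
{
    return vaddr & ~(uint64_t)0xfff;
}
```

For ffff8840ffdb5010 this gives L4 index 0x110 and L3 index 0x103, matching
the first walk. The second faulting address, ffff8840ffdb53c0, has the same L4
index and lies in the same 4 KiB page (ffff8840ffdb5000), so both faults may
involve pointers into the same region.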

Booting upstream Xen on the same server (same command line as in the other cases)
causes another page fault (see the attached upstream_no_mem_override.log).

We remembered that there is another open bug about a problem when starting
with more than 4 TB of memory. The workaround for it was to override mem on the
Xen command line. We tried this, and with both upstream Xen and 4.4.3 with the
Linux cpufreq driver enabled, the problem disappears. See the attached logs
upstream_with_mem_override.log and 4.4.3_with_mem_overrride.log.

Any information on what the issue could be here, or any other pointers, would be
very helpful.
I will provide additional info if needed.

Thank you
Elena

Attachment: 4.4.3_no_mem_override.log
Description: Text document

Attachment: 4.4.3_with_mem_overrride.log
Description: Text document

Attachment: upstream_no_mem_override.log
Description: Text document

Attachment: upstream_with_mem_override.log
Description: Text document
