Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
Do you have a line in Xen boot output that starts "PFN compression on bits"? If so, what does it say?

My suspicion is that Jan Beulich's patches to implement a consolidated page array for sparse memory maps have broken the assumption in some Xen code that page_to_mfn(mfn_to_page(x)+y) == x+y, for all valid mfns x, and all y up to some pretty big limit.

Looking in free_heap_pages() I see we do a whole bunch of chunk merging in our buddy allocator, doing arithmetic on variable 'pg' to find neighbouring chunks. It's a bit dodgy, I suspect. I'm cc'ing Jan to see what we can get away with in doing arithmetic on page_info pointers. What are the guaranteed smallest aligned contiguous ranges of mfn in the frame_table now, Jan? (i.e., ranges in which adjacent page_info structs relate to adjacent MFNs)

If this is the problem I'm pretty sure we can come up with a patch quite easily, but depending on the answer to my above question to Jan, we may need to do some code auditing.

 -- Keir

On 31/08/2010 14:49, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:

> Hi Keir:
>
> Thank you for correcting my mistakes.
> Here is the latest panic and its objdump.
> I am not familiar with assembly language or with how those registers are used.
> I will try to spend some more time to understand them better.
> What's your opinion?
> btw, the memtest is still running, so far so good, thanks.
>
> ------------------objdump------------------------------------------------------------------------
> ffff82c480115396:   48 c1 e1 04             shl    $0x4,%rcx
> ffff82c48011539a:   4a 03 0c f8             add    (%rax,%r15,8),%rcx
> }
> static inline void
> page_list_del(struct page_info *page, struct page_list_head *head)
> {
>     struct page_info *next = pdx_to_page(page->list.next);
> ffff82c48011539e:   8b 03                   mov    (%rbx),%eax
> ffff82c4801153a0:   48 c1 e0 05             shl    $0x5,%rax
> ffff82c4801153a4:   48 29 e8                sub    %rbp,%rax
> ffff82c4801153a7:   48 3b 19                cmp    (%rcx),%rbx
> ffff82c4801153aa:   0f 84 95 01 00 00       je     ffff82c480115545 <free_heap_pages+0x405>
>     struct page_info *prev = pdx_to_page(page->list.prev);
> ffff82c4801153b0:   89 f2                   mov    %esi,%edx
> ffff82c4801153b2:   48 c1 e2 05             shl    $0x5,%rdx
> ffff82c4801153b6:   48 29 ea                sub    %rbp,%rdx
> ffff82c4801153b9:   48 3b 59 08             cmp    0x8(%rcx),%rbx
> ffff82c4801153bd:   0f 84 bd 01 00 00       je     ffff82c480115580 <free_heap_pages+0x440>
>
>     if ( !__page_list_del_head(page, head, next, prev) )
>     {
>         next->list.prev = page->list.prev;
> ffff82c4801153c3:   89 70 04                mov    %esi,0x4(%rax)
>         prev->list.next = page->list.next;
> ffff82c4801153c6:   8b 03                   mov    (%rbx),%eax
> ffff82c4801153c8:   89 02                   mov    %eax,(%rdx)
> ffff82c4801153ca:   49 89 dd                mov    %rbx,%r13
> ffff82c4801153cd:   41 83 c4 01             add    $0x1,%r12d
> ffff82c4801153d1:   41 83 fc 12             cmp    $0x12,%r12d
> ffff82c4801153d5:   0f 84 e3 00 00 00       je     ffff82c4801154be <free_heap_pages+0x37e>
> ffff82c4801153db:   48 bd 00 00 00 00 0a    mov    $0x7d0a00000000,%rbp
> ffff82c4801153e2:   7d 00 00
> ffff82c4801153e5:   44 89 e1                mov    %r12d,%ecx
> ffff82c4801153e8:   be 01 00 00 00          mov    $0x1,%esi
>
> ---------------------------------------------------------------------------------------------------
> blktap_sysfs_create: adding attributes for dev ffff880239496c00
> (XEN) ----[ Xen-4.0.0  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
> (XEN) rax: ffff8315ffffffe0   rbx: ffff82f6093b0040   rcx: ffff83063fc01a20
> (XEN) rdx: ffff8315ffffffe0   rsi: 00000000ffffffff   rdi: 000000000049d802
> (XEN) rbp: 00007d0a00000000   rsp: ffff83023ff37cb8   r8:  0000000000000000
> (XEN) r9:  ffffffffffffffff   r10: ffff83060a3c0018   r11: 0000000000000282
> (XEN) r12: 0000000000000000   r13: ffff82f6093b0060   r14: 00000000000001a2
> (XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 000000008da54000   cr2: ffff8315ffffffe4
> (XEN) ds: 0000   es: 0000   fs: 0063   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83023ff37cb8:
> (XEN)    ffff82f6093b7f80 00000000ffffffe0 00000000000001a2 ffff83060a3c0000
> (XEN)    0000000000000000 0000000000000001 ffff82f6093b0060 0000000000000000
> (XEN)    ffff82f6093b0080 ffff82c480115732 00000001093b7cc0 ffff82f6093b0060
> (XEN)    ffff83060a3c0018 0000000000000000 ffff83060a3c0000 ffff83060a3c0fa8
> (XEN)    0000000000000000 ffff82c48014aaa6 ffff83060a3c0fa8 ffff83060a3c0fa8
> (XEN)    ffff83060a3c0014 4000000000000000 ffff83023ff37f28 ffff83060a3c0018
> (XEN)    0000000000000000 ffff83060a3c0000 0000000000305000 0000000000000009
> (XEN)    0000000000000009 ffff82c48014b2fd 00ffffffffffffff ffff83060a3c0000
> (XEN)    0000000000000000 ffff83023ff37e28 0000000000305000 ffff82c480105fe0
> (XEN)    ffff82c480255240 fffffffffffffff3 0000000002599000 ffff82c4801043ce
> (XEN)    ffff82c4801447da 0000000000000080 ffff83023ff37f28 0000000000000096
> (XEN)    ffff83023ff37f28 00000000000000fc 0000000600000002 00000000023c0031
> (XEN)    0000000000000001 00000039890a8e2a 0000003000000018 000000004523af30
> (XEN)    000000004523ae70 0000000000000000 00007fc608ea8a70 000000398903c8a4
> (XEN)    000000004523af44 0000000000000000 000000004523b158 0000000000000000
> (XEN)    0000007f024f6d20 00007fc60a094750 000000000255ff40 00007fc607be5ea8
> (XEN)    fffffffffffffff5 0000000000000246 00000039880cc557 0000000000000100
> (XEN)    00000039880cc557 0000000000000033 0000000000000246 ffff8300bf562000
> (XEN)    ffff8801db8d3e78 000000004523aec0 0000000000305000 0000000000000009
> (XEN)    0000000000000009 ffff82c4801e3169 0000000000000009 0000000000000009
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> (XEN)    [<ffff82c480115732>] free_domheap_pages+0x152/0x380
> (XEN)    [<ffff82c48014aaa6>] relinquish_memory+0x186/0x530
> (XEN)    [<ffff82c48014b2fd>] domain_relinquish_resources+0x1ad/0x280
> (XEN)    [<ffff82c480105fe0>] domain_kill+0x80/0xf0
> (XEN)    [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
> (XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> (XEN)    [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> (XEN)
> (XEN) Pagetable walk from ffff8315ffffffe4:
> (XEN)  L4[0x106] = 00000000bf569027 5555555555555555
> (XEN)  L3[0x057] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 2:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0002]
> (XEN) Faulting linear address: ffff8315ffffffe4
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> ---------------------------------------------------------------------------------------------------
>> Date: Mon, 30 Aug 2010 14:16:09 +0100
>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>> From: keir.fraser@xxxxxxxxxxxxx
>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>>
>> On 30/08/2010 14:03, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>>
>>> I appreciate the quick response.
>>>
>>> Actually, I did some decoding of the backtrace last Friday.
>>> Based on the RIP ffff82c4801153c3, I cut the relevant part out of
>>> "objdump -dS xen-syms" (please see below).
>>> It looks like the bug happened during the domain page list
>>
>> ffff82c4801153c3 isn't the start of an instruction in your disassembly
>> below. Hence you didn't disassemble exactly the build of Xen which
>> crashed. It needs to be exactly the same image.
>>
>> -- keir
>>
>>> traversal, which is beyond my understanding. As I understand it,
>>> those domain pages come from the kernel memory zone, so they always
>>> reside in physical memory, and their addresses shouldn't have the chance
>>> to change, right?
>>> If so, what is the relationship between all those panics and free_heap_pages?
>>>
>>> Several servers (at least 3) experienced the same panic on the same test.
>>> Those servers have identical hardware, kernel and Xen configuration.
>>> Right now, memtest is running on one server and should finish in a few
>>> hours.
>>> (24G memory)
>>>
>>> ----------------------------------------------------------------------------------
>>> static inline void
>>> page_list_del(struct page_info *page, struct page_list_head *head)
>>> {
>>>     struct page_info *next = pdx_to_page(page->list.next);
>>>     struct page_info *prev = pdx_to_page(page->list.prev);
>>> ffff82c4801153b8:   8b 73 04                mov    0x4(%rbx),%esi
>>> ffff82c4801153bb:   49 8d 0c 06             lea    (%r14,%rax,1),%rcx
>>> ffff82c4801153bf:   48 8d 05 fa 10 26 00    lea    2494714(%rip),%rax        # ffff82c4803764c0 <_heap>
>>> ffff82c4801153c6:   48 c1 e1 04             shl    $0x4,%rcx
>>> ffff82c4801153ca:   4a 03 0c f8             add    (%rax,%r15,8),%rcx
>>> }
>>> static inline void
>>> page_list_del(struct page_info *page, struct page_list_head *head)
>>> {
>>>     struct page_info *next = pdx_to_page(page->list.next);
>>> ffff82c4801153ce:   8b 03                   mov    (%rbx),%eax
>>> ffff82c4801153d0:   48 c1 e0 05             shl    $0x5,%rax
>>> ffff82c4801153d4:   48 29 e8                sub    %rbp,%rax
>>> ffff82c4801153d7:   48 3b 19                cmp    (%rcx),%rbx
>>> ffff82c4801153da:   0f 84 95 01 00 00       je     ffff82c480115575 <free_heap_pages+0x405>
>>>     struct page_info *prev = pdx_to_page(page->list.prev);
>>> ffff82c4801153e0:   89 f2                   mov    %esi,%edx
>>> ffff82c4801153e2:   48 c1 e2 05             shl    $0x5,%rdx
>>> ffff82c4801153e6:   48 29 ea                sub    %rbp,%rdx
>>> ffff82c4801153e9:   48 3b 59 08             cmp    0x8(%rcx),%rbx
>>> ffff82c4801153ed:   0f 84 bd 01 00 00       je     ffff82c4801155b0 <free_heap_pages+0x440>
>>>
>>>     if ( !__page_list_del_head(page, head, next, prev) )
>>>     {
>>>
>>> ----------------------------------------------------------------------------------
>>>
>>>> Date: Mon, 30 Aug 2010 10:02:05 +0100
>>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>>>> From: keir.fraser@xxxxxxxxxxxxx
>>>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>
>>>> On 30/08/2010 09:47, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>>>>
>>>>> 3) Every panic points to the same address: ffff8315ffffffe4, which is
>>>>> not a valid page address.
>>>>> I printed the pages of the domain in assign_pages, which all look like
>>>>> ffff82f60bd64000; at least the
>>>>> ffff82f60 prefix is the same.
>>>>
>>>> Yes, well you may not be crashing on a supposed page address. Certainly the
>>>> page pointer that relinquish_memory() is working on, and passed to
>>>> put_page->free_domheap_pages is valid enough to not cause any of those
>>>> functions to crash when dereferencing it. At the moment you really have no
>>>> idea what is causing free_heap_pages() to crash.
>>>>
>>>>> I'm a bit lost as to where to go from here. Thanks.
>>>>
>>>> You need to find out which line of code in free_heap_pages() is crashing,
>>>> and what variable it is trying to dereference when it crashes. You have a
>>>> nice backtrace with an EIP value, so you can 'objdump -d xen-syms' and
>>>> search for the EIP in the disassembly.
>>>> If you have a debug build of Xen you
>>>> can even do 'objdump -S xen-syms' and have the disassembly annotated with
>>>> corresponding source lines.
>>>>
>>>> Have you seen this on more than one physical machine? If not, have you run
>>>> memtest on the offending machine?
>>>>
>>>> -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
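
A quick cross-check of the register dump against the disassembly quoted in this thread, offered as a minimal sketch rather than anything authoritative. It assumes, from the quoted objdump, that pdx_to_page() compiles to a shift by 5 (a 32-byte struct page_info) followed by subtracting the constant held in %rbp (0x7d0a00000000), and that the faulting instruction "mov %esi,0x4(%rax)" writes the list.prev field at offset 4. The helper names below are hypothetical and are not Xen functions. Under those assumptions, a list index of 0xffffffff reproduces both rax/rdx and the faulting linear address from the log:

```c
#include <assert.h>
#include <stdint.h>

/* Constant loaded into %rbp in the quoted disassembly; subtracting it is the
 * same as adding 0xffff82f600000000 (mod 2^64), the apparent frame table base. */
#define RBP_CONST        0x00007d0a00000000ULL
/* Assumption taken from "shl $0x5": sizeof(struct page_info) == 32. */
#define PAGE_INFO_SHIFT  5
/* Assumption taken from "mov %esi,0x4(%rax)": list.prev sits at offset 4. */
#define LIST_PREV_OFFSET 4

/* Hypothetical helper mirroring the shl/sub pair: list index -> struct address. */
static inline uint64_t pdx_to_page_addr(uint32_t pdx)
{
    return ((uint64_t)pdx << PAGE_INFO_SHIFT) - RBP_CONST;
}

/* Hypothetical inverse: struct address -> list index. */
static inline uint32_t page_addr_to_pdx(uint64_t addr)
{
    return (uint32_t)((addr + RBP_CONST) >> PAGE_INFO_SHIFT);
}

/* Address written by "next->list.prev = page->list.prev". */
static inline uint64_t next_prev_field_addr(uint32_t next_pdx)
{
    return pdx_to_page_addr(next_pdx) + LIST_PREV_OFFSET;
}

/* Compare the arithmetic with the values printed in the panic log. */
static inline void check_against_dump(void)
{
    /* rax and rdx in the dump */
    assert(pdx_to_page_addr(0xffffffffu) == 0xffff8315ffffffe0ULL);
    /* cr2 / faulting linear address */
    assert(next_prev_field_addr(0xffffffffu) == 0xffff8315ffffffe4ULL);
    /* rbx maps back to the value that also appears in rdi */
    assert(page_addr_to_pdx(0xffff82f6093b0040ULL) == 0x49d802u);
}
```

If this model holds, both page->list.next and page->list.prev held 0xffffffff when the fault occurred, and the write through the computed "next" pointer is what hit the unmapped address. Reading 0xffffffff as an end-of-list marker is an assumption on my part and should be checked against the page list code in the exact tree that crashed.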