Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
More interesting would be to turn the BUG_ON statements in my first patch
into if() statements and print out that kind of info before panic()ing. It
would tell us which BUG_ON() fired, the page addresses (and maybe MFNs), and
the order, mask, node, and zone info.

 -- Keir

On 01/09/2010 11:25, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:

> That doesn't imply anything. It is perfectly valid for a page's prev or next
> index to be PAGE_LIST_NULL, if that page is not in a list, or if it is at
> the head and/or tail of a list.
>
>  -- Keir
>
> On 01/09/2010 11:21, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
>> Thanks Keir.
>>
>> I ran the test below myself, in page_alloc.c.
>> check_page will panic on any page whose address has '3' as its 6th
>> character; the argument i is used to indicate which line panicked.
>>
>> The output below indicates that the panic comes from line 558, and that the
>> page address is ffff82f600002040, while its next page is ffff8315ffffffe0.
>> Compare that to the panic address in the previous panic
>> (ffff8315ffffffe4), which is very similar.
>>
>> I think this should imply something.
>>
>> ---------------------------------------
>> (XEN) -----------18
>> (XEN) System RAM: 24542MB (25131224kB)
>> (XEN) SRAT: No PXM for e820 range: 0000000000000000 - 000000000009a7ff
>> (XEN) SRAT: SRAT not used.
>> (XEN) ----------------pgb ffff82f600002040 pg ffff8315ffffffe0, mask 1, order 0, 0
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) xmao invalid page address assigned
>> (XEN) ****************************************
>> (XEN)
>>
>> ----------------------------------------------------
>> 485 static int check_page(struct page_info* pgb, struct page_info* pg, unsigned long mask, unsigned int order, int i){
>> 486
>> 487     if((unsigned long)pg & 0x0000020000000000 &&
>> 488        (unsigned long)pg & 0x0000010000000000
>> 489     ){
>> 490         printk("----------------pgb %p pg %p, mask %lx, order %d, %d\n", pgb, pg, mask, order, i);
>> 491         panic("xmao invalid page address assigned \n");
>> 492     }
>> 493     return 0;
>> 494 }
>>
>> 549         if ( (page_to_mfn(pg) & mask) )
>> 550         {
>> 551             /* Merge with predecessor block? */
>> 552             if ( !mfn_valid(page_to_mfn(pg-mask)) ||
>> 553                  !page_state_is(pg-mask, free) ||
>> 554                  (PFN_ORDER(pg-mask) != order) )
>> 555                 break;
>> 556             pg -= mask;
>> 557
>> 558             check_page(pg, pdx_to_page(pg->list.next), mask, order, 0);
>> 559             check_page(pg, pdx_to_page(pg->list.prev), mask, order, 1);
>> 560
>> 561             page_list_del(pg, &heap(node, zone, order));
>> 562         }
>> 563         else
>> 564         {
>> 565             /* Merge with successor block? */
>> 566             if ( !mfn_valid(page_to_mfn(pg+mask)) ||
>> 567                  !page_state_is(pg+mask, free) ||
>> 568                  (PFN_ORDER(pg+mask) != order) )
>> 569                 break;
>> 570
>> 571             pgt = pg + mask;
>> 572             check_page(pg, pdx_to_page(pgt->list.next), mask, order, 2);
>> 573             check_page(pg, pdx_to_page(pgt->list.prev), mask, order, 3);
>> 574
>>
>>> Date: Wed, 1 Sep 2010 10:58:54 +0100
>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>>> From: keir.fraser@xxxxxxxxxxxxx
>>> To: tinnycloud@xxxxxxxxxxx; jbeulich@xxxxxxxxxx
>>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx
>>>
>>> Hm, well, it is a bit weird. The check in init_heap_pages() ought to prevent
>>> merging across node boundaries. Nonetheless the code is simpler and more
>>> obvious if we put a further merging constraint in free_heap_pages() instead.
>>> It's also more correct, since I'm not sure that the
>>> phys_to_nid(page_to_maddr(pg-1)) in init_heap_pages() won't possibly BUG out
>>> if pg-1 is not a RAM page and is not in a known NUMA node range.
>>>
>>> Please give the attached patch a spin. (You should revert the previous
>>> patch, of course).
>>>
>>> Thanks,
>>> Keir
>>>
>>> On 01/09/2010 10:23, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>>>
>>>> Well. It did crash on every startup.
>>>>
>>>> Below is what I got.
>>>> ---------------------------------------------------
>>>> root (hd0,0)
>>>> Filesystem type is ext2fs, partition type 0x83
>>>> kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M
>>>> dom0_max_
>>>> vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax
>>>> noreboot
>>>> [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078,
>>>> entry=0x100000
>>>> ]
>>>> module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe
>>>> console=hvc0
>>>> [Multiboot-module @ 0x39b000, 0x3214d0 bytes]
>>>>
>>>>
>>>> ? __  __            _  _    ___   ___
>>>>  \ \/ /___ _ __   | || |  / _ \ / _ \    *
>>>>   \  // _ \ '_ \  | || |_| | | | | | |   *
>>>>   /  \  __/ | | | |__   _| |_| | |_| | * *
>>>>  /_/\_\___|_| |_|    |_|(_)___(_)___/  **************************************
>>>> hich entry is highlighted.
>>>> (XEN) Xen version 4.0.0 (root@xxxxxxxxxxxxxxxxx) (gcc version 4.1.2
>>>> 20080704
>>>> (Red Hat 4.1.2-46)) Wed Sep 1 17:13:35 CST 2010
>>>> (XEN) Latest ChangeSet: unavailableto modify the kernel arguments
>>>> (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M
>>>> dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1
>>>> conswitch=ax
>>>> noreboot
>>>> (XEN) Video information:
>>>> (XEN) VGA is text mode 80x25, font 8x16automatically in 3 seconds.
>>>> (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
>>>> (XEN) EDID info not retrieved because no DDC retrieval method detected
>>>> (XEN) Disc information:
>>>> (XEN) Found 6 MBR signatures
>>>> (XEN) Found 6 EDD information structures
>>>> (XEN) Xen-e820 RAM map:
>>>> (XEN) 0000000000000000 - 000000000009a800 (usable)
>>>> (XEN) 000000000009a800 - 00000000000a0000 (reserved)
>>>> (XEN) 00000000000e4bb0 - 0000000000100000 (reserved)
>>>> (XEN) 0000000000100000 - 00000000bf790000 (usable)
>>>> (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data)
>>>> (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
>>>> (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved)
>>>> (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved)
>>>> (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
>>>> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
>>>> (XEN) 00000000fff00000 - 0000000100000000 (reserved)
>>>> (XEN) 0000000100000000 - 0000000640000000 (usable)
>>>> (XEN) --------------849
>>>> (XEN) --------------849
>>>> (XEN) --------------849
>>>> (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
>>>> (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97)
>>>> (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97)
>>>> (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117)
>>>> (XEN) ACPI: FACS BF79E000, 0040
>>>> (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97)
>>>> (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97)
>>>> (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97)
>>>> (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1)
>>>> (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97)
>>>> (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117)
>>>> (XEN) --------------847
>>>> (XEN) ---------srat enter
>>>> (XEN) ---------prepare enter into pfn
>>>> (XEN) -------in pfn
>>>> (XEN) -------hole shift returned
>>>> (XEN) --------------849
>>>> (XEN) System RAM: 24542MB (25131224kB)
>>>> (XEN) Unknown interrupt (cr2=0000000000000000)
>>>> (XEN) 00000000000000ab 0000000000000000 ffff82f600004020
>>>> 00007d0a00000000 ffff82f600004000 0000000000000020 0000000000201000
>>>> 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000008
>>>> 0000000000000000 00000000000001ff 00000000000001ff 0000000000000000
>>>> ffff82c480115787 000000000000e008 0000000000010002 ffff82c48035fd18
>>>> 0000000000000000 ffff82c48011536a 0000000000000000 0000000000000000
>>>> 0000000000000163 0000000900000000 00000000000000ab 0000000000000201
>>>> 0000000000000000 0000000000000100 ffff82f600004020 0000000000000eff
>>>> 0000000000000000 ffff82c480115e60 0000000000000000 ffff82f600002020
>>>> 0000000000001000 0000000000000004 0000000000000080 0000000000000001
>>>> ffff82c48020be8d ffff830000100000 0000000000000008 0000000000000000
>>>> 0000000000000000 ffffffffffffffff 0000000000000101 ffff82c48022d8fc
>>>> 0000000000540000 00000000005fde36 0000000000540000 0000000000100000
>>>> 0000000100000000 0000000000000010 ffff82c48024deb4 ffff82c4802404f7
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 ffff8300bf568ff8 ffff8300bf569ff8 000000000022a630
>>>> 000000000022a695 0000000000087f00 0000000000000000 ffff830000087fc0
>>>> 00000000005fde36 000000000087b6d0 0000000000d44000 0000000001000000
>>>> 0000000000000000 ffffffffffffffff ffff830000087f00 0000100000000000
>>>> 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
>>>> 0000000000000000 0000000000000000 0000000000067ebc 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 ffff82c4801000b5
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 00000000fffff000
>>>>
>>>>> Date: Wed, 1 Sep 2010 09:49:18 +0100
>>>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>>>>> From: keir.fraser@xxxxxxxxxxxxx
>>>>> To: JBeulich@xxxxxxxxxx
>>>>> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>>
>>>>> On 01/09/2010 09:02, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>>>>
>>>>>>> Well I agree with your logic anyway. So I don't see that this can be the
>>>>>>> cause of MaoXiaoyun's bug. At least not directly. But then I'm stumped
>>>>>>> as to why the page arithmetic and checks in free_heap_pages are
>>>>>>> (apparently) resulting in a page pointer way outside the frame-table
>>>>>>> region and actually in the directmap region.
>>>>>>
>>>>>> There must be some unchecked use of PAGE_LIST_NULL, i.e.
>>>>>> running off a list end without taking notice (0xffff8315ffffffe4
>>>>>> exactly corresponds with that).
>>>>>
>>>>> Okay, my next guess then is that we are deleting a chunk from the wrong
>>>>> list head. I don't see any check that the adjacent chunks we are
>>>>> considering to merge are from the same node and zone. I suppose the zone
>>>>> logic does just work as we're dealing with 2**x aligned and sized regions.
>>>>> But, shouldn't the merging logic in free_heap_pages be checking that the
>>>>> merging candidate is from the same NUMA node? I see I have an ASSERTion
>>>>> later in the same function, but it's too weak and wishful I suspect.
>>>>>
>>>>> MaoXiaoyun: can you please test with the attached patch? If I'm right, you
>>>>> will crash on one of the BUG_ON checks that I added, rather than crashing
>>>>> on a pointer dereference. You may even crash during boot. Anyhow, what is
>>>>> interesting is whether this patch always makes you crash on BUG_ON before
>>>>> you would normally crash on pointer dereference. If so this is trivial to
>>>>> fix.
>>>>>
>>>>> Thanks,
>>>>> Keir
>>>>>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
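
For readers following the PAGE_LIST_NULL discussion, here is a small
standalone sketch of the address arithmetic. It is not Xen code. It assumes
three values inferred from the logs in this thread rather than confirmed
from the source: a frame-table base of 0xffff82f600000000, a 32-byte
struct page_info, and an all-ones 32-bit PAGE_LIST_NULL. The names
FRAMETABLE_BASE, PAGE_INFO_SIZE, LIST_NULL, index_to_addr and
checked_index_to_addr are made up for the illustration. Under those
assumptions, converting the list terminator the way pdx_to_page() converts
an index lands on ffff8315ffffffe0, the "next page" value reported by
check_page.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values, inferred from the logs in this thread (not from Xen source). */
#define FRAMETABLE_BASE 0xffff82f600000000ULL /* assumed start of the frame table */
#define PAGE_INFO_SIZE  32ULL                 /* assumed sizeof(struct page_info) */
#define LIST_NULL       0xffffffffU           /* assumed PAGE_LIST_NULL (all-ones index) */

/* Address an index maps to when it is treated as a frame-table slot. */
uint64_t index_to_addr(uint32_t idx)
{
    return FRAMETABLE_BASE + (uint64_t)idx * PAGE_INFO_SIZE;
}

/* Same conversion, but the list terminator yields 0 instead of an address. */
uint64_t checked_index_to_addr(uint32_t idx)
{
    return idx == LIST_NULL ? 0 : index_to_addr(idx);
}

/* Print what the terminator turns into when converted without a check. */
void show_terminator(void)
{
    printf("%016llx\n", (unsigned long long)index_to_addr(LIST_NULL));
}
```

If these assumptions hold, the value printed by the check_page test is just
the terminator converted without a check, which fits Keir's point that a
PAGE_LIST_NULL prev or next index is valid by itself. The earlier faulting
address ffff8315ffffffe4 is four bytes past it, which would fit an access to
a field at offset 4 of that bogus pointer, though that part is a guess.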