
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))



On Tue, 2010-11-23 at 10:24 -0800, Jeremy Fitzhardinge wrote:
> On 11/23/2010 03:51 AM, Ian Campbell wrote:
> > I'm not sure but looking at the complete bootlog it looks as if the
> > system may only have node==1 i.e. no 0 node which could plausibly lead
> > to this sort of issue:
> >         [    0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
> >         [    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
> >         [    0.000000]   bootmap [0000000000010000 -  0000000000017fff] pages 8
> >         [    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0040000000]
> >         [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
> >         [    0.000000]   #1 [0003446000 - 0003465000]   XEN PAGETABLES ==> [0003446000 - 0003465000]
> >         [    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
> >         [    0.000000]   #3 [0001000000 - 0001694994]    TEXT DATA BSS ==> [0001000000 - 0001694994]
> >         [    0.000000]   #4 [00016b5000 - 0003244e00]          RAMDISK ==> [00016b5000 - 0003244e00]
> >         [    0.000000]   #5 [0003245000 - 0003446000]   XEN START INFO ==> [0003245000 - 0003446000]
> >         [    0.000000]   #6 [0001695000 - 000169532d]              BRK ==> [0001695000 - 000169532d]
> >         [    0.000000]   #7 [0000100000 - 00002e0000]          PGTABLE ==> [0000100000 - 00002e0000]
> >         [    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
> >         [    0.000000] Zone PFN ranges:
> >         [    0.000000]   DMA      0x00000000 -> 0x00001000
> >         [    0.000000]   DMA32    0x00001000 -> 0x00100000
> >         [    0.000000]   Normal   0x00100000 -> 0x00100000
> >         [    0.000000] Movable zone start PFN for each node
> >         [    0.000000] early_node_map[2] active PFN ranges
> >         [    0.000000]     1: 0x00000000 -> 0x000000a0
> >         [    0.000000]     1: 0x00000100 -> 0x00040000
> >         [    0.000000] On node 1 totalpages: 262048
> >         [    0.000000]   DMA zone: 56 pages used for memmap
> >         [    0.000000]   DMA zone: 483 pages reserved
> >         [    0.000000]   DMA zone: 3461 pages, LIFO batch:0
> >         [    0.000000]   DMA32 zone: 3528 pages used for memmap
> >         [    0.000000]   DMA32 zone: 254520 pages, LIFO batch:31
> >
> > Perhaps we should be passing numa_node_id() (e.g. current node) instead
> > of node 0? There doesn't seem to be another obvious alternative to
> > passing in an explicit node number to this callchain (some places cope
> > with -1 but not this path AFAICT).
> 
> Does booting native get the same configuration?

  Booting native with the same Xen-enabled kernel gives:

[    0.000000] Bootmem setup node 0 0000000130000000-0000000230000000
[    0.000000]   NODE_DATA [0000000130000000 - 0000000130007fff]
[    0.000000]   bootmap [0000000130008000 -  0000000130027fff] pages 20
[    0.000000] (8 early reservations) ==> bootmem [0130000000 - 0230000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE
[    0.000000]   #2 [0001000000 - 0001694994]    TEXT DATA BSS
[    0.000000]   #3 [0037656000 - 0037fefb18]          RAMDISK
[    0.000000]   #4 [000009ec00 - 0000100000]    BIOS reserved
[    0.000000]   #5 [0001695000 - 000169532d]              BRK
[    0.000000]   #6 [0000008000 - 000000c000]          PGTABLE
[    0.000000]   #7 [000000c000 - 0000011000]          PGTABLE
[    0.000000] Bootmem setup node 1 0000000000000000-0000000130000000
[    0.000000]   NODE_DATA [0000000000011000 - 0000000000018fff]
[    0.000000]   bootmap [0000000000019000 -  000000000003efff] pages 26
[    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0001000000 - 0001694994]    TEXT DATA BSS ==> [0001000000 - 0001694994]
[    0.000000]   #3 [0037656000 - 0037fefb18]          RAMDISK ==> [0037656000 - 0037fefb18]
[    0.000000]   #4 [000009ec00 - 0000100000]    BIOS reserved ==> [000009ec00 - 0000100000]
[    0.000000]   #5 [0001695000 - 000169532d]              BRK ==> [0001695000 - 000169532d]
[    0.000000]   #6 [0000008000 - 000000c000]          PGTABLE ==> [0000008000 - 000000c000]
[    0.000000]   #7 [000000c000 - 0000011000]          PGTABLE ==> [000000c000 - 0000011000]
[    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
[    0.000000] [ffffea0004280000-ffffea00043fffff] potential offnode page_structs
[    0.000000]  [ffffea0000000000-ffffea00043fffff] PMD -> [ffff880001800000-ffff8800051fffff] on node 1
[    0.000000]  [ffffea0004400000-ffffea0007bfffff] PMD -> [ffff880130200000-ffff8801339fffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00230000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     1: 0x00000000 -> 0x000000a0
[    0.000000]     1: 0x00000100 -> 0x000cf679
[    0.000000]     1: 0x00100000 -> 0x00130000
[    0.000000]     0: 0x00130000 -> 0x00230000
[    0.000000] On node 0 totalpages: 1048576
[    0.000000]   Normal zone: 14336 pages used for memmap
[    0.000000]   Normal zone: 1034240 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 1046041
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 109 pages reserved
[    0.000000]   DMA zone: 3835 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14280 pages used for memmap
[    0.000000]   DMA32 zone: 831153 pages, LIFO batch:31
[    0.000000]   Normal zone: 2688 pages used for memmap
[    0.000000]   Normal zone: 193920 pages, LIFO batch:31
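
  As a sanity check on the two logs, here is a minimal Python sketch that
sums the early_node_map ranges per node. The names (pages_per_node,
XEN_DOM0, NATIVE) are made up for illustration, and it assumes the end PFN
of each range is exclusive, which the "totalpages" lines above appear to
confirm. It only shows which nodes each boot sees; it is not a diagnosis
of the crash itself.

```python
import re
from collections import defaultdict

# Optional quote markers ("> > ") followed by the "[    0.000000]" timestamp.
PREFIX_RE = re.compile(r"^[>\s]*\[\s*\d+\.\d+\]")
# early_node_map entries look like "    1: 0x00000000 -> 0x000000a0".
RANGE_RE = re.compile(r"^\s*(\d+):\s+0x([0-9a-fA-F]+)\s+->\s+0x([0-9a-fA-F]+)\s*$")

def pages_per_node(log_text):
    """Sum the pages in each node's active PFN ranges.

    Assumption: the end PFN of a range is exclusive.
    """
    totals = defaultdict(int)
    for line in log_text.splitlines():
        body = PREFIX_RE.sub("", line)
        match = RANGE_RE.match(body)
        if match:
            node = int(match.group(1))
            start = int(match.group(2), 16)
            end = int(match.group(3), 16)
            totals[node] += end - start
    return dict(totals)

# Ranges copied from the dom0 boot log quoted earlier in this message.
XEN_DOM0 = """
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     1: 0x00000000 -> 0x000000a0
[    0.000000]     1: 0x00000100 -> 0x00040000
"""

# Ranges copied from the native boot log above.
NATIVE = """
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     1: 0x00000000 -> 0x000000a0
[    0.000000]     1: 0x00000100 -> 0x000cf679
[    0.000000]     1: 0x00100000 -> 0x00130000
[    0.000000]     0: 0x00130000 -> 0x00230000
"""

if __name__ == "__main__":
    for name, log in (("dom0", XEN_DOM0), ("native", NATIVE)):
        totals = pages_per_node(log)
        print(name, "nodes present:", sorted(totals), "pages:", totals)
```

  The sums agree with the "On node N totalpages" lines in both logs: under
Xen only node 1 shows up, with 262048 pages, while native sees node 0
(1048576 pages) and node 1 (1046041 pages).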


> > It's also not obvious if dom0 should be seeing the tables which describe
> > the host's nodes anyway or if we should be clobbering something. Given
> > that dom0 sees a pseudo-physical address map I'm not convinced seeing
> > the real SRAT is in any way beneficial. Perhaps we should simply be
> > clobbering NUMAness until actual PV understanding of NUMA is ready?
> 
> Yes, the host SRAT is meaningless in the domain and we really should
> ignore it.  I'm not sure what happens if you boot on a really NUMA system.
> 
> > One thing I notice when googling R410 issues is that they apparently
> > have a "Cores per CPU" BIOS option which might be worth playing with,
> > since configuring a reduced number of cores might remove node 0 but not
> > node 1 (odd but not invalid?). Presumably it is also worth making sure
> > you have the latest BIOS etc.
> 
> Also, what's the DIMM configuration?  Are the slots fully populated?

  8 slots, 4 populated: slots #0, #1, #4 and #5 hold 2 GiB
DIMMs (according to lshw; set up by Dell).

  I switched off hyperthreading in the BIOS settings (the default is 'on')
because I had issues with it under Xen 3.2 (related to floating vcpus,
which I had to pin to fix random crashes). Also, I don't think HT is
significant for my usage. I'm used to seeing strange bugs as soon as I
tweak Dell BIOSes, so I thought I'd mention that.





 

