Re: [Xen-devel] 3.2 PVOPs Intel Crash
On 18/01/12 16:51, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:41:22AM -0500, Tom Goetz wrote:
>
> CC-ing xen-devel and David.
>
>> We have dom0_mem=672MB for Xen and mem=672MB for linux.
>
> Ok, if you don't have the mem=X and have "x86: use 'dom0_mem' to limit
> the number of pages for dom0" (c/s 23790) in your hypervisor, what happens?
>
> And if you also use 'dom0_mem=max:672MB', do you get the same issue?

The kernel's mem option should be marking the extra memory as unusable
instead of just removing it from the E820. I'll take a look at this --
it should be pretty straightforward.

I would recommend what Konrad says above. This ought to work.

David

>> [ 0.000000] e820 remove range: 000000002a000000 - ffffffffffffffff (usable)
>>
>> appears to come from
>>
>> static int __init parse_memopt(char *p)
>> {
>> 	u64 mem_size;
>>
>> 	if (!p)
>> 		return -EINVAL;
>>
>> 	if (!strcmp(p, "nopentium")) {
>> #ifdef CONFIG_X86_32
>> 		setup_clear_cpu_cap(X86_FEATURE_PSE);
>> 		return 0;
>> #else
>> 		printk(KERN_WARNING "mem=nopentium ignored! (only supported on x86_32)\n");
>> 		return -EINVAL;
>> #endif
>> 	}
>>
>> 	userdef = 1;
>> 	mem_size = memparse(p, &p);
>> 	/* don't remove all of memory when handling "mem={invalid}" param */
>> 	if (mem_size == 0)
>> 		return -EINVAL;
>> 	e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);   <-----------------------------
>>
>> 	return 0;
>> }
>> early_param("mem", parse_memopt);
>>
>> but we have the same mem opt for 2.6.38 and 3.2, and the mem code still has
>> the e820_remove_range in 3.2. Dom0 is showing the right amount of mem when
>> booted on other machines, so I don't think the mem= option is failing.
>
> The 'mem=X' argument I remember being a work-around. The original bug had
> been fixed in both the hypervisor and the kernel.
>
>>
>> I'm taking a break for lunch now and I'll dig in further on the mem= option
>> after.
>>
>> On Jan 18, 2012, at 11:34 AM, Konrad Rzeszutek Wilk wrote:
>>
>>> On Wed, Jan 18, 2012 at 11:02:48AM -0500, Tom Goetz wrote:
>>>> The E820s are different:
>>>>
>>>> Xen E820:
>>>>
>>>> (XEN) Xen-e820 RAM map:
>>>> (XEN) 0000000000000000 - 000000000009f000 (usable)
>>>> (XEN) 000000000009f000 - 00000000000a0000 (reserved)
>>>> (XEN) 0000000000100000 - 00000000bf65b800 (usable)
>>>> (XEN) 00000000bf65b800 - 00000000c0000000 (reserved)
>>>> (XEN) 00000000f8000000 - 00000000fc000000 (reserved)
>>>> (XEN) 00000000fec00000 - 00000000fec10000 (reserved)
>>>> (XEN) 00000000fed18000 - 00000000fed1c000 (reserved)
>>>> (XEN) 00000000fed20000 - 00000000fed90000 (reserved)
>>>> (XEN) 00000000feda0000 - 00000000feda6000 (reserved)
>>>> (XEN) 00000000fee00000 - 00000000fee10000 (reserved)
>>>> (XEN) 00000000ffe00000 - 0000000100000000 (reserved)
>>>>
>>>> 2.6.38 E820:
>>>>
>>>> [ 0.000000] BIOS-provided physical RAM map:
>>>> [ 0.000000] Xen: 0000000000000000 - 000000000009f000 (usable)
>>>> [ 0.000000] Xen: 000000000009f000 - 0000000000100000 (reserved)
>>>> [ 0.000000] Xen: 0000000000100000 - 000000002a000000 (usable)
>>>> [ 0.000000] Xen: 000000002a000000 - 00000000bf65b000 (unusable)
>>>
>>> Good. That is correct.
>>>
>>>> [ 0.000000] Xen: 00000000bf65b800 - 00000000c0000000 (reserved)
>>>> [ 0.000000] Xen: 00000000f8000000 - 00000000fc000000 (reserved)
>>>> [ 0.000000] Xen: 00000000fec00000 - 00000000fec10000 (reserved)
>>>> [ 0.000000] Xen: 00000000fed18000 - 00000000fed1c000 (reserved)
>>>> [ 0.000000] Xen: 00000000fed20000 - 00000000fed90000 (reserved)
>>>> [ 0.000000] Xen: 00000000feda0000 - 00000000feda6000 (reserved)
>>>> [ 0.000000] Xen: 00000000fee00000 - 00000000fee10000 (reserved)
>>>> [ 0.000000] Xen: 00000000ffe00000 - 0000000100000000 (reserved)
>>>> [ 0.000000] Xen: 0000000100000000 - 000000019565b000 (usable)
>>>> [ 0.000000] e820 remove range: 000000002a000000 - ffffffffffffffff (usable)
>>>> [ 0.000000] NX (Execute Disable) protection: active
>>>> [ 0.000000] user-defined physical RAM map:
>>>> [ 0.000000] user: 0000000000000000 - 000000000009f000 (usable) - 1
>>>> [ 0.000000] user: 000000000009f000 - 0000000000100000 (reserved) - 2
>>>> [ 0.000000] user: 0000000000100000 - 000000002a000000 (usable) - 3
>>>> [ 0.000000] user: 000000002a000000 - 00000000bf65b000 (unusable) - 4   <------------ This isn't in the Xen version either.
>>>
>>> Yup, that is OK. We want that region to be mapped as 'unusable'.
>>>
>>> That will make the intel-agp code _not_ use that region (which we
>>> should not, as that is a RAM region).
>>>
>>>> [ 0.000000] user: 00000000bf65b800 - 00000000c0000000 (reserved) - 5
>>>> [ 0.000000] user: 00000000f8000000 - 00000000fc000000 (reserved) - 6
>>>> [ 0.000000] user: 00000000fec00000 - 00000000fec10000 (reserved) - 7
>>>> [ 0.000000] user: 00000000fed18000 - 00000000fed1c000 (reserved) - 8
>>>> [ 0.000000] user: 00000000fed20000 - 00000000fed90000 (reserved) - 9
>>>> [ 0.000000] user: 00000000feda0000 - 00000000feda6000 (reserved) - 10
>>>> [ 0.000000] user: 00000000fee00000 - 00000000fee10000 (reserved) - 11
>>>> [ 0.000000] user: 00000000ffe00000 - 0000000100000000 (reserved) - 12
>>>> [ 0.000000] DMI 2.4 present.
>>>> [ 0.000000] DMI: Dell Inc. Latitude D830 /0HN341, BIOS A05 11/05/2007
>>>> [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)   <---- 3.2 is also missing these lines
>>>> [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
>>>>
>>>>
>>>> 3.2 E820:
>>>>
>>>> [ 0.000000] Set 264710 page(s) to 1-1 mapping
>>>>
>>>> [ 0.000000] BIOS-provided physical RAM map:
>>>> [ 0.000000] Xen: 0000000000000000 - 000000000009f000 (usable)
>>>> [ 0.000000] Xen: 000000000009f000 - 0000000000100000 (reserved)
>>>> [ 0.000000] Xen: 0000000000100000 - 00000000bf65b000 (usable)
>>>
>>> So here, we should have had the
>>>
>>> 2a000 -> bf65b marked as unusable.
>
> On second thought, that is OK too. The 2a000->bf65b will
> protect the region from being slurped up by the PCI as "gap" region.
>>>
>>> You booted the kernel with the same dom0_mem=X argument, right?
>>>
>>>> [ 0.000000] Xen: 00000000bf65b800 - 00000000c0000000 (reserved)
>>>> [ 0.000000] Xen: 00000000f8000000 - 00000000fc000000 (reserved)
>>>> [ 0.000000] Xen: 00000000fec00000 - 00000000fec10000 (reserved)
>>>> [ 0.000000] Xen: 00000000fed18000 - 00000000fed1c000 (reserved)
>>>> [ 0.000000] Xen: 00000000fed20000 - 00000000fed90000 (reserved)
>>>> [ 0.000000] Xen: 00000000feda0000 - 00000000feda6000 (reserved)
>>>> [ 0.000000] Xen: 00000000fee00000 - 00000000fee10000 (reserved)
>>>> [ 0.000000] Xen: 00000000ffe00000 - 0000000100000000 (reserved)
>>>> [ 0.000000] NX (Execute Disable) protection: active
>>>>
>>>> [ 0.000000] user-defined physical RAM map:
>>>> [ 0.000000] user: 0000000000000000 - 000000000009f000 (usable) - 1
>>>> [ 0.000000] user: 000000000009f000 - 0000000000100000 (reserved) - 2
>>>> [ 0.000000] user: 0000000000100000 - 000000002a000000 (usable) - 3
>
> Ah, and this now punches the E820 with 2a000->bf65b as a "gap" and
> it ends up being used by the PCI subsystem.
>
> That is the problem. So ... can you make sure you have that
> hypervisor fix in and boot it without 'mem' and see what the E820 comes out
> as?
>
> Thanks!
>>>> [ 0.000000] user: 00000000bf65b800 - 00000000c0000000 (reserved) - 5
>>>> [ 0.000000] user: 00000000f8000000 - 00000000fc000000 (reserved) - 6
>>>> [ 0.000000] user: 00000000fec00000 - 00000000fec10000 (reserved) - 7
>>>> [ 0.000000] user: 00000000fed18000 - 00000000fed1c000 (reserved) - 8
>>>> [ 0.000000] user: 00000000fed20000 - 00000000fed90000 (reserved) - 9
>>>> [ 0.000000] user: 00000000feda0000 - 00000000feda6000 (reserved) - 10
>>>> [ 0.000000] user: 00000000fee00000 - 00000000fee10000 (reserved) - 11
>>>> [ 0.000000] user: 00000000ffe00000 - 0000000100000000 (reserved) - 12
>>>>
>>>> On Jan 17, 2012, at 4:09 PM, Konrad Rzeszutek Wilk wrote:
>>>>
>>>>> On Tue, Jan 17, 2012 at 03:58:11PM -0500, Tom Goetz wrote:
>>>>>> Konrad,
>>>>>>
>>>>>> We're seeing a crash on an Intel video Core2Duo. The crash looks similar
>>>>>> to this one:
>>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726. The last
>>>>>> comment gives a commit ID for a fix. I don't find that commit in any of
>>>>>> our trees. Do you know anything about this?
>>>>>
>>>>> Yes. It was 2f14ddc3a7146ea4cd5a3d1ecd993f85f2e4f948
>>>>>
>>>>> but that was a fix in 2.6.39 (I think) and you are using 3.2.
>>>>>
>>>>> Which could be related to the fact that in 3.2 the E820 code
>>>>> (arch/x86/xen/setup.c) went through some surgery to make it easier.
>>>>>
>>>>> But the code in it looks like it handles it correctly. Hm,
>>>>> any chance you can see what the Xen E820 looks like in 3.2 vs anything
>>>>> before v3.2?
>>>>>
>>>>>>
>>>>>> Thanks for any help,
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> Dom0 mem was restricted to 672MB. The machine has 3GB.
>>>>>>
>>>>>>
>>>>>> [ 2.463600] agpgart-intel 0000:00:00.0: Intel 965GM Chipset
>>>>>> (XEN) mm.c:878:d0 Error getting mfn 30600 (pfn 5555555555555555) from L1 entry 8000000030600473 for l1e_owner=0, pg_owner=0
>>>>>> (XEN) mm.c:4664:d0 ptwr_emulate: could not get_page_from_l1e()
>>>>>> [ 2.463891] BUG: unable to handle kernel paging request at ffff880023f28c30
>>>>>> [ 2.463904] IP: [<ffffffff81008bee>] xen_set_pte_at+0x3e/0x210
>>>>>> [ 2.463921] PGD 1a06067 PUD 1a0a067 PMD 209d067 PTE 8010000023f28065
>>>>>> [ 2.463934] Oops: 0003 [#1] SMP
>>>>>> [ 2.463943] CPU 1
>>>>>> [ 2.463946] Modules linked in: intel_agp(+) intel_gtt
>>>>>> [ 2.463957]
>>>>>> [ 2.463961] Pid: 128, comm: modprobe Not tainted 3.2.1-orc #102 Dell Inc. Latitude D830 /0HN341
>>>>>> [ 2.463974] RIP: e030:[<ffffffff81008bee>] [<ffffffff81008bee>] xen_set_pte_at+0x3e/0x210
>>>>>> [ 2.463984] RSP: e02b:ffff880004b91ac8 EFLAGS: 00010297
>>>>>> [ 2.463990] RAX: 0000000000000000 RBX: 8000000030600473 RCX: 8000000030600473
>>>>>> [ 2.463996] RDX: 0000000000000000 RSI: ffffc90000186000 RDI: ffffffff81a38020
>>>>>> [ 2.464002] RBP: ffff880004b91b18 R08: ffff880004d87d80 R09: 00000000000000d0
>>>>>> [ 2.464009] R10: ffffe8ffffffffff R11: ffffc90000000000 R12: ffff880023f28c30
>>>>>> [ 2.464015] R13: 0000000000030600 R14: ffff880023f28c30 R15: ffffc90000187000
>>>>>> [ 2.464024] FS: 00007f11b34db720(0000) GS:ffff880029fd1000(0000) knlGS:0000000000000000
>>>>>> [ 2.464031] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>> [ 2.464037] CR2: ffff880023f28c30 CR3: 0000000004bf1000 CR4: 0000000000002660
>>>>>> [ 2.464044] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [ 2.464050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>> [ 2.464057] Process modprobe (pid: 128, threadinfo ffff880004b90000, task ffff880004ac96b0)
>>>>>> [ 2.464063] Stack:
>>>>>> [ 2.464067] ffffc90000186000 ffffffff81a38020 ffffffff810051ed ffffc90000000000
>>>>>> [ 2.464079] ffffe8ffffffffff ffffc90000186000 ffff880023f28c30 0000000000030600
>>>>>> [ 2.464091] 8000000000000573 ffffc90000187000 ffff880004b91bc8 ffffffff812b01e4
>>>>>> [ 2.464104] Call Trace:
>>>>>> [ 2.464111] [<ffffffff810051ed>] ? __raw_callee_save_xen_make_pte+0x11/0x1e
>>>>>> [ 2.464121] [<ffffffff812b01e4>] ioremap_page_range+0x214/0x2f0
>>>>>> [ 2.464130] [<ffffffff8113b6a2>] ? insert_vmalloc_vmlist+0x22/0x80
>>>>>> [ 2.464140] [<ffffffff8103dc43>] __ioremap_caller+0x283/0x390
>>>>>> [ 2.464149] [<ffffffffa000070a>] ? i9xx_setup+0x20a/0x2e0 [intel_gtt]
>>>>>> [ 2.464158] [<ffffffff81579cee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
>>>>>> [ 2.464166] [<ffffffff8103dda7>] ioremap_nocache+0x17/0x20
>>>>>> [ 2.464173] [<ffffffffa000070a>] i9xx_setup+0x20a/0x2e0 [intel_gtt]
>>>>>> [ 2.464181] [<ffffffffa0001739>] intel_gmch_probe+0x369/0xa08 [intel_gtt]
>>>>>> [ 2.464190] [<ffffffffa0009e8a>] agp_intel_probe+0x48/0x19f [intel_agp]
>>>>>> [ 2.464198] [<ffffffff812d794c>] local_pci_probe+0x5c/0xd0
>>>>>> [ 2.464205] [<ffffffff812d9201>] pci_device_probe+0x101/0x120
>>>>>> [ 2.464214] [<ffffffff81392f5e>] driver_probe_device+0x7e/0x1b0
>>>>>> [ 2.464222] [<ffffffff8139313b>] __driver_attach+0xab/0xb0
>>>>>> [ 2.464229] [<ffffffff81393090>] ? driver_probe_device+0x1b0/0x1b0
>>>>>> [ 2.464236] [<ffffffff81393090>] ? driver_probe_device+0x1b0/0x1b0
>>>>>> [ 2.464244] [<ffffffff81391f1c>] bus_for_each_dev+0x5c/0x90
>>>>>> [ 2.464252] [<ffffffff81392bee>] driver_attach+0x1e/0x20
>>>>>> [ 2.464259] [<ffffffff81392840>] bus_add_driver+0x1a0/0x270
>>>>>> [ 2.464266] [<ffffffffa000d000>] ? 0xffffffffa000cfff
>>>>>> [ 2.464273] [<ffffffff813936a6>] driver_register+0x76/0x140
>>>>>> [ 2.464280] [<ffffffff8157d89d>] ? notifier_call_chain+0x4d/0x70
>>>>>> [ 2.464287] [<ffffffffa000d000>] ? 0xffffffffa000cfff
>>>>>> [ 2.464294] [<ffffffff812d8ed5>] __pci_register_driver+0x55/0xd0
>>>>>> [ 2.464303] [<ffffffff81089173>] ? __blocking_notifier_call_chain+0x63/0x80
>>>>>> [ 2.464312] [<ffffffffa000d02c>] agp_intel_init+0x2c/0x2e [intel_agp]
>>>>>> [ 2.464320] [<ffffffff81002040>] do_one_initcall+0x40/0x180
>>>>>> [ 2.464328] [<ffffffff810a0561>] sys_init_module+0x91/0x200
>>>>>> [ 2.464336] [<ffffffff81581b02>] system_call_fastpath+0x16/0x1b
>>>>>> [ 2.464341] Code: e8 4c 89 75 f0 4c 89 7d f8 66 66 66 66 90 48 89 7d b8 48 89 75 b0 49 89 d6 48 89 cb 66 66 66 66 90 e8 57 1b 03 00 83 f8 01 74 75 <49> 89 1e 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b
>>>>>> [ 2.464450] RIP [<ffffffff81008bee>] xen_set_pte_at+0x3e/0x210
>>>>>> [ 2.464459] RSP <ffff880004b91ac8>
>>>>>> [ 2.464463] CR2: ffff880023f28c30
>>>>>> [ 2.464469] ---[ end trace 5223388e4a422cb4 ]---
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Tom Goetz
>>>>>> tom.goetz@xxxxxxxxxxxxxxxxxxx
>>>>>>
>>>>>>
>>>>
>>>> ---
>>>> Tom Goetz
>>>> tom.goetz@xxxxxxxxxxxxxxxxxxx
>>>>
>>>>
>>>>
>>
>> ---
>> Tom Goetz
>> tom.goetz@xxxxxxxxxxxxxxxxxxx
>>
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel