[Xen-devel] Re: L1[0x1fb] = 0000000000000000 which faults on one type of machine but on another works?
On Wed, 2011-03-16 at 22:19 +0000, Konrad Rzeszutek Wilk wrote:
> I am troubleshooting an issue where the Linux kernel tries
> to dereference a not present entry. I have a fix for this
> in for-2.6.32/bug-fixes .. but please read on.

I'll give it a shot, I'll try anything at this point ;P

> Specifically it tries to dereference the fixmapped value of
> APIC_BASE. The fixmapped value of APIC_BASE is actually not set
> due to git commit a1d8e2fa8325064338b2da1bcf0d7a0473883c284
> which adds this in arch/x86/kernel/acpi/boot.c:
>
> static void __init acpi_register_lapic_address(unsigned long address)
> {
>         /* Xen dom0 doesn't have usable lapics */
>         if (xen_initial_domain())
>                 return;
>
>         mp_lapic_addr = address;
>
>         set_fixmap_nocache(FIX_APIC_BASE, address);
>
> Later on we use 'native_apic_read', which tries to use APIC_BASE as
> the address (it is assumed to be at slot FIX_APIC_BASE of the fixmap
> API), and it fails (on some machines).
>
> Since we don't call 'set_fixmap_nocache(FIX_APIC_BASE)', if one
> were to go through the pagetable, this is what we get:
>
> [ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
> [ 0.000000] mapped APIC to ffffffffff5fb000 (00000000)
> (XEN) d0:v0: unhandled page fault (ec=0000)
> (XEN) Pagetable walk from ffffffffff5fb020:
> (XEN) L4[0x1ff] = 0000000221003067 0000000000001003
> (XEN) L3[0x1ff] = 0000000221004067 0000000000001004
> (XEN) L2[0x1fa] = 0000000221771067 0000000000001771
> (XEN) L1[0x1fb] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.1-110309 x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: e033:[<ffffffff8102b5d1>]
> (XEN) RFLAGS: 0000000000000292 EM: 1 CONTEXT: pv guest
> (XEN) rax: ffffffff8164cf50 rbx: 000000026ec00000 rcx: 00000000ffffdd85
> (XEN) rdx: 00000000ffffffff rsi: 0000000000000000 rdi: 0000000000000020
> (XEN) rbp: ffffffff81643ea8 rsp: ffffffff81643e50 r8: 0000000000000002
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: ffff880013671800 r13: 00000000bff66000 r14: ffffffffffffffff
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
> (XEN) cr3: 0000000221001000 cr2: ffffffffff5fb020
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
> (XEN) Guest stack trace from rsp=ffffffff81643e50:
>
> Which is to say that the L1 has this:
> 0000000115771fa0: 00000000 00000000 00000000 00000000
> 0000000115771fb0: 00000000 00000000 00000000 00000000
> 0000000115771fc0: 00000000 00000000 15770067 80100001
> 0000000115771fd0: 15770067 80100001 00000000 00000000
> 0000000115771fe0: 00000000 00000000 00000000 00000000
> 0000000115771ff0: 00000000 00000000 00000000 00000000
>
> L1[0x1fb] is machine address 115771fd8, which has nothing in it.
>
> OK, so I've come up with a fix that is a back-port of how 2.6.38 does it:
> it removes the check I mentioned above, and in xen_set_fixmap
> we set FIX_APIC_BASE to actually point to a dummy ioapic_mapping.
> It is 7cb068cf1ba90425e12f3a7b3caed9d018fa9b8c in for-2.6.32/bug-fixes
>
> Gianni, you might want to check this out in case it fixes the problem you
> are experiencing.

Not sure, mine happens a lot earlier, sort of just after the very early
memory initialisation. Also we're nowhere near trying to use APIC anything
as an address afaict, just trying to reach the xen info page. The last
thing I see is:

[ 0.000000] kernel direct mapping tables up to 2f000000 @ 100000-27a000
[ 0.000000] init_memory_mapping: 0000000100000000-00000002a7000000

> But one thing I can't understand is why on one machine (IBM x3850)
> I get this crash, while on another one with the same pagetable contents
> (L1 has nothing for 0x1fb) it works just fine. I added a panic and used
> the Xen hypervisor kdb to manually inspect the pagetable, and it has
> the same contents as the IBM x3850, but it boots fine with this invalid value.
> Any ideas?

A missing TLB flush?
heh

> FYI, seems another user (Sven SÃbert) with an IBM x3650 hits the same bug. And with
> this fix he is able to boot.

Very odd. If this isn't the bug I'm seeing, it might be tangentially related.
I'll let you know.

Gianni

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel