Re: [Xen-devel] Kernel panic with tboot E820_UNUSABLE region
On 14/05/13 14:40, Konrad Rzeszutek Wilk wrote:
> On Tue, May 14, 2013 at 12:06:16PM +0100, Aurelien Chartier wrote:
>> Hi everybody,
>>
>> We noticed a crash in Linux dom0 early boot sequence when running over
>> tboot and Xen. The issue seemed related with a E820 region that tboot is
>> setting as E820_UNUSABLE. We posted to tboot-devel to understand better
>> what could be the cause of the kernel panic. This thread can be read
>> here :
>> http://sourceforge.net/mailarchive/forum.php?thread_name=51852B26.7070406%40citrix.com&forum_name=tboot-devel
>>
>> Following Konrad's advice, we took a closer look at arch/x86/xen/setup.c
>> and found what could be the cause of the kernel panic. I am not familiar
>> with that part of Xen, so feel free to correct me.
>>
>> The Xen memory setup code called during early boot is trying to release
>> chunks of memory in xen_set_identity_and_release for non-RAM regions
>> (including E820_UNUSABLE). The xen_set_identity_and_release_chunk
>> function is calling HYPERVISOR_update_va_mapping, which will fail in our
>> case. As tboot marked that region as being unusable, Xen did not map
>> those pages and the later call on get_page_from_l1e (arch/x86/mm.c in
>> Xen code) is returning an error. As the return value of the hypercall
>> is not checked in Linux code, xen_set_identity_and_release_chunk
>> function is carrying on and tries to release the E820_UNUSABLE chunk.
>> This is apparently messing up some Xen internal memory structures,
>> resulting in a kernel crash when Linux is initializing its memory mapping.
>>
>> A possible fix I have tried is to check the return value of
>> HYPERVISOR_update_va_mapping and if encountering an error, exit from
>> xen_set_identity_and_release_chunk. This is fixing the kernel panic, but
>> I am not sure about other implications by that change.
>
> The implications are explained in the git commit that added that:
>
> 83d51ab47
>
> xen/setup: update VA mapping when releasing memory during setup
>
> In xen_memory_setup(), if a page that is being released has a VA
> mapping this must also be updated. Otherwise, the page will be not
> released completely -- it will still be referenced in Xen and won't be
> freed util the mapping is removed and this prevents it from being
> reallocated at a different PFN.
>
> This was already being done for the ISA memory region in
> xen_ident_map_ISA() but on many systems this was omitting a few pages
> as many systems marked a few pages below the ISA memory region as
> reserved in the e820 map.
>
> This fixes errors such as:
>
> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0
> (0 of 17)
>
> So I think it would be OK to continue with the rest of the function, so
> something
> like this:
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 94eac5c..f07ca54 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -219,9 +219,10 @@ static void __init xen_set_identity_and_release_chunk(
>       * to be updated to be 1:1.
>       */
>      for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
> -        (void)HYPERVISOR_update_va_mapping(
> +        if (HYPERVISOR_update_va_mapping(
>              (unsigned long)__va(pfn << PAGE_SHIFT),
> -            mfn_pte(pfn, PAGE_KERNEL_IO), 0);
> +            mfn_pte(pfn, PAGE_KERNEL_IO), 0))
> +            break;
>
>      if (start_pfn < nr_pages)
>          *released += xen_release_chunk(
>
> But that looks like a hack as the issue seems to be with the hypervisor?

The issue is triggered by the call to xen_release_chunk, not by the
update_va_mapping hypercall. That is why I was using a return rather than
a break. This is probably a hack; I only mentioned it to stress the fact
that the kernel crash was caused by the call to xen_release_chunk (and, I
assume, the decrease_reservation hypercall to Xen).

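To make the break/return distinction concrete, here is a small standalone
user-space model. It is only a sketch and not the kernel code: every name
prefixed with model_ is hypothetical, the "unusable" PFN range is made up,
the hypercall is replaced by a stub that fails for that range, and the
max_pfn_mapped and nr_pages checks from setup.c are left out. It shows only
the control flow: with a break the release step still runs for the chunk,
with a return it is skipped.

```c
#include <assert.h>

/* Hypothetical error policies for the model; none of this is kernel API. */
enum on_error { IGNORE_ERROR, BREAK_ON_ERROR, RETURN_ON_ERROR };

/* Made-up PFN range standing in for the E820_UNUSABLE region. */
static const unsigned long unusable_start = 0x100;
static const unsigned long unusable_end   = 0x110;

/* Stub for the update_va_mapping hypercall: nonzero means failure. */
static int model_update_va_mapping(unsigned long pfn)
{
    return (pfn >= unusable_start && pfn < unusable_end) ? -1 : 0;
}

/* Stub for the release step: reports how many pages it was asked to release. */
static unsigned long model_release_chunk(unsigned long start_pfn,
                                         unsigned long end_pfn)
{
    assert(start_pfn <= end_pfn);
    return end_pfn - start_pfn;
}

/* Returns the number of pages handed to the release step. */
static unsigned long model_set_identity_and_release_chunk(
    unsigned long start_pfn, unsigned long end_pfn, enum on_error policy)
{
    unsigned long pfn;

    for (pfn = start_pfn; pfn < end_pfn; pfn++) {
        if (model_update_va_mapping(pfn) == 0)
            continue;
        if (policy == BREAK_ON_ERROR)
            break;      /* leaves the loop, the release below still runs */
        if (policy == RETURN_ON_ERROR)
            return 0;   /* skips the release for this chunk entirely */
        /* IGNORE_ERROR: carry on, like the current (void) cast */
    }

    return model_release_chunk(start_pfn, end_pfn);
}
```

In this model, a chunk lying inside the made-up unusable range is still
passed to the release step under both the ignore and the break policy;
only the return policy keeps it away from the release.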
>> Any ideas about this issue ?
> Fix the bug in the hypervisor?
>

I was wondering how much xen_release_chunk was relying on the fact that
the previous hypercalls have succeeded. If I understood it correctly, this
is not the case, so there may be a way to make the rest of the code more
robust.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel