Xen project Mailing List

Re: [Xen-devel] Xen PV domain regression with KASLR enabled (kernel 3.16)

To: Kees Cook <keescook@xxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>

Date: Fri, 22 Aug 2014 11:20:50 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>

Delivery-date: Fri, 22 Aug 2014 09:21:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 21.08.2014 18:03, Kees Cook wrote:
> On Tue, Aug 12, 2014 at 2:07 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
>> On Tue, Aug 12, 2014 at 11:53:03AM -0700, Kees Cook wrote:
>>> On Tue, Aug 12, 2014 at 11:05 AM, Stefan Bader
>>> <stefan.bader@xxxxxxxxxxxxx> wrote:
>>>> On 12.08.2014 19:28, Kees Cook wrote:
>>>>> On Fri, Aug 8, 2014 at 7:35 AM, Stefan Bader
>>>>> <stefan.bader@xxxxxxxxxxxxx> wrote:
>>>>>> On 08.08.2014 14:43, David Vrabel wrote:
>>>>>>> On 08/08/14 12:20, Stefan Bader wrote:
>>>>>>>> Unfortunately I have not yet figured out why this happens, but
>>>>>>>> can confirm by compiling with or without CONFIG_RANDOMIZE_BASE
>>>>>>>> being set that without KASLR all is ok, but with it enabled
>>>>>>>> there are issues (actually a dom0 does not even boot as a
>>>>>>>> follow up error).
>>>>>>>>
>>>>>>>> Details can be seen in [1] but basically this is always some
>>>>>>>> portion of a vmalloc allocation failing after hitting a freshly
>>>>>>>> allocated PTE space not being PTE_NONE (usually from a module
>>>>>>>> load triggered by systemd-udevd). In the non-dom0 case this
>>>>>>>> repeats many times but ends in a guest that allows login. In
>>>>>>>> the dom0 case there is a more fatal error at some point causing
>>>>>>>> a crash.
>>>>>>>>
>>>>>>>> I have not tried this for a normal PV guest but for dom0 it
>>>>>>>> also does not help to add "nokaslr" to the kernel command-line.
>>>>>>>
>>>>>>> Maybe it's overlapping with regions of the virtual address space
>>>>>>> reserved for Xen? What is the VA that fails?
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>> Yeah, there is some code to avoid some regions of memory (like
>>>>>> initrd). Maybe missing p2m tables? I probably need to add
>>>>>> debugging to find the failing VA (iow not sure whether it might
>>>>>> be somewhere in the stacktraces in the report).
>>>>>>
>>>>>> The kernel-command line does not seem to be looked at. It should
>>>>>> put something into dmesg and that never shows up. Also today's
>>>>>> random feature is other PV guests crashing after a bit somewhere
>>>>>> in the check_for_corruption area...
>>>>>
>>>>> Right now, the kaslr code just deals with initrd, cmdline, etc. If
>>>>> there are other reserved regions that aren't listed in the e820,
>>>>> it'll need to locate and skip them.
>>>>>
>>>>> -Kees
>>>>>
>>>> Making my little steps towards more understanding I figured out
>>>> that it isn't the code that does the relocation. Even with that
>>>> completely disabled there were the vmalloc issues. What causes it
>>>> seems to be the default of the upper limit and that this changes
>>>> the split between kernel and modules to 1G+1G instead of
>>>> 512M+1.5G. That is the reason why nokaslr has no effect.
>>>
>>> Oh! That's very interesting. There must be some assumption in Xen
>>> about the kernel VM layout then?
>>
>> No. I think most of the changes that look at PTE and PMDs are all
>> in arch/x86/xen/mmu.c. I wonder if this is xen_cleanhighmap being
>> too aggressive
>
> (Sorry I had to cut our chat short at Kernel Summit!)
>
> It sounded like there was another region of memory that Xen was
> setting aside for page tables? But Stefan's investigation seems to
> show this isn't about layout at boot (since the kaslr=0 case means no
> relocation is done). Sounds more like the split between kernel and
> modules area, so I'm not sure how the memory area after the initrd
> would be part of this. What should next steps be, do you think?

Maybe layout, but not about placement of the kernel. Basically leaving
KASLR enabled but shrinking the possible range back to the original
kernel/module split is fine as well.

I am bouncing between feeling close to understanding and being confused.
Konrad suggested xen_cleanhighmap being overly aggressive. But maybe
it's the other way round.
The warning that occurs first indicates that the PTE that was obtained
for some vmalloc mapping is not unused (0) as expected. So it feels
rather like some cleanup has *not* been done.

Let me think aloud a bit... What seems to cause this is the change of
the kernel/module split from 512M:1.5G to 1G:1G (not exactly, since
there are 8M of vsyscalls and a 2M hole at the end). Which in vaddr
terms means:

Before:
ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space

After:
ffffffff80000000 - ffffffffbfffffff (=1024 MB) kernel text mapping, from phys 0
ffffffffc0000000 - ffffffffff5fffff (=1014 MB) module mapping space

Now, *if* I got this right, this means the kernel starts on a vaddr
that is pointed at by:

PGD[511]->PUD[510]->PMD[0]->PTE[0]

In the old layout the module vaddr area would start in the same PUD
area, but with the change the kernel would cover PUD[510] and the
module vaddr + vsyscalls and the hole would cover PUD[511].

xen_cleanhighmap operates only on the kernel_level2_pgt which
(speculating a bit since I am not sure I understand enough details) I
believe is the one PMD pointed at by PGD[511]->PUD[510]. That could
mean that before the change xen_cleanhighmap may touch some (the
initial 512M) of the module vaddr space but not after the change.
Maybe that also means it always should have covered more, but this
would not be observed as long as modules would not claim more than
512M? I still need to check the vaddr ranges for which
xen_cleanhighmap is actually called. The modules vaddr space would
normally not be touched (only with DEBUG set). I moved that to be
unconditionally done, but then this might be of no use when it needs
to cover a different PMD...

Really not sure here. But maybe a starter for others...
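For anyone who wants to double-check the arithmetic, here is a minimal
sketch (not kernel code) that decodes a vaddr into page-table indices
and computes the size of each range. It assumes standard x86-64
4-level paging with 4 KiB pages and 512 entries per level; the names
pt_indices, size_mb and RANGES are made up for illustration. Under
those assumptions the kernel start address decodes to PGD[511],
PUD[510], PMD[0], PTE[0]; the old module start shares PUD[510] (at
PMD[256]); and the new module start moves to PUD[511].

```python
# Sketch only: assumes standard x86-64 4-level paging (4 KiB pages,
# 512 entries per table level). All names here are illustrative.
SHIFTS = (39, 30, 21, 12)        # PGD, PUD, PMD, PTE index shifts
ENTRIES_PER_TABLE = 512


def pt_indices(vaddr):
    """Return the (PGD, PUD, PMD, PTE) indices that map vaddr."""
    return tuple((vaddr >> shift) & (ENTRIES_PER_TABLE - 1)
                 for shift in SHIFTS)


def size_mb(start, end):
    """Size in MiB of the inclusive address range [start, end]."""
    return (end - start + 1) >> 20


# Address ranges taken from the before/after layout in this mail.
RANGES = [
    ("old kernel text", 0xffffffff80000000, 0xffffffff9fffffff),
    ("old modules",     0xffffffffa0000000, 0xffffffffff5fffff),
    ("new kernel text", 0xffffffff80000000, 0xffffffffbfffffff),
    ("new modules",     0xffffffffc0000000, 0xffffffffff5fffff),
]

for name, start, end in RANGES:
    pgd, pud, pmd, pte = pt_indices(start)
    print(f"{name:16s} {start:#018x} {size_mb(start, end):5d} MB  "
          f"PGD[{pgd}] PUD[{pud}] PMD[{pmd}] PTE[{pte}]")
```

The point of printing the start of each range is to see which PUD
entry (and hence which PMD page) the module area begins in, which is
what matters if the cleanup only walks a single PMD page.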
-Stefan

>
> -Kees
>
>
>>>
>>> -Kees
>>>
>>> --
>>> Kees Cook
>>> Chrome OS Security
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxx
>>> http://lists.xen.org/xen-devel
>
>
>

Attachment: signature.asc
Description: OpenPGP digital signature

