Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option
On Thu, Jul 30, 2015 at 1:01 PM, Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
> On 07/30/2015 02:54 PM, Andrew Cooper wrote:
>>
>> On 30/07/15 19:30, Andy Lutomirski wrote:
>>>
>>> On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>
>>>> On 30/07/2015 00:13, Andy Lutomirski wrote:
>>>>>
>>>>> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
>>>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>>>>>>
>>>>>>> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>>>>>>>
>>>>>>>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>>>>>>>>
>>>>>>>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>>>>>>>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>>>>>>>>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> FYI, I have got a repro now and am investigating.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Good and bad news. This bug has nothing to do with LDTs
>>>>>>>>>>>>>> themselves.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have worked out what is going on, but this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>>>>>>>>>>>>> index 5abeaac..7e1a82e 100644
>>>>>>>>>>>>>> --- a/arch/x86/xen/enlighten.c
>>>>>>>>>>>>>> +++ b/arch/x86/xen/enlighten.c
>>>>>>>>>>>>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>>>>>>>>>>>>>          pte = pfn_pte(pfn, prot);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +       (void)*(volatile int*)v;
>>>>>>>>>>>>>>          if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>>>>>>>>>>>>>                  pr_err("set_aliased_prot va update failed w/ lazy mode %u\n",
>>>>>>>>>>>>>>                         paravirt_get_lazy_mode());
>>>>>>>>>>>>>>                  BUG();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is perhaps not the fix we are looking for, and every use of
>>>>>>>>>>>>>> HYPERVISOR_update_va_mapping() is susceptible to the same
>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think in most cases we know that the page is mapped, so
>>>>>>>>>>>>> hopefully this is the only site that we need to be careful
>>>>>>>>>>>>> about.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any chance we can get some kind of quick-and-dirty
>>>>>>>>>>>> fix that can go to x86/urgent in the next few days, even if a
>>>>>>>>>>>> clean fix isn't available yet?
>>>>>>>>>>>
>>>>>>>>>>> Quick and dirty?
>>>>>>>>>>>
>>>>>>>>>>> Reading from v is the most obvious and quick way, for areas
>>>>>>>>>>> where we are certain v exists, is kernel memory and is expected
>>>>>>>>>>> to have a backing page. I don't know offhand how many of the
>>>>>>>>>>> current HYPERVISOR_update_va_mapping() callsites this applies
>>>>>>>>>>> to.
>>>>>>>>>>
>>>>>>>>>> __get_user((char *)v, tmp), perhaps, unless there's something
>>>>>>>>>> better in the wings. Keep in mind that we need this for -stable,
>>>>>>>>>> and it's likely to get backported quite quickly due to
>>>>>>>>>> CVE-2015-5157.
>>>>>>>>>
>>>>>>>>> Hmm - something like that tucked inside
>>>>>>>>> HYPERVISOR_update_va_mapping() would probably work, and certainly
>>>>>>>>> be minimal hassle for -stable.
>>>>>>>>>
>>>>>>>>> Altering the hypercall used is certainly not something to
>>>>>>>>> backport, nor are we sure it is a viable fix at this time.
>>>>>>>>
>>>>>>>> Changing this one use of update_va_mapping to use
>>>>>>>> mmu_update_normal_pt is the correct fix to unblock this LDT
>>>>>>>> series. I see no reason why this cannot be backported.
>>>>>>>
>>>>>>> A proper fix should include batching, and that is not something
>>>>>>> that I think we should target for stable.
>>>>>>
>>>>>> Batching is absolutely not necessary to alter update_va_mapping to
>>>>>> mmu_update_normal_pt. After all, update_va_mapping isn't batched.
>>>>>>
>>>>>> However, this isn't the first issue we have had with lazy mmu
>>>>>> faulting, and I doubt it is the last. There are not many callsites
>>>>>> of update_va_mapping - I will audit them tomorrow and see if any
>>>>>> similar issues are lurking elsewhere.
>>>>>
>>>>> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
>>>>> yet I haven't been able to get xen_alloc_ldt to fail or subsequent
>>>>> LDT access to fault. Is this something we should be worried about?
>>>>
>>>> Yes. update_va_mapping() will function perfectly well taking one RW
>>>> mapping to RO even if there is a second RW mapping. In such a case,
>>>> the next LDT access will fault.
>>>
>>> Which is a problem because that alias might still exist, and also
>>> because Linux really doesn't expect that fault.
>>>
>>>> On closer inspection, Xen is rather unhelpful with the fault. Xen's
>>>> lazy #PF will be bounced back to the guest with cr2 adjusted to
>>>> appear in the range passed to set_ldt(). The error code however will
>>>> be unmodified (and limited only by not-user and not-reserved), so it
>>>> will appear as a non-present read or write supervisor access to an
>>>> address which the kernel has a valid read mapping of.
>>>
>>> More yuck.
>>>
>>> I think I'm just going to stick an unconditional vm_flush_aliases in
>>> alloc_ldt.
>>>
>>>> Therefore, set_ldt() needs to be confident that there are no
>>>> writeable mappings to the frames used to make up the LDT. It could
>>>> proactively fault them in by accessing one descriptor in each page
>>>> inside the limit, but by the time a fault is received it is probably
>>>> too late to work out where the other mapping is which prevented the
>>>> typechange (or indeed, whether Xen objected to one of the descriptors
>>>> instead).
>>>
>>> This seems like overkill.
>>>
>>> I'm still a bit confused, though: the failure is in xen_free_ldt. How
>>> do we make it all the way to xen_free_ldt without the vmapped page
>>> existing in the guest's page tables? After all, we had to survive
>>> xen_alloc_ldt first, and ISTM that should fail in exactly the same
>>> way.
>>
>> (Summarising part of a discussion which has just occurred on IRC)
>>
>> I presume that xen_free_ldt() is called while in the context of an mm
>> which doesn't have the particular area of the vmalloc() space faulted
>> in.
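[For concreteness, the "read from v first" idea from earlier in the
thread, written out as a minimal sketch - not an actual patch from this
series. The probe below uses probe_kernel_read(), which tolerates a
faulting address, rather than the bare volatile read in the debugging
diff quoted above, and the non-lowmem half of set_aliased_prot() is
omitted for brevity:]

    static void set_aliased_prot(void *v, pgprot_t prot)
    {
            int level;
            pte_t *ptep;
            pte_t pte;
            unsigned long pfn;
            unsigned char dummy;

            ptep = lookup_address((unsigned long)v, &level);
            BUG_ON(ptep == NULL);

            pfn = pte_pfn(*ptep);
            pte = pfn_pte(pfn, prot);

            /*
             * HYPERVISOR_update_va_mapping() acts on the current page
             * tables, so it fails if the alias has never been faulted in
             * under this mm (easy to hit with lazy vmalloc faulting).
             * Touch one byte first, with preemption disabled so the
             * freshly populated PTE cannot be torn down again before the
             * hypercall.
             */
            preempt_disable();
            probe_kernel_read(&dummy, v, 1);

            if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
                    BUG();

            preempt_enable();
    }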
>
> This is exactly what's happening --- the bug is only triggered during
> exit, and xen_free_ldt() is called from someone else's context, e.g.:
>
> [   53.986677] Call Trace:
> [   53.986677]  [<c105312d>] xen_free_ldt+0x2d/0x40
> [   53.986677]  [<c1062310>] free_ldt_struct.part.1+0x10/0x40
> [   53.986677]  [<c1062735>] destroy_context+0x25/0x40
> [   53.986677]  [<c10a764e>] __mmdrop+0x1e/0xc0
> [   53.986677]  [<c10c9858>] finish_task_switch+0xd8/0x1a0
> [   53.986677]  [<c1863736>] __schedule+0x316/0x950
> [   53.986677]  [<c1863d96>] schedule+0x26/0x70
> [   53.986677]  [<c10ac613>] do_wait+0x1b3/0x200
> [   53.986677]  [<c10ac9d7>] SyS_waitpid+0x67/0xd0
> [   53.986677]  [<c10aa820>] ? task_stopped_code+0x50/0x50
> [   53.986677]  [<c186717a>] syscall_call+0x7/0x7
>
> But that would imply that this other context has the mm->context.ldt
> of ldt_gdt_32. How is that possible?

It's freed via destroy_context, which destroys someone else's LDT,
right?

--Andy
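[On the missing alias flush Andy mentions above: "vm_flush_aliases"
presumably refers to vm_unmap_aliases(), the in-tree helper that flushes
lazily kept-around vmalloc aliases. A sketch of what an unconditional
flush in xen_alloc_ldt() could look like - an illustration of the
suggestion, not a patch from this thread; whether it is necessary or
sufficient is exactly what is being debated above:]

    static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
    {
            const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
            unsigned i;

            /*
             * Flush any stale writable aliases of the vmalloc'd LDT
             * pages before asking Xen to treat them as an LDT.
             * update_va_mapping() will happily make one alias RO while
             * another RW alias survives, and the next LDT access then
             * faults (see Andrew's analysis above).
             */
            vm_unmap_aliases();

            for (i = 0; i < entries; i += entries_per_page)
                    set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
    }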