Xen project Mailing List

[Xen-devel] [PATCH] x86/xen: Clear user %gs before updating segment descriptors

To: Andy Lutomirski <luto@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, <x86@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: David Woodhouse <dwmw@xxxxxxxxxxxx>

Date: Fri, 7 Dec 2018 22:15:45 +0000

Delivery-date: Fri, 07 Dec 2018 22:15:58 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

During a context switch, if clearing a descriptor which is currently referenced by the old process's user %gs, if Xen preempts the vCPU before %gs is set for the new process, a fault may occur. This fault actually seems to be fairly harmless; xen_failsafe_callback will just return to the "faulting" instruction (the random one after the vCPU was preempted) with the offending segment register zeroed, and then it'll get set again later during the context switch. But it's cleaner to avoid it. If the descriptor referenced by the %gs selector is being touched, then include a request to zero the user %gs in the multicall too. Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx> --- arch/x86/include/asm/xen/hypercall.h | 11 ++++++++ arch/x86/xen/enlighten_pv.c | 42 +++++++++++++++++++++------- 2 files changed, 43 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index ef05bea7010d..e8b383b24246 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -520,4 +520,15 @@ MULTI_stack_switch(struct multicall_entry *mcl, trace_xen_mc_entry(mcl, 2); } +static inline void +MULTI_set_segment_base(struct multicall_entry *mcl, + int reg, unsigned long value) +{ + mcl->op = __HYPERVISOR_set_segment_base; + mcl->args[0] = reg; + mcl->args[1] = value; + + trace_xen_mc_entry(mcl, 2); +} + #endif /* _ASM_X86_XEN_HYPERCALL_H */ diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 2f6787fc7106..e502d12ffd17 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -506,7 +506,7 @@ static inline bool desc_equal(const struct desc_struct *d1, } static void load_TLS_descriptor(struct thread_struct *t, - unsigned int cpu, unsigned int i) + unsigned int cpu, unsigned int i, int flush_gs) { struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i]; struct desc_struct *gdt; @@ -516,6 +516,17 @@ static void load_TLS_descriptor(struct thread_struct *t, if (desc_equal(shadow, &t->tls_array[i])) return; + /* + * If the current user %gs points to a descriptor we're changing, + * zero it first to avoid taking a fault if Xen preempts this + * vCPU between now and the time that %gs is later loaded with + * the new value. + */ + if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) { + mc = __xen_mc_entry(0); + MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0); + } + *shadow = t->tls_array[i]; gdt = get_cpu_gdt_rw(cpu); @@ -527,6 +538,8 @@ static void load_TLS_descriptor(struct thread_struct *t, static void xen_load_tls(struct thread_struct *t, unsigned int cpu) { + u16 flush_gs = 0; + /* * XXX sleazy hack: If we're being called in a lazy-cpu zone * and lazy gs handling is enabled, it means we're in a @@ -537,27 +550,36 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu) * This will go away as soon as Xen has been modified to not * save/restore %gs for normal hypercalls. * - * On x86_64, this hack is not used for %gs, because gs points - * to KERNEL_GS_BASE (and uses it for PDA references), so we - * must not zero %gs on x86_64 - * * For x86_64, we need to zero %fs, otherwise we may get an * exception between the new %fs descriptor being loaded and * %fs being effectively cleared at __switch_to(). + * + * We may also need to zero %gs, if it refers to a descriptor + * which we are clearing. Otherwise a Xen vCPU context switch + * before it gets reloaded could also cause a fault. Since + * clearing the user %gs is another hypercall, do that only if + * it's necessary. + * + * Note: These "faults" end up in xen_failsafe_callback(), + * which just returns immediately to the "faulting" instruction + * (i.e. the random one after Xen preempted this vCPU) with + * the offending segment register zeroed. Which is actually + * a perfectly safe thing to happen anyway, as it'll be loaded + * again shortly. So maybe we needn't bother? */ if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) { #ifdef CONFIG_X86_32 lazy_load_gs(0); #else + savesegment(gs, flush_gs); + loadsegment(fs, 0); #endif } - xen_mc_batch(); - - load_TLS_descriptor(t, cpu, 0); - load_TLS_descriptor(t, cpu, 1); - load_TLS_descriptor(t, cpu, 2); + load_TLS_descriptor(t, cpu, 0, flush_gs); + load_TLS_descriptor(t, cpu, 1, flush_gs); + load_TLS_descriptor(t, cpu, 2, flush_gs); xen_mc_issue(PARAVIRT_LAZY_CPU); } -- 2.17.1 Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.