
[Xen-devel] [PATCH] x86/xen: Clear user %gs before updating segment descriptors



During a context switch, clearing a descriptor which is still referenced
by the old process's user %gs can cause a fault if Xen preempts the vCPU
before %gs is loaded for the new process.

This fault actually seems to be fairly harmless; xen_failsafe_callback
will just return to the "faulting" instruction (the random one after the
vCPU was preempted) with the offending segment register zeroed, and then
it'll get set again later during the context switch. But it's cleaner
to avoid it.

If the descriptor referenced by the user %gs selector is one of those
being updated, include a request to zero the user %gs in the same
multicall batch.

Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
---
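A note for reviewers, not intended for the commit message: the new check
in load_TLS_descriptor() relies on the x86 selector layout, where bits
15..3 of a selector index the descriptor table and the low three bits
hold the TI and RPL fields. A minimal sketch of that relationship (the
helper name is hypothetical, purely for illustration):

	/* Does this user %gs selector index TLS slot i in the GDT? */
	static inline bool gs_selects_tls_slot(u16 gs_sel, unsigned int i)
	{
		return (gs_sel >> 3) == GDT_ENTRY_TLS_MIN + i;
	}

In other words, the user %gs only needs to be zeroed when its selector
indexes the very TLS slot whose descriptor is about to be rewritten.
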
 arch/x86/include/asm/xen/hypercall.h | 11 ++++++++
 arch/x86/xen/enlighten_pv.c          | 42 +++++++++++++++++++++-------
 2 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index ef05bea7010d..e8b383b24246 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -520,4 +520,15 @@ MULTI_stack_switch(struct multicall_entry *mcl,
        trace_xen_mc_entry(mcl, 2);
 }
 
+static inline void
+MULTI_set_segment_base(struct multicall_entry *mcl,
+                      int reg, unsigned long value)
+{
+       mcl->op = __HYPERVISOR_set_segment_base;
+       mcl->args[0] = reg;
+       mcl->args[1] = value;
+
+       trace_xen_mc_entry(mcl, 2);
+}
+
 #endif /* _ASM_X86_XEN_HYPERCALL_H */
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 2f6787fc7106..e502d12ffd17 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -506,7 +506,7 @@ static inline bool desc_equal(const struct desc_struct *d1,
 }
 
 static void load_TLS_descriptor(struct thread_struct *t,
-                               unsigned int cpu, unsigned int i)
+                               unsigned int cpu, unsigned int i, int flush_gs)
 {
        struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i];
        struct desc_struct *gdt;
@@ -516,6 +516,17 @@ static void load_TLS_descriptor(struct thread_struct *t,
        if (desc_equal(shadow, &t->tls_array[i]))
                return;
 
+       /*
+        * If the current user %gs points to a descriptor we're changing,
+        * zero it first to avoid taking a fault if Xen preempts this
+        * vCPU between now and the time that %gs is later loaded with
+        * the new value.
+        */
+       if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) {
+               mc = __xen_mc_entry(0);
+               MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0);
+       }
+
        *shadow = t->tls_array[i];
 
        gdt = get_cpu_gdt_rw(cpu);
@@ -527,6 +538,8 @@ static void load_TLS_descriptor(struct thread_struct *t,
 
 static void xen_load_tls(struct thread_struct *t, unsigned int cpu)
 {
+       u16 flush_gs = 0;
+
        /*
         * XXX sleazy hack: If we're being called in a lazy-cpu zone
         * and lazy gs handling is enabled, it means we're in a
@@ -537,27 +550,36 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu)
         * This will go away as soon as Xen has been modified to not
         * save/restore %gs for normal hypercalls.
         *
-        * On x86_64, this hack is not used for %gs, because gs points
-        * to KERNEL_GS_BASE (and uses it for PDA references), so we
-        * must not zero %gs on x86_64
-        *
         * For x86_64, we need to zero %fs, otherwise we may get an
         * exception between the new %fs descriptor being loaded and
         * %fs being effectively cleared at __switch_to().
+        *
+        * We may also need to zero %gs, if it refers to a descriptor
+        * which we are clearing. Otherwise a Xen vCPU context switch
+        * before it gets reloaded could also cause a fault. Since
+        * clearing the user %gs is another hypercall, do that only if
+        * it's necessary.
+        *
+        * Note: These "faults" end up in xen_failsafe_callback(),
+        * which just returns immediately to the "faulting" instruction
+        * (i.e. the random one after Xen preempted this vCPU) with
+        * the offending segment register zeroed. Which is actually
+        * a perfectly safe thing to happen anyway, as it'll be loaded
+        * again shortly. So maybe we needn't bother?
         */
        if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) {
 #ifdef CONFIG_X86_32
                lazy_load_gs(0);
 #else
+               savesegment(gs, flush_gs);
+
                loadsegment(fs, 0);
 #endif
        }
 
-       xen_mc_batch();
-
-       load_TLS_descriptor(t, cpu, 0);
-       load_TLS_descriptor(t, cpu, 1);
-       load_TLS_descriptor(t, cpu, 2);
+       load_TLS_descriptor(t, cpu, 0, flush_gs);
+       load_TLS_descriptor(t, cpu, 1, flush_gs);
+       load_TLS_descriptor(t, cpu, 2, flush_gs);
 
        xen_mc_issue(PARAVIRT_LAZY_CPU);
 }
-- 
2.17.1
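
For anyone less familiar with the PV multicall machinery: the new entry
simply rides in the same batch as the existing descriptor update. A rough
sketch of the resulting pattern inside load_TLS_descriptor() (an
illustration of the patch above, not additional code; variable names
match the real function):

	struct multicall_space mc;
	xmaddr_t maddr;

	/* If the live user %gs still selects this TLS slot, queue a clear. */
	if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) {
		mc = __xen_mc_entry(0);
		MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0);
	}

	/* Then queue the GDT descriptor update itself, as before. */
	maddr = arbitrary_virt_to_machine(&get_cpu_gdt_rw(cpu)[GDT_ENTRY_TLS_MIN + i]);
	mc = __xen_mc_entry(0);
	MULTI_update_descriptor(mc.mc, maddr.maddr, t->tls_array[i]);

Both entries then go out in the same batch when xen_load_tls() calls
xen_mc_issue(PARAVIRT_LAZY_CPU), so the %gs clear is applied no later
than the descriptor write.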



