[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled



On Wed, Mar 02, 2016 at 04:33:06PM -0800, Andy Lutomirski wrote:
> On Tue, Mar 1, 2016 at 4:15 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> > Ingo, your feedback appreciated at the end here, regarding quirks.
> >
> > On Tue, Mar 01, 2016 at 09:00:53AM -0500, Boris Ostrovsky wrote:
> >> On 02/29/2016 06:50 PM, Andy Lutomirski wrote:
> >> >diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> >> >index 91ddae732a36..c6ef4da8e4f4 100644
> >> >--- a/arch/x86/kernel/cpu/common.c
> >> >+++ b/arch/x86/kernel/cpu/common.c
> >> >@@ -979,6 +979,31 @@ static void identify_cpu(struct cpuinfo_x86 *c)
> >
> > Note: Andy's change is on identify_cpu() modification here at the end.
> >
> >> >  #ifdef CONFIG_NUMA
> >> >     numa_add_cpu(smp_processor_id());
> >> >  #endif
> >> >+
> >> >+    /*
> >> >+     * ESPFIX is a strange bug.  All real CPUs have it.  Paravirt
> >> >+     * systems that run Linux at CPL > 0 may or may not have the
> >> >+     * issue, but, even if they have the issue, there's absolutely
> >> >+     * nothing we can do about it because we can't use the real IRET
> >> >+     * instruction.
> >> >+     *
> >> >+     * NB: For the time being, only 32-bit kernels support
> >> >+     * X86_BUG_ESPFIX as such.  64-bit kernels directly choose
> >> >+     * whether to apply espfix using paravirt hooks.  If any
> >> >+     * non-paravirt system ever shows up that does *not* have the
> >> >+     * ESPFIX issue, we can change this.
> >> >+     */
> >> >+#ifdef CONFIG_X86_32
> >> >+#ifdef CONFIG_PARAVIRT
> >> >+    do {
> >> >+            extern void native_iret(void);
> >> >+            if (pv_cpu_ops.iret == native_iret)
> >> >+                    set_cpu_bug(c, X86_BUG_ESPFIX);
> >> >+    } while (0);
> >> >+#else
> >> >+    set_cpu_bug(c, X86_BUG_ESPFIX);
> >> >+#endif
> >> >+#endif
> >> >  }
> >> >  /*
> >>
> >> Alternatively, PV guests can clear X86_BUG_ESPFIX in their init
> >> code. E.g in .set_cpu_features op, just like we do for
> >> X86_BUG_SYSRET_SS_ATTRS
> >
> > Andy's proposal works out of identify_cpu() and that covers both boot
> > processor and secondary CPUs. The summary is as follows, tracing back in
> > time from left to right.
> >
> >                 --- identify_boot_cpu() --- check_bugs() --- start_kernel()
> >                /
> > identify_cpu()<
> >                \
> >                 --- identify_secondary_cpu() --- cpu_up() --- smp_init()
> >                     --- kernel_init_freeable() --- kernel_init()
> >                         --- rest_init() --- start_kernel()
> >
> >
> > set_cpu_features() is called from both: init_hypervisor_platform()
> > during setup_arch() and identify_cpu(). Since it'll be called on
> > check_bugs() already on identify_boot_cpu() though I think the
> > call from init_hypervisor_platform() seems redundant ?
> >
> > We assume we just call:
> >
> > setup_arch() --> init_hypervisor_platform() --> 
> > init_hypervisor(&boot_cpu_data)
> >
> > But the above map on identify_cpu() also shows we call:
> >
> > start_kernel --> check_bugs() --> identify_boot_cpu() -->
> >         identify_cpu() --> init_hypervisor() --> set_cpu_features()
> >
> >
> > void init_hypervisor(struct cpuinfo_x86 *c)
> > {
> >         if (x86_hyper && x86_hyper->set_cpu_features)
> >                 x86_hyper->set_cpu_features(c);
> > }
> >
> > Anyway, since we're consolidating quirks, and since it turns out the other
> > quirks are being shifted away from paravirt_enabled() out into another 
> > struct
> > x86_platform_ops CPU specific quirk, I wonder why not just also replace this
> > set_cpu_features() thing as a struct x86_platform_ops quirk CPU callback.
> >
> >> (although this may require adding struct
> >> hypervisor_x86 for lguests. Which I think they should have anyway).
> >
> > lguest already uses x86_platform, and setting up a per CPU quirk would
> > be rather trivial.
> >
> > CPU feature / CPU quirk...
> >
> > I've stashed the other quirks into a x86_early_init_platform_quirks(),
> > this was to have all quirks handled in one place. We handle differences
> > with subarch there. vmware has no subarch though, and it uses its own
> > set_cpu_features(). We have a few options I  can think of:
> >
> >  1) keep this on set_cpu_features() and modify lguest to add a struct 
> > hypervisor_x86
> >     as boris suggests
> >
> >  2) move set_cpu_features() as a platform feature / quirk callback and
> >     call it on identify_cpu()
> >
> >  3) Just identify each quirk on struct x86_platform, with a set of defaults
> >     set. Then identify_cpu() enables a platform  callback to override
> >     defaults, and finally then a shared quirk call is issued to
> >     set the different set_cpu_features() or clear them.
> >
> 
> I think this is severely overcomplicating the issue.
> 
> The issue is that IRET is a pile of shit.  It may be quirky, but it
> affects *everything*.
> 
> On x86_64, the kernel assumes that the "iret" implementation works
> around the quirk.  xen_iret doesn't, and that's Xen's problem.
>
> On x86_32, it's inconvenient for native_iret to directly work around
> the quirk.  Instead, some other asm code in the exit path sets up the
> workaround under the assumption that native_iret is just plain IRET.
> It's the responsibility of other IRET implementations to have their
> own implementations of the workaround and, again, they don't in
> practice and this is Xen and lguest's problem.
> 
> So I think the condition we want really is (iret == native_iret), and
> that's what my patch does.  So I think we should leave it at that.

OK great, I'd much prefer that, in particular as I could not find any
other obvious CPU "quirk" to really fold this as a platform quirk and
I really did not think this was enough to justify having lguest have
get a new hypervisor struct added. That just seemed overkill.

  Luis

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.