Re: [Xen-devel] [PATCH] x86/fpu: CR0.TS should be set before trap into PV guest's #NM exception handler
On Wed, Nov 06, 2013 at 08:51:56AM +0000, Jan Beulich wrote:
> >>> On 06.11.13 at 07:41, Zhu Yanhai <zhu.yanhai@xxxxxxxxx> wrote:
> > As we know, CR0.TS on Intel x86 is a sticky bit, which means that once set
> > it remains set until cleared by some software routine. In other words,
> > the exception handler expects the bit to be set when it starts to execute.
>
> Since when would that be the case? CR0.TS is entirely unaffected
> by exception invocations according to all I know. All that is known
> here is that #NM wouldn't have occurred in the first place if CR0.TS
> was clear.
>
> > However, Xen doesn't emulate this behavior well for PV guests:
> > vcpu_restore_fpu_lazy() clears CR0.TS unconditionally at the very beginning,
> > so the guest kernel's #NM handler runs with CR0.TS cleared. Generally
> > speaking it's fine, since the Linux kernel executes the exception handler
> > with interrupts disabled and a sane #NM handler will clear the bit anyway
> > before it exits. But there's a catch: if it's the first FPU trap for the
> > process, the Linux kernel must allocate a piece of slab memory in which to
> > save the FPU registers, which opens a scheduling window as the memory
> > allocation might sleep -- and CR0.TS stays clear throughout!
> >
> > [see the code below in linux kernel,
>
> You're apparently referring to the pvops kernel.
>
> > void math_state_restore(void)
> > {
> >     struct task_struct *tsk = current;
> >
> >     if (!tsk_used_math(tsk)) {
> >         local_irq_enable();
> >         /*
> >          * does a slab alloc which can sleep
> >          */
> >         if (init_fpu(tsk)) {    <<<< Here it might open a schedule window
> >             /*
> >              * ran out of memory!
> >              */
> >             do_group_exit(SIGKILL);
> >             return;
> >         }
> >         local_irq_disable();
> >     }
> >
> >     __thread_fpu_begin(tsk);    <<<< Here the process gets marked as a 'fpu user' after the schedule window
> >
> >     /*
> >      * Paranoid restore. send a SIGSEGV if we fail to restore the state.
> >      */
> >     if (unlikely(restore_fpu_checking(tsk))) {
> >         drop_init_fpu(tsk);
> >         force_sig(SIGSEGV, tsk);
> >         return;
> >     }
> >
> >     tsk->fpu_counter++;
> > }
> > ]
>
> May I direct your attention to the XenoLinux one:
>
> asmlinkage void math_state_restore(void)
> {
>     struct task_struct *me = current;
>
>     /* NB. 'clts' is done for us by Xen during virtual trap. */
>     __get_cpu_var(xen_x86_cr0) &= ~X86_CR0_TS;
>     if (!used_math())
>         init_fpu(me);
>     restore_fpu_checking(&me->thread.i387.fxsave);
>     task_thread_info(me)->status |= TS_USEDFPU;
> }
>
> Note the comment close to the beginning - the fact that CR0.TS
> is clear at exception handler entry is actually part of the PV ABI,
> i.e. by altering hypervisor behavior here you break all forward
> ported kernels.
>
> Nevertheless I agree that there is an issue, but this needs to be
> fixed on the Linux side (hence adding the Linux maintainers to Cc);
> this issue was introduced way back in 2.6.26 (before that there
> was no allocation on that path). It's not clear though whether
> using GFP_ATOMIC for the allocation would be preferable over
> stts() before calling the allocation function (and clts() if it
> succeeded), or whether perhaps to defer the stts() until we
> actually know the task is being switched out. It's going to be an
> ugly, Xen-specific hack in any event.
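
For reference, the "stts() before calling the allocation function (and
clts() if it succeeded)" option quoted above could be sketched roughly as
below. This is only a user-space model, not kernel code and not a proposed
patch: every sim_* name is hypothetical, CR0.TS is modelled as a plain
variable, and the "sleep" inside the allocation is reduced to recording
the TS value that a context switch in that window would observe.

```c
#include <assert.h>
#include <stdbool.h>

/* All state below is simulated; none of it is a real kernel or Xen symbol. */
static bool sim_cr0_ts;               /* models CR0.TS of the current vCPU */
static bool sim_alloc_fails;          /* forces the simulated allocation to fail */
static bool sim_ts_seen_in_window;    /* TS as seen by a switch during the sleep */

struct sim_task {
    bool used_math;                   /* models tsk_used_math() */
};

static void sim_stts(void) { sim_cr0_ts = true; }
static void sim_clts(void) { sim_cr0_ts = false; }

/* Stand-in for init_fpu(): the real one may sleep in the slab allocator. */
static int sim_init_fpu(struct sim_task *t)
{
    /* Record what a context switch inside the sleep window would see. */
    sim_ts_seen_in_window = sim_cr0_ts;
    if (sim_alloc_fails)
        return -1;
    t->used_math = true;
    return 0;
}

/* Guest #NM handler using the stts()/clts() bracket around the allocation. */
static int sim_math_state_restore(struct sim_task *t)
{
    /* PV ABI as described above: TS is already clear on handler entry. */
    assert(!sim_cr0_ts);

    if (!t->used_math) {
        sim_stts();                   /* re-arm TS across the sleeping window */
        if (sim_init_fpu(t) != 0)
            return -1;                /* allocation failed; TS is left set */
        sim_clts();                   /* allocation succeeded; safe to use FPU */
    }
    /* The actual FPU state restore would happen here. */
    return 0;
}

/* Models Xen delivering the virtual #NM: clts first, then the guest handler. */
static int sim_deliver_nm(struct sim_task *t)
{
    sim_clts();
    return sim_math_state_restore(t);
}
```

In this model, calling sim_deliver_nm() on a task with used_math == false
leaves sim_ts_seen_in_window set, i.e. anything scheduled during the
allocation would still fault on its first FPU use. It says nothing about
whether this is preferable to GFP_ATOMIC or to deferring the stts().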
Was there ever a resolution to this problem? I never saw a comment
from the Linux Xen PV maintainers.
--msw
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel