Re: [Xen-devel] [PATCH] x86/fpu: CR0.TS should be set before trap into PV guest's #NM exception handler
On Wed, Nov 06, 2013 at 08:51:56AM +0000, Jan Beulich wrote:
> >>> On 06.11.13 at 07:41, Zhu Yanhai <zhu.yanhai@xxxxxxxxx> wrote:
> > As we know Intel X86's CR0.TS is a sticky bit, which means once set
> > it remains set until cleared by some software routines, in other words,
> > the exception handler expects the bit is set when it starts to execute.
>
> Since when would that be the case? CR0.TS is entirely unaffected
> by exception invocations according to all I know. All that is known
> here is that #NM wouldn't have occurred in the first place if CR0.TS
> was clear.
>
> > However xen doesn't simulate this behavior quite well for PV guests -
> > vcpu_restore_fpu_lazy() clears CR0.TS unconditionally in the very beginning,
> > so the guest kernel's #NM handler runs with CR0.TS cleared. Generally speaking
> > it's fine since the linux kernel executes the exception handler with
> > interrupt disabled and a sane #NM handler will clear the bit anyway
> > before it exits, but there's a catch: if it's the first FPU trap for the process,
> > the linux kernel must allocate a piece of SLAB memory for it to save
> > the FPU registers, which opens a schedule window as the memory
> > allocation might sleep -- and with CR0.TS keeps clear!
> >
> > [see the code below in linux kernel,
>
> You're apparently referring to the pvops kernel.
>
> > void math_state_restore(void)
> > {
> >     struct task_struct *tsk = current;
> >
> >     if (!tsk_used_math(tsk)) {
> >         local_irq_enable();
> >         /*
> >          * does a slab alloc which can sleep
> >          */
> >         if (init_fpu(tsk)) {      <<<< Here it might open a schedule window
> >             /*
> >              * ran out of memory!
> >              */
> >             do_group_exit(SIGKILL);
> >             return;
> >         }
> >         local_irq_disable();
> >     }
> >
> >     __thread_fpu_begin(tsk);      <<<< Here the process gets marked as a 'fpu user'
> >                                        after the schedule window
> >
> >     /*
> >      * Paranoid restore. send a SIGSEGV if we fail to restore the state.
> >      */
> >     if (unlikely(restore_fpu_checking(tsk))) {
> >         drop_init_fpu(tsk);
> >         force_sig(SIGSEGV, tsk);
> >         return;
> >     }
> >
> >     tsk->fpu_counter++;
> > }
> > ]
>
> May I direct your attention to the XenoLinux one:
>
> asmlinkage void math_state_restore(void)
> {
>     struct task_struct *me = current;
>
>     /* NB. 'clts' is done for us by Xen during virtual trap. */
>     __get_cpu_var(xen_x86_cr0) &= ~X86_CR0_TS;
>     if (!used_math())
>         init_fpu(me);
>     restore_fpu_checking(&me->thread.i387.fxsave);
>     task_thread_info(me)->status |= TS_USEDFPU;
> }
>
> Note the comment close to the beginning - the fact that CR0.TS
> is clear at exception handler entry is actually part of the PV ABI,
> i.e. by altering hypervisor behavior here you break all forward
> ported kernels.
>
> Nevertheless I agree that there is an issue, but this needs to be
> fixed on the Linux side (hence adding the Linux maintainers to Cc);
> this issue was introduced way back in 2.6.26 (before that there
> was no allocation on that path). It's not clear though whether
> using GFP_ATOMIC for the allocation would be preferable over
> stts() before calling the allocation function (and clts() if it
> succeeded), or whether perhaps to defer the stts() until we
> actually know the task is being switched out. It's going to be an
> ugly, Xen-specific hack in any event.

Was there ever a resolution to this problem? I never saw a comment
from the Linux Xen PV maintainers.

--msw

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
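
For anyone reading this thread later: the stts()-before-allocation option
mentioned above can be illustrated with a small user-space model. This is
only a sketch under stated assumptions, not kernel or Xen code. Every sim_*
name is hypothetical, the CR0.TS bit is modelled as a plain variable, a
"sleeping" allocation is modelled by sampling that variable before calling
malloc(), and the 512-byte size is an arbitrary placeholder for the FPU
save area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the virtual CR0.TS bit seen by a PV guest. */
static bool sim_cr0_ts;

/* TS value that a context switch would observe if the allocator slept. */
static bool sim_ts_seen_by_scheduler;

static void sim_stts(void) { sim_cr0_ts = true; }
static void sim_clts(void) { sim_cr0_ts = false; }

/* Hypothetical allocator that may sleep: the "scheduler" samples TS here. */
static void *sim_alloc_fpu_state(size_t size)
{
    sim_ts_seen_by_scheduler = sim_cr0_ts;
    return malloc(size);
}

/*
 * Models the first-use #NM path. On entry TS is already clear, as the
 * PV ABI described in the thread requires. With guard_with_stts set,
 * TS is raised again around the allocation and dropped afterwards.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int sim_first_use_nm_handler(bool guard_with_stts, void **state)
{
    sim_cr0_ts = false;            /* 'clts' done by Xen during the virtual trap */

    if (guard_with_stts)
        sim_stts();                /* close the window before we may sleep */

    *state = sim_alloc_fpu_state(512);   /* size is an arbitrary placeholder */
    if (*state == NULL)
        return -1;

    if (guard_with_stts)
        sim_clts();                /* allocation succeeded, FPU use is safe again */

    assert(!sim_cr0_ts);           /* TS must be clear before restoring FPU state */
    return 0;
}
```

Without the guard, the simulated scheduler sees TS clear during the
allocation, which is the window described in the original report. With the
guard it sees TS set. Whether this is preferable to GFP_ATOMIC or to
deferring the stts() is exactly the open question raised above.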