[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/fpu: CR0.TS should be set before trap into PV guest's #NM exception handler
create ^ title it linux pvops: fpu corruption due to incorrect assumptions about TS bit after exception under Xen thanks On Wed, Nov 6, 2013 at 6:41 AM, Zhu Yanhai <zhu.yanhai@xxxxxxxxx> wrote: > As we know Intel X86's CR0.TS is a sticky bit, which means once set > it remains set until cleared by some software routines, in other words, > the exception handler expects the bit is set when it starts to execute. > > However xen doesn't simulate this behavior quite well for PV guests - > vcpu_restore_fpu_lazy() clears CR0.TS unconditionally in the very beginning, > so the guest kernel's #NM handler runs with CR0.TS cleared. Generally speaking > it's fine since the linux kernel executes the exception handler with > interrupt disabled and a sane #NM handler will clear the bit anyway > before it exits, but there's a catch: if it's the first FPU trap for the > process, > the linux kernel must allocate a piece of SLAB memory for it to save > the FPU registers, which opens a schedule window as the memory > allocation might sleep -- and with CR0.TS keeps clear! > > [see the code below in linux kernel, > > void math_state_restore(void) > { > struct task_struct *tsk = current; > > if (!tsk_used_math(tsk)) { > local_irq_enable(); > /* > * does a slab alloc which can sleep > */ > if (init_fpu(tsk)) { <<<< Here it might open a > schedule window > /* > * ran out of memory! > */ > do_group_exit(SIGKILL); > return; > } > local_irq_disable(); > } > > __thread_fpu_begin(tsk); <<<< Here the process gets marked as a 'fpu > user' > after the schedule window > > /* > * Paranoid restore. send a SIGSEGV if we fail to restore the state. > */ > if (unlikely(restore_fpu_checking(tsk))) { > drop_init_fpu(tsk); > force_sig(SIGSEGV, tsk); > return; > } > > tsk->fpu_counter++; > } > ] > > The check in linux kernel's switch_fpu_prepare() doesn't stts() either because > the current process only gets marked as a FPU user after the schedule window > (the story is a bit different if eagerfpu is enabled, anyway a sane hypervisor > cannot depend on such undetermined things). And then supposing that the new > process scheduled-in wants to touch FPU, nobody will do fxrstor/frstor for it > anymore, > conducing to a serious data damage. > > Also, The point is everything is fine on linux + baremetal since CR0.TS will > keep set until the interrupted #NM handler got the memory it needs and exits, > so the incomer FPU user will get trapped as it's supposed to be. > > The test case is as below, > > buf = malloc(BUF_SIZE); > if (!buf) { > fprintf(stderr, "error %s during %s\n", > strerror(-err), > "malloc"); > return 1; > } > memset(buf, IO_PATTERN, BUF_SIZE); > memset(cmp_buf, IO_PATTERN, BUF_SIZE); > > if (memcmp(buf, cmp_buf, BUF_SIZE)) { > unsigned long long *ubuf = (unsigned long long *)buf; > int i; > > for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++) > printf("%d: 0x%llx\n", i, ubuf[i]); > > return 2; > } > > Two shell scripts on each box's dom0 runs above program repeatedly until > the compare fails (so every time the C program is a new fpu user and triggers > memory allocation). we can see the data damage at least once with > xen 4.3 + linux 2.6.32 on ~200 physical machines within two hours. > With xen 4.3 + linux 3.11.6 stable it becomes harder to reproduce > (guess it's because of the eagerfpu feature introduced in linux kernel 3.7) > but it's still possible to come out within about four hours. > > The fix here is trying to make xen behave as close to the hardware as > possible. > > This bug only has effects on PV guests (and including dom0 kernel of course). > > Cc: Wan Jia <jia.wanj@xxxxxxxxxxxxxxx> > Cc: Shen Yiben <zituan@xxxxxxxxxx> > Cc: Charles Wang <muming.wq@xxxxxxxxxx> > Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > Cc: Ian Campbell <ian.campbell@xxxxxxxxxx> > Signed-off-by: Zhu Yanhai <gaoyang.zyh@xxxxxxxxxx> > --- > xen/arch/x86/traps.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c > index 77c200b..b0321a6 100644 > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -3267,8 +3267,8 @@ void do_device_not_available(struct cpu_user_regs *regs) > > if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS ) > { > + stts(); > do_guest_trap(TRAP_no_device, regs, 0); > - curr->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS; > } > else > TRACE_0D(TRC_PV_MATH_STATE_RESTORE); > -- > 1.7.4.4 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxx > http://lists.xen.org/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |