[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()



On Tue, Feb 20, 2018 at 04:37:51PM +0000, Andrew Cooper wrote:
> On 20/02/18 16:28, Roger Pau Monné wrote:
> > On Tue, Feb 20, 2018 at 11:58:42AM +0000, Andrew Cooper wrote:
> >> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> >>
> >> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path 
> >> should
> >> be unreachable.  However, this appears to be a "feature" of 
> >> TSC_MODE_PVRDTSCP,
> >> and the emulator doesn't perform appropriate feature checking.  
> >> (Conversely,
> >> we unilaterally advertise RDPID which uses the same path, but it should 
> >> never
> >> trap on #GP to arrive here in the first place).
> >>
> >> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> >> probably end up using it.  On a capable pipeline (without TSD, see below), 
> >> it
> >> will execute normally and return non-virtualised data.
> >>
> >> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> >> clear, so executing RDTSCP will execute without trapping.  However, a guest
> >> kernel may set TSD itself, at which point the emulator should not suddenly
> >> switch to virtualised TSC mode and start handing out differently-scaled
> >> values.
> >>
> >> Drop all the deferral logic, and return scaled or raw TSC values depending
> >> only on currd->arch.vtsc.  This changes the exact moment at which the
> >> timestamp is taken, but that doesn't matter from the guests point of view, 
> >> and
> >> is consistent with the HVM side of things.  It also means that RDTSC and
> >> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
> >>
> >> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> >> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it 
> >> out
> >> of hardware.
> >>
> >> This is a behavioural change for guests, but the semantics are rather more
> >> sane.  It lays groundwork for further fixes.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >> ---
> >> CC: Jan Beulich <JBeulich@xxxxxxxx>
> >> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
> >> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> >> ---
> >>  xen/arch/x86/pv/emul-priv-op.c | 35 +++++------------------------------
> >>  1 file changed, 5 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/pv/emul-priv-op.c 
> >> b/xen/arch/x86/pv/emul-priv-op.c
> >> index d4d64f2..4e3641d 100644
> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -60,9 +60,6 @@ struct priv_op_ctxt {
> >>      } cs;
> >>      char *io_emul_stub;
> >>      unsigned int bpmatch;
> >> -    unsigned int tsc;
> >> -#define TSC_BASE 1
> >> -#define TSC_AUX 2
> >>  };
> >>  
> >>  /* I/O emulation support. Helper routines for, and type of, the stack 
> >> stub. */
> >> @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct 
> >> domain *d)
> >>  static int read_msr(unsigned int reg, uint64_t *val,
> >>                      struct x86_emulate_ctxt *ctxt)
> >>  {
> >> -    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, 
> >> ctxt);
> >> -    const struct vcpu *curr = current;
> >> +    struct vcpu *curr = current;
> > I think you can keep the const here?
> 
> pv_soft_rdtsc() mutates curr.  This change is most certainly not spurious.

Oh right, you constified regs but not the vcpu parameter. AFAICT the
vcpu parameter of pv_soft_rdtsc could also be constified? It's only
used to get the domain and whether the guests is in kernel or user
mode.

> >>      const struct domain *currd = curr->domain;
> >>      bool vpmu_msr = false;
> >>      int ret;
> >> @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val,
> >>          *val = curr->arch.pv_vcpu.gs_base_user;
> >>          return X86EMUL_OKAY;
> >>  
> >> -    /*
> >> -     * In order to fully retain original behavior, defer calling
> >> -     * pv_soft_rdtsc() until after emulation. This may want/need to be
> >> -     * reconsidered.
> >> -     */
> >>      case MSR_IA32_TSC:
> >> -        poc->tsc |= TSC_BASE;
> >> -        goto normal;
> >> +        *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : 
> >> rdtsc();
> >> +        return X86EMUL_OKAY;
> >>  
> >>      case MSR_TSC_AUX:
> >> -        poc->tsc |= TSC_AUX;
> >> -        if ( cpu_has_rdtscp )
> >> -            goto normal;
> >> -        *val = 0;
> >> +        *val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
> >> +                          ? currd->arch.incarnation : 0);
> > I wonder whether Xen should inject a #GP here if tsc_mode is not
> > PVRDTSCP and RDPID is not available.
> 
> RDPID would be a layering violation.  Also, a lot of this changes again
> in patch 5.

I see, patch 5 makes this all much better. And this is not any worse
that the previous behavior:

Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.