|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
On Fri, Mar 12, 2021 at 08:54:46AM +0100, Jan Beulich wrote:
> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> handler early enough to cover for example the rdmsrl_safe() of
> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> of MSR_K7_HWCR later in the same function). The respective change
> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> backported to 4.14, but no further - presumably since it wasn't really
> easy because of other dependencies.
>
> Therefore, to prevent our change in the handling of guest MSR accesses
> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> the raising of #GP on this paths conditional upon the guest having
> installed a handler, provided of course the MSR can be read in the first
> place (we would have raised #GP in that case even before). Producing
> zero for reads isn't necessarily correct and may trip code trying to
> detect presence of MSRs early, but since such detection logic won't work
> without a #GP handler anyway, this ought to be a fair workaround.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> Release-Acked-by: Ian Jackson <iwj@xxxxxxxxxxxxxx>
I think the approach is fine:
Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
Some comments below.
> ---
> (projected v4: re-base over Roger's change)
> v3: Use temporary variable for probing. Document the behavior (in a
> public header, for the lack of a better place).
> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
> messages (in debug builds). Don't alter WRMSR behavior.
> ---
> While I didn't myself observe or find similar WRMSR side issues, I'm
> nevertheless not convinced we can get away without also making the WRMSR
> path somewhat more permissive again, e.g. tolerating attempts to set
> bits which are already set. But of course this would require keeping in
> sync for which MSRs we "fake" reads, as then a kernel attempt to set a
> bit may also appear as an attempt to clear others (because of the zero
> value that we gave it for the read). Roger validly points out that
> making behavior dependent upon MSR values has its own downsides, so
> simply depending on MSR readability is another option (with, in turn,
> its own undesirable effects, e.g. for write-only MSRs).
>
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -874,7 +874,8 @@ static int read_msr(unsigned int reg, ui
> struct vcpu *curr = current;
> const struct domain *currd = curr->domain;
> const struct cpuid_policy *cp = currd->arch.cpuid;
> - bool vpmu_msr = false;
> + bool vpmu_msr = false, warn = false;
> + uint64_t tmp;
> int ret;
>
> if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> @@ -882,7 +883,7 @@ static int read_msr(unsigned int reg, ui
> if ( ret == X86EMUL_EXCEPTION )
> x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
You might want to move the injection of the exception to the done
label?
So that we can avoid the call to x86_emul_reset_event.
>
> - return ret;
> + goto done;
> }
>
> switch ( reg )
> @@ -986,7 +987,7 @@ static int read_msr(unsigned int reg, ui
> }
> /* fall through */
> default:
> - gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> + warn = true;
> break;
>
> normal:
> @@ -995,7 +996,19 @@ static int read_msr(unsigned int reg, ui
> return X86EMUL_OKAY;
> }
>
> - return X86EMUL_UNHANDLEABLE;
> + done:
> + if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address
> &&
> + (reg >> 16) != 0x4000 && !rdmsr_safe(reg, tmp) )
> + {
> + gprintk(XENLOG_WARNING, "faking RDMSR 0x%08x\n", reg);
> + *val = 0;
> + x86_emul_reset_event(ctxt);
> + ret = X86EMUL_OKAY;
> + }
> + else if ( warn )
> + gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
I think you could add:
if ( rc == X86EMUL_EXCEPTION )
x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> +
> + return ret;
> }
>
> static int write_msr(unsigned int reg, uint64_t val,
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -143,6 +143,12 @@ typedef unsigned long xen_ulong_t;
> * Level == 1: Kernel may enter
> * Level == 2: Kernel may enter
> * Level == 3: Everyone may enter
> + *
> + * Note: For compatibility with kernels not setting up exception handlers
> + * early enough, Xen will avoid trying to inject #GP (and hence crash
> + * the domain) when an RDMSR would require this, but no handler was
> + * set yet. The precise conditions are implementation specific, and
You can drop the 'yet' here I think? As even if a handler has been set
and then removed we would still prevent injecting a #GP AFAICT. Not a
native speaker anyway, so I might be wrong on that one.
> + * new code shouldn't rely on such behavior anyway.
I would use a stronger mustn't here instead of shouldn't.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |