|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
On 08.03.2021 09:56, Roger Pau Monné wrote:
> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
>> handler early enough to cover for example the rdmsrl_safe() of
>> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
>> of MSR_K7_HWCR later in the same function). The respective change
>> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
>> backported to 4.14, but no further - presumably since it wasn't really
>> easy because of other dependencies.
>>
>> Therefore, to prevent our change in the handling of guest MSR accesses
>> to render PV Linux 4.13 and older unusable on at least AMD systems, make
>> the raising of #GP on this paths conditional upon the guest having
>> installed a handler, provided of course the MSR can be read in the first
>> place (we would have raised #GP in that case even before). Producing
>> zero for reads isn't necessarily correct and may trip code trying to
>> detect presence of MSRs early, but since such detection logic won't work
>> without a #GP handler anyway, this ought to be a fair workaround.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>> messages (in debug builds). Don't alter WRMSR behavior.
>> ---
>> I'm not convinced we can get away without also making the WRMSR path
>> somewhat more permissive again, e.g. tolerating attempts to set bits
>> which are already set. But of course this would require keeping in sync
>> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
>> may also appear as an attempt to clear others (because of the zero value
>> that we gave it for the read).
>
> The above approach seems dangerous, as it could allow a guest to
> figure out the value of the underlying MSR by probing whether values
> trigger a #GP?
Perhaps, yes. But what do you do? There's potentially a huge value
range to probe ...
> I think we want to do something similar to what we do on HVM in 4.14
> and previous versions: ignore writes as long as the rdmsr to the
> target MSR succeeds, regardless of the value.
Which, as said elsewhere, has its own downsides - writable MSRs don't
need to also be readable. See e.g. AMD's proposed PARTIAL_{FS,GS}_LOAD
MSRs.
>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>> struct vcpu *curr = current;
>> const struct domain *currd = curr->domain;
>> const struct cpuid_policy *cp = currd->arch.cpuid;
>> - bool vpmu_msr = false;
>> + bool vpmu_msr = false, warn = false;
>> int ret;
>>
>> if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>> if ( ret == X86EMUL_EXCEPTION )
>> x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>
>> - return ret;
>> + goto done;
>> }
>>
>> switch ( reg )
>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>> }
>> /* fall through */
>> default:
>> - gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>> + warn = true;
>> break;
>>
>> normal:
>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>> return X86EMUL_OKAY;
>> }
>>
>> - return X86EMUL_UNHANDLEABLE;
>> + done:
>
> Won't this handling be better placed in the 'default' switch case
> above?
No - see the "goto done" added near the top of the function.
>> + if ( ret != X86EMUL_OKAY &&
>> !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
>> + (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
>
> We didn't used to care about explicitly blocking the reserved MSR
> range, do we really wan to do it now?
>
> I'm not sure I see an issue with that, but given that we are trying to
> bring back something similar to the previous behavior, I would avoid
> adding new conditions.
What I'm particularly trying to avoid here is to allow
information from an underlying hypervisor to "shine through",
even if it's only information as to whether a certain MSR is
readable. It should be solely our own selection which MSRs in
this range a guest is able to (appear to) read.
Plus of course by avoiding the rdmsr_safe() in this case we
also avoid the unnecessary (debug only) log message associated
with such attempted reads.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |