
Re: [Xen-devel] [PATCH v4 14/26] x86/cpu: Rework AMD masking MSR setup



On 28/03/16 19:55, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 23, 2016 at 04:36:17PM +0000, Andrew Cooper wrote:
>> This patch is best reviewed as its end result rather than as a diff, as it
>> rewrites almost all of the setup.
>>
>> On the BSP, cpuid information is used to evaluate the potential available set
>> of masking MSRs, and they are unconditionally probed, filling in the
>> availability information and hardware defaults.
>>
>> The command line parameters are then combined with the hardware defaults to
>> further restrict the Xen default masking level.  Each cpu is then context
>> switched into the default levelling state.
> Context switched? Why not just say:
>
> When booting up CPUs we set the same MSR mask for each CPU.
>
> The amd_ctxt_switch_levelling can also be used (in patch XYZ) to swap
> levelling at per-guest granularity.
>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Reviewed-by: Jan Beulich <JBeulich@xxxxxxxx>
>> ---
>> v2:
>>  * Provide extra information if opt_cpu_info
>>  * Extra comment indicating the expected use of amd_ctxt_switch_levelling()
>> v3:
>>  * Fix the interaction of the fast-forward bits with the override MSRs.
>>  * Style fixups.
>> ---
>>  xen/arch/x86/cpu/amd.c | 276 ++++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 179 insertions(+), 97 deletions(-)
>>
>> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
>> index 5516777..0e1c8b9 100644
>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -80,6 +80,13 @@ static inline int wrmsr_amd_safe(unsigned int msr, unsigned int lo,
>>      return err;
>>  }
>>  
>> +static void wrmsr_amd(unsigned int msr, uint64_t val)
>> +{
>> +    asm volatile("wrmsr" ::
>> +                 "c" (msr), "a" ((uint32_t)val),
>> +                 "d" (val >> 32), "D" (0x9c5a203a));
>> +}
>> +
>>  static const struct cpuidmask {
>>      uint16_t fam;
>>      char rev[2];
>> @@ -126,126 +133,198 @@ static const struct cpuidmask *__init noinline get_cpuidmask(const char *opt)
>>  }
>>  
>>  /*
>> + * Sets caps in expected_levelling_cap, probes for the specified mask MSR, and
>> + * sets caps in levelling_caps if it is found.  Processors prior to Fam 10h
>> + * required a 32-bit password for masking MSRs.  Returns the default value.
>> + */
>> +static uint64_t __init _probe_mask_msr(unsigned int msr, uint64_t caps)
>> +{
>> +    unsigned int hi, lo;
>> +
>> +    expected_levelling_cap |= caps;
>> +
>> +    if ((rdmsr_amd_safe(msr, &lo, &hi) == 0) &&
>> +        (wrmsr_amd_safe(msr, lo, hi) == 0))
>> +            levelling_caps |= caps;
>> +
>> +    return ((uint64_t)hi << 32) | lo;
>> +}
>> +
>> +/*
>> + * Probe for the existence of the expected masking MSRs.  They might easily
>> + * not be available if Xen is running virtualised.
>> + */
>> +static void __init noinline probe_masking_msrs(void)
> Why noinline?

So this large quantity of __init code doesn't get inlined into its sole
caller, which is not __init.
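
A minimal sketch of the pattern (illustrative only; the caller's name and
body here are hypothetical, not taken from the patch):

    /*
     * Large one-off probe, emitted into .init.text and freed after boot.
     * "noinline" stops the compiler folding the body into a non-__init
     * caller, which would keep the probe code resident forever.
     */
    static void __init noinline probe_masking_msrs(void)
    {
        /* ... boot-time MSR probing ... */
    }

    static void init_levelling(void) /* not __init */
    {
        probe_masking_msrs(); /* the call survives boot; the body does not */
    }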

>
>> +{
>> +    const struct cpuinfo_x86 *c = &boot_cpu_data;
>> +
>> +    /*
>> +     * First, work out which masking MSRs we should have, based on
>> +     * revision and cpuid.
>> +     */
>> +
>> +    /* Fam11 doesn't support masking at all. */
>> +    if (c->x86 == 0x11)
>> +            return;
>> +
>> +    cpuidmask_defaults._1cd =
>> +            _probe_mask_msr(MSR_K8_FEATURE_MASK, LCAP_1cd);
>> +    cpuidmask_defaults.e1cd =
>> +            _probe_mask_msr(MSR_K8_EXT_FEATURE_MASK, LCAP_e1cd);
>> +
>> +    if (c->cpuid_level >= 7)
>> +            cpuidmask_defaults._7ab0 =
>> +                    _probe_mask_msr(MSR_AMD_L7S0_FEATURE_MASK, LCAP_7ab0);
>> +
>> +    if (c->x86 == 0x15 && c->cpuid_level >= 6 && cpuid_ecx(6))
>> +            cpuidmask_defaults._6c =
>> +                    _probe_mask_msr(MSR_AMD_THRM_FEATURE_MASK, LCAP_6c);
>> +
>> +    /*
>> +     * Don't bother warning about a mismatch if virtualised.  These MSRs
>> +     * are not architectural and almost never virtualised.
>> +     */
>> +    if ((expected_levelling_cap == levelling_caps) ||
>> +        cpu_has_hypervisor)
>> +            return;
>> +
>> +    printk(XENLOG_WARNING "Mismatch between expected (%#x) "
>> +           "and real (%#x) levelling caps: missing %#x\n",
>> +           expected_levelling_cap, levelling_caps,
>> +           (expected_levelling_cap ^ levelling_caps) & levelling_caps);
>> +    printk(XENLOG_WARNING "Fam %#x, model %#x level %#x\n",
>> +           c->x86, c->x86_model, c->cpuid_level);
>> +    printk(XENLOG_WARNING
>> +           "If not running virtualised, please report a bug\n");
> You already have a cpu_has_hypervisor check above? Or is that for
> hypervisors which do not set that bit?

Correct.

>
>> +}
>> +
>> +/*
>> + * Context switch levelling state to the next domain.  A parameter of NULL is
>> + * used to context switch to the default host state, and is used by the BSP/AP
>> + * startup code.
> OK, how about:
>> + */
>> +static void amd_ctxt_switch_levelling(const struct domain *nextd)
>> +{
>> +    struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
>> +    const struct cpuidmasks *masks = &cpuidmask_defaults;
>> +
>       ASSERT(!d && system_state != SYS_STATE_active); ?

Because context switching back to NULL also happens in other situations, such
as on the crash path.
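
For example (the call sites here are a hedged sketch, not quoted from the
series):

    /* BSP/AP bring-up or the crash path: revert to Xen's default masks. */
    amd_ctxt_switch_levelling(NULL);

    /* Scheduling into a domain: apply that domain's levelling policy. */
    amd_ctxt_switch_levelling(nextd);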

>> +                    ecx &= opt_cpuid_mask_ecx;
>> +                    edx &= opt_cpuid_mask_edx;
>> +            } else if (m) {
>> +                    ecx &= m->ecx;
>> +                    edx &= m->edx;
>>              }
>> -            feat_ecx = m->ecx;
>> -            feat_edx = m->edx;
>> -            extfeat_ecx = m->ext_ecx;
>> -            extfeat_edx = m->ext_edx;
>> +
>> +            /* Fast-forward bits - Must be set. */
>> +            if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
>> +                    ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
>> +            edx |= cpufeat_mask(X86_FEATURE_APIC);
>> +
>> +            /* Allow the HYPERVISOR bit to be set via guest policy. */
>> +            ecx |= cpufeat_mask(X86_FEATURE_HYPERVISOR);
> Hmm. The http://support.amd.com/TechDocs/52740_16h_Models_30h-3Fh_BKDG.pdf
> pg 624 mentions this (bit 63) as 'Reserved'. Should we really set it?
> Ah, but then earlier (pg 530) it says 'Reserved for use by hypervisor to
> indicate guest status'.
>
>
>> +
>> +            cpuidmask_defaults._1cd = ((uint64_t)ecx << 32) | edx;
> Considering the document mentions Reserved, should we preserve the bits that
> are set by the initial call that fills out cpuidmask_defaults?

We already do.  Observe that the defaults will be reflected by the
cpuid() call.

>
> The original code also had:
>>      }
>>  
>> -        /* Setting bits in the CPUID mask MSR that are not set in the
>> -         * unmasked CPUID response can cause those bits to be set in the
>> -         * masked response.  Avoid that by explicitly masking in software. */
>  that comment in it. Would it make sense to include it (or a rework of it,
> since I wasn't exactly sure what it was saying)?

These MSRs are overrides, which in practice allow you to advertise
features which are not actually supported by hardware.  This is bad, as
attempting to use such a feature will still cause a #UD fault.
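
The rewritten code avoids that pitfall by construction; a condensed sketch of
the approach in the hunk above (not a verbatim excerpt):

    uint32_t eax, ebx, ecx, edx;

    cpuid(0x00000001, &eax, &ebx, &ecx, &edx); /* unmasked hardware values */
    ecx &= opt_cpuid_mask_ecx;                 /* masking can only clear bits */
    edx &= opt_cpuid_mask_edx;
    /*
     * Nothing the hardware doesn't support can end up advertised, bar the
     * deliberately OR'd-in fast-forward and HYPERVISOR bits.
     */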

>
>> +                    ecx &= opt_cpuid_mask_ext_ecx;
>> +                    edx &= opt_cpuid_mask_ext_edx;
>> +            } else if (m) {
>> +                    ecx &= m->ext_ecx;
>> +                    edx &= m->ext_edx;
>> +            }
>> +
>> +            /* Fast-forward bits - Must be set. */
>> +            edx |= cpufeat_mask(X86_FEATURE_APIC);
>> +
>> +            cpuidmask_defaults.e1cd = ((uint64_t)ecx << 32) | edx;
> Should this be &= ?

No, because that would prevent correct handling of the fast-forward bits.
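
To illustrate the concern (a hedged hypothetical, assuming the probed
boot-time default happened to lack one of the fast-forward bits):

    /* Hypothetical &= variant: */
    cpuidmask_defaults.e1cd &= ((uint64_t)ecx << 32) | edx;
    /*
     * If the default value lacked the APIC bit, the &= would strip the bit
     * which the "|=" above deliberately forced on; plain assignment keeps it.
     */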

>> +                    eax &= opt_cpuid_mask_l7s0_eax;
>> +                    ebx &= opt_cpuid_mask_l7s0_ebx;
>> +            }
>> +
>> +            cpuidmask_defaults._7ab0 &= ((uint64_t)eax << 32) | ebx;
>>      }
>>  
>> -    if (!skip_l7s0_eax_ebx &&
>> -        wrmsr_amd_safe(MSR_AMD_L7S0_FEATURE_MASK, l7s0_ebx, l7s0_eax)) {
>> -            skip_l7s0_eax_ebx = 1;
>> -            printk("Failed to set CPUID leaf 7 subleaf 0 feature mask\n");
>> +    if ((levelling_caps & LCAP_6c) == LCAP_6c) {
>> +            uint32_t ecx = cpuid_ecx(6);
>> +
>> +            if (~opt_cpuid_mask_thermal_ecx)
>> +                    ecx &= opt_cpuid_mask_thermal_ecx;
>> +
>> +            cpuidmask_defaults._6c &= (~0ULL << 32) | ecx;
>
> Is there any documentation about this? The BKDG from 03/2016 does not mention
> this MSR (C001_1003). Ah, but it is mentioned in the docs for Family 15h.
> How nice.

The documentation in this regard is remarkably poor.

>
>>      }
>>  
>> -    if (!skip_thermal_ecx &&
>> -        (rdmsr_amd_safe(MSR_AMD_THRM_FEATURE_MASK, &eax, &edx) ||
>> -         wrmsr_amd_safe(MSR_AMD_THRM_FEATURE_MASK, thermal_ecx, edx))){
>> -            skip_thermal_ecx = 1;
>> -            printk("Failed to set CPUID thermal/power feature mask\n");
>> +    if (opt_cpu_info) {
>> +            printk(XENLOG_INFO "Levelling caps: %#x\n", levelling_caps);
>> +            printk(XENLOG_INFO
>> +                   "MSR defaults: 1d 0x%08x, 1c 0x%08x, e1d 0x%08x, "
>> +                   "e1c 0x%08x, 7a0 0x%08x, 7b0 0x%08x, 6c 0x%08x\n",
>> +                   (uint32_t)cpuidmask_defaults._1cd,
>> +                   (uint32_t)(cpuidmask_defaults._1cd >> 32),
>> +                   (uint32_t)cpuidmask_defaults.e1cd,
>> +                   (uint32_t)(cpuidmask_defaults.e1cd >> 32),
>> +                   (uint32_t)(cpuidmask_defaults._7ab0 >> 32),
>> +                   (uint32_t)cpuidmask_defaults._7ab0,
>> +                   (uint32_t)cpuidmask_defaults._6c);
> Why don't you bit shift cpuidmask_defaults._6c too?

Because only the bottom 32 bits are relevant.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

