
Re: [Xen-devel] [PATCH] VPMU issue on Nehalem cpus



>>> On 19.11.10 at 11:29, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> +/*
> + * QUIRK to workaround an issue on Nehalem processors currently seen
> + * on family 6 cpus E5520 (model 26) and X7542 (model 46).
> + * The issue leads to endless NMI loops on the processor.
> + * If a counter triggers an NMI and, while the NMI handler is running,
> + * another counter overflows, the second counter triggers endless new NMIs.
> + * A solution is to read all flagged counters and, if a counter's value
> + * is 0, to write 1 into it.
> + */

Two things I don't understand here: One is that I can't see how
control would get from the NMI handler to
core2_vpmu_do_interrupt() - afaics, this gets called only in the
context of the (vectored) smp_pmu_apic_interrupt(). The other is
that if nested interrupts occur, how would writing ones into
zero-valued counters prevent this? At best I could see it
shrinking the window within which unintended nested interrupts
can occur. Or is it that the secondary interrupts only occur
after the first one has returned? Is this (mis-)behavior
documented somewhere?

> +static int is_nmi_quirk;

bool_t __read_mostly?
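
FWIW, a minimal sketch of what that could look like (names taken
from the patch; just illustrating the suggestion, not a final
version):

    static bool_t __read_mostly is_nmi_quirk;

    static void check_nmi_quirk(void)
    {
        /* Nehalem: family 6, models 26 (E5520) and 46 (X7542). */
        is_nmi_quirk = (current_cpu_data.x86 == 6) &&
                       (current_cpu_data.x86_model == 26 ||
                        current_cpu_data.x86_model == 46);
    }
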

> +
> +static void check_nmi_quirk(void)
> +{
> +    u8 family = current_cpu_data.x86;
> +    u8 cpu_model = current_cpu_data.x86_model;
> +    is_nmi_quirk = 0;
> +    if ( family == 6 )
> +    {
> +        if ( cpu_model == 46 || cpu_model == 26 )
> +            is_nmi_quirk = 1;
> +    }
> +}
> +
> +static int core2_get_pmc_count(void);
> +static void handle_nmi_quirk(u64 msr_content)
> +{
> +    int num_gen_pmc = core2_get_pmc_count();
> +    int num_fix_pmc  = 3;
> +    int i;
> +    u64 val;
> +
> +    if ( !is_nmi_quirk )
> +        return;
> +
> +    val = msr_content & ((1 << num_gen_pmc) - 1);

What's the point of masking if the subsequent loop looks at the
bottom so many bits only anyway?
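
I.e. the loop could just test the bit directly (only a sketch, to
illustrate the point):

    for ( i = 0; i < num_gen_pmc; i++ )
    {
        u64 cnt;

        /* The loop bound already limits us to the low
         * num_gen_pmc bits, so no prior masking is needed. */
        if ( !(msr_content & (1ULL << i)) )
            continue;
        rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
        if ( cnt == 0 )
            wrmsrl(MSR_P6_PERFCTR0 + i, 1);
    }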

> +    for ( i = 0; i < num_gen_pmc; i++ )
> +    {
> +        if ( val & 0x1 )
> +        {
> +            u64 cnt;
> +            rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
> +            if ( cnt == 0 )
> +                wrmsrl(MSR_P6_PERFCTR0 + i, 1);
> +        }
> +        val >>= 1;
> +    }
> +    val = (msr_content >> 32) & ((1 << num_fix_pmc) - 1);

Same here.

> +    for ( i = 0; i < num_fix_pmc; i++ )
> +    {
> +        if ( val & 0x1 )
> +        {
> +            u64 cnt;
> +            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
> +            if ( cnt == 0 )
> +                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
> +        }
> +        val >>= 1;
> +    }
> +}
> +
> +#define CHECK_HANDLE_NMI_QUIRK(msr_content) \
> +            if ( is_nmi_quirk )            \
> +                handle_nmi_quirk(msr_content);
> +

Why do you need a macro here if you use it only once?
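
I.e. the call site could simply read (sketch; since
handle_nmi_quirk() already returns early when !is_nmi_quirk, even
the explicit guard may be unnecessary):

    if ( is_nmi_quirk )
        handle_nmi_quirk(msr_content);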

>  u32 core2_counters_msr[] =   {
>      MSR_CORE_PERF_FIXED_CTR0,
>      MSR_CORE_PERF_FIXED_CTR1,
> @@ -494,6 +558,9 @@
>      rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
>      if ( !msr_content )
>          return 0;
> +
> +    CHECK_HANDLE_NMI_QUIRK(msr_content)
> +
>      core2_vpmu_cxt->global_ovf_status |= msr_content;
>      msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
>      wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);

Jan

