
Re: [Xen-devel] [PATCH] perf: Check all MSRs before passing hw check

On 18/03/13 10:53, Ingo Molnar wrote:
* George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:

On 18/03/13 08:42, Ingo Molnar wrote:
* George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:

check_hw_exists has a number of checks which go to two exit paths:
msr_fail and bios_fail.  Checks classified as msr_fail will cause
check_hw_exists() to return false, causing the PMU not to be used;
bios_fail checks will only cause a warning to be printed, but will
return true.

The problem is that if there are both msr failures and bios failures,
and the routine hits a bios_fail check first, it will exit early and
return true, not finishing the rest of the msr checks.  If those msrs
are in fact broken, it will cause them to be used erroneously.

This changeset makes check_hw_exists() go through all of the msr
checks, failing and returning false if any of them fail.
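
The control-flow difference described above can be modelled outside the
kernel. What follows is only a minimal sketch, not the code from
perf_event.c: struct probe, check_old() and check_new() are hypothetical
names made up for illustration, and each probe simply records what a
check would have found.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record of what one check found (not a kernel type). */
struct probe {
    bool bios_enabled; /* counter was already enabled, e.g. by the BIOS */
    bool msr_broken;   /* MSR access failed or the MSR is unusable */
};

/* Old flow: the first BIOS problem exits early and reports success. */
static bool check_old(const struct probe *p, int n)
{
    for (int i = 0; i < n; i++) {
        if (p[i].msr_broken)
            return false; /* msr_fail path */
        if (p[i].bios_enabled) {
            fprintf(stderr, "warning: counter %d already enabled\n", i);
            return true;  /* bios_fail path: later checks never run */
        }
    }
    return true;
}

/* New flow: note the BIOS problem, but still run every MSR check. */
static bool check_new(const struct probe *p, int n)
{
    int bios_fail = -1;

    for (int i = 0; i < n; i++) {
        if (p[i].msr_broken)
            return false; /* any MSR failure disables the PMU */
        if (p[i].bios_enabled && bios_fail < 0)
            bios_fail = i; /* remember it, keep checking */
    }
    if (bios_fail >= 0)
        fprintf(stderr, "warning: counter %d already enabled\n", bios_fail);
    return true;
}
```

With a BIOS problem in the first probe and an MSR problem in the second,
check_old() returns true while check_new() returns false, which is the
behaviour change this changeset is after.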

This problem affects kernels as far back as 3.2, and should thus be
considered for backport.

Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
CC: Konrad Wilk <konrad.wilk@xxxxxxxxxx>
CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
CC: x86@xxxxxxxxxx
  arch/x86/kernel/cpu/perf_event.c |   20 ++++++++++----------
  1 file changed, 10 insertions(+), 10 deletions(-)
What is missing is a description of what specific platform this gets
triggered on and exactly why. Is some hw feature emulation missing that
causes the check to fail?
Remember, there are two checks failing: the second one is supposed
to fail and disable the PMU entirely, but it's not getting there
because when the first one fails, it skips the rest but returns
"success" anyway.

The warning on the first check is as follows:

[    8.131985] Performance Events: Broken BIOS detected, complain to your hardware vendor.
[    8.139997] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 530076)
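
For reference, the value in that warning can be decoded by hand. This is
a rough sketch under two assumptions: that the logged value is
hexadecimal, and that the event-select register uses the usual x86
layout (event in bits 0-7, unit mask in bits 8-15, counter enable in
bit 22). The macro and function names are made up for illustration and
are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed event-select field layout; names are illustrative only. */
#define EVNTSEL_EVENT_MASK  0xffULL
#define EVNTSEL_UMASK_SHIFT 8
#define EVNTSEL_UMASK_MASK  0xffULL
#define EVNTSEL_ENABLE      (1ULL << 22)

/* True if the counter controlled by this register is enabled. */
static bool evntsel_enabled(uint64_t val)
{
    return (val & EVNTSEL_ENABLE) != 0;
}

/* Low eight bits of the event select field. */
static unsigned int evntsel_event(uint64_t val)
{
    return (unsigned int)(val & EVNTSEL_EVENT_MASK);
}

/* Unit mask field. */
static unsigned int evntsel_umask(uint64_t val)
{
    return (unsigned int)((val >> EVNTSEL_UMASK_SHIFT) & EVNTSEL_UMASK_MASK);
}
```

Under those assumptions, 0x530076 has bit 22 set, event 0x76 and unit
mask 0x00, i.e. a counter that was already enabled when the check ran.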

c0010000 is the AMD MSR_K7_EVNTSEL0; the check it's failing is:

So it discovers that one of the performance counters is already
enabled -- worth a warning, but by itself not worth disabling the
PMU.  This is most likely to be exactly what the warning message
says: a buggy BIOS that leaves perfcounters enabled for some

The second check is supposed to detect that the PMU is actually not
usable -- in my case because it's running virtualized (under Xen).
I got the logic from your original description - what I wanted was for the
specific messages to be included in the patch changelog, plus a
description of what misbehaved before the patch and what behaves better
after the patch - on your specific system.

In other words, please use the customary changelog style we use in the
kernel:

   " Current code does (A), this has a problem when (B).
     We can improve this doing (C), because (D)."

Right, got it. Stand by for v2.

