Xen project Mailing List

Re: [Xen-devel] Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>

From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>

Date: Thu, 14 Mar 2013 15:44:37 +0000

Cc: "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>

Delivery-date: Thu, 14 Mar 2013 15:45:06 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, Mar 14, 2013 at 10:07 AM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
> On Fri, Aug 17, 2012 at 3:00 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
>> So that translates to MSR_K7_EVNTSEL0.
>>
>> And that should only been shown once. Is the perf trying to load
>> the module over and over?
>
> So I've just tested this again with the latest wheezy kernel (3.2.0-4)
> but this time taking a closer look, I see this near the first
> instance:
>
> [ 0.072397] Performance Events: Broken BIOS detected, complain to your hardware vendor.
> [ 0.076015] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 530076)
> [ 0.080007] AMD PMU driver.
> [ 0.082861] ------------[ cut here ]------------
> [ 0.084019] WARNING: at /build/buildd-linux_3.2.39-2-i386-4VFKqr/linux-3.2.39/arch/x86/xen/enlighten.c:738 perf_events_lapic_init+0x28/0x29()
> [ 0.088009] Hardware name: empty
> [ 0.091294] Modules linked in:
> [ 0.092268] Pid: 1, comm: swapper/0 Not tainted 3.2.0-4-686-pae #1 Debian 3.2.39-2
> [ 0.096009] Call Trace:
> [ 0.098531] [<c10383c4>] ? warn_slowpath_common+0x68/0x79
> [ 0.100011] [<c101536e>] ? perf_events_lapic_init+0x28/0x29
> [ 0.104012] [<c10383e2>] ? warn_slowpath_null+0xd/0x10
> [ 0.108011] [<c101536e>] ? perf_events_lapic_init+0x28/0x29
> [ 0.112016] [<c1421ac7>] ? init_hw_perf_events+0x223/0x3b1
> [ 0.116012] [<c14218a4>] ? check_bugs+0x1d9/0x1d9
> [ 0.120014] [<c1003074>] ? do_one_initcall+0x66/0x10e
> [ 0.124012] [<c141a781>] ? kernel_init+0x79/0x131
> [ 0.128012] [<c141a708>] ? start_kernel+0x32a/0x32a
> [ 0.132013] [<c12c727e>] ? kernel_thread_helper+0x6/0x10
> [ 0.136020] ---[ end trace b828488e55b27a3e ]---
> [ 0.140015] ... version: 0
> [ 0.144011] ... bit width: 48
> [ 0.148012] ... generic registers: 4
> [ 0.152011] ... value mask: 0000ffffffffffff
> [ 0.156013] ... max period: 00007fffffffffff
> [ 0.160012] ... fixed-purpose events: 0
> [ 0.164013] ... event mask: 000000000000000f
> [ 0.168276] NMI watchdog enabled, takes one hw-pmu counter.
> (XEN) traps.c:2495:d0 Domain attempted WRMSR 00000000c0010004 from
> 0x0000ffff9af0c3ec to 0x0000fffb5adce6f0.
>
> So relating this back to the discussion about vpmu for guests, it
> looks like maybe it's testing the performance counters, detecting that
> they're broken, but for some reason not actually disabling the NMI
> watchdog, and keeps on using them?

I'm guessing that the problem is in
arch/x86/kernel/cpu/perf_event.c:check_hw_exists().

It has two failure modes -- "bios_fail" and "msr_fail". It does a
check where it tries to write and then read back the perfcounter MSRs
to see if they're functional; if that fails it will go to msr_fail and
return false.

However, *before* it does that check, it does some other checks which,
if they fail, will jump right to bios_fail, skipping the write/read
check entirely.

Really the "goto bios_fail" is wrong in all sorts of ways -- e.g., in
the first loop, if it detects that condition early on, it will
entirely miss the other MSR checks.

I might just propose a complete rewrite of that function...

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
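
The control-flow problem described in the message above can be modelled in user space. What follows is only a rough sketch, not the kernel's code: struct fake_pmu, fake_wrmsr(), write_read_ok(), check_early_exit() and check_all_then_decide() are all hypothetical names invented for illustration. It also assumes that the bios_fail path only warns and still reports the PMU as usable, which is what the log suggests. The event-select value 0x530076 is taken from the log; it has bit 22 (the enable bit) set.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_COUNTERS   4
#define EVNTSEL_ENABLE (1ULL << 22)   /* enable bit of an event-select MSR */

/* Hypothetical stand-in for the PMU MSRs; not a kernel structure. */
struct fake_pmu {
    uint64_t evntsel[NUM_COUNTERS];   /* event-select registers */
    uint64_t counter[NUM_COUNTERS];   /* counter registers */
    bool     writes_ignored;          /* models writes being silently dropped */
};

static void fake_wrmsr(struct fake_pmu *p, int i, uint64_t val)
{
    if (!p->writes_ignored)
        p->counter[i] = val;
}

/* Write a probe value and read it back to see whether the write stuck. */
static bool write_read_ok(struct fake_pmu *p)
{
    const uint64_t probe = 0xabcd;

    fake_wrmsr(p, 0, probe);
    return p->counter[0] == probe;
}

/* Flow as described in the message: the first counter found enabled
 * jumps to bios_fail, so the write/read check never runs. */
bool check_early_exit(struct fake_pmu *p)
{
    for (int i = 0; i < NUM_COUNTERS; i++) {
        if (p->evntsel[i] & EVNTSEL_ENABLE)
            goto bios_fail;
    }

    if (!write_read_ok(p))
        goto msr_fail;

    return true;

bios_fail:
    return true;    /* assumption: warn only, PMU still treated as usable */

msr_fail:
    return false;
}

/* One possible restructuring (an assumption, not a proposed patch):
 * record the BIOS condition, but always run the write/read check. */
bool check_all_then_decide(struct fake_pmu *p, bool *bios_warning)
{
    *bios_warning = false;
    for (int i = 0; i < NUM_COUNTERS; i++) {
        if (p->evntsel[i] & EVNTSEL_ENABLE)
            *bios_warning = true;
    }

    return write_read_ok(p);
}
```

With evntsel[0] set to 0x530076 and writes_ignored set to true, check_early_exit() reports the PMU as usable, while check_all_then_decide() reports it as unusable and still records the BIOS warning.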
