
Re: [Xen-devel] Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2



On Thu, Mar 14, 2013 at 10:07 AM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
> On Fri, Aug 17, 2012 at 3:00 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
>> So that translates to MSR_K7_EVNTSEL0.
>>
>> And that should only be shown once. Is the perf trying to load
>> the module over and over?
>
> So I've just tested this again with the latest wheezy kernel (3.2.0-4)
> but this time taking a closer look, I see this near the first
> instance:
>
>
> [    0.072397] Performance Events: Broken BIOS detected, complain to
> your hardware vendor.
> [    0.076015] [Firmware Bug]: the BIOS has corrupted hw-PMU resources
> (MSR c0010000 is 530076)
> [    0.080007] AMD PMU driver.
> [    0.082861] ------------[ cut here ]------------
> [    0.084019] WARNING: at
> /build/buildd-linux_3.2.39-2-i386-4VFKqr/linux-3.2.39/arch/x86/xen/enlighten.c:738
> perf_events_lapic_init+0x28/0x29()
> [    0.088009] Hardware name: empty
> [    0.091294] Modules linked in:
> [    0.092268] Pid: 1, comm: swapper/0 Not tainted 3.2.0-4-686-pae #1
> Debian 3.2.39-2
> [    0.096009] Call Trace:
> [    0.098531]  [<c10383c4>] ? warn_slowpath_common+0x68/0x79
> [    0.100011]  [<c101536e>] ? perf_events_lapic_init+0x28/0x29
> [    0.104012]  [<c10383e2>] ? warn_slowpath_null+0xd/0x10
> [    0.108011]  [<c101536e>] ? perf_events_lapic_init+0x28/0x29
> [    0.112016]  [<c1421ac7>] ? init_hw_perf_events+0x223/0x3b1
> [    0.116012]  [<c14218a4>] ? check_bugs+0x1d9/0x1d9
> [    0.120014]  [<c1003074>] ? do_one_initcall+0x66/0x10e
> [    0.124012]  [<c141a781>] ? kernel_init+0x79/0x131
> [    0.128012]  [<c141a708>] ? start_kernel+0x32a/0x32a
> [    0.132013]  [<c12c727e>] ? kernel_thread_helper+0x6/0x10
> [    0.136020] ---[ end trace b828488e55b27a3e ]---
> [    0.140015] ... version:                0
> [    0.144011] ... bit width:              48
> [    0.148012] ... generic registers:      4
> [    0.152011] ... value mask:             0000ffffffffffff
> [    0.156013] ... max period:             00007fffffffffff
> [    0.160012] ... fixed-purpose events:   0
> [    0.164013] ... event mask:             000000000000000f
> [    0.168276] NMI watchdog enabled, takes one hw-pmu counter.
> (XEN) traps.c:2495:d0 Domain attempted WRMSR 00000000c0010004 from
> 0x0000ffff9af0c3ec to 0x0000fffb5adce6f0.
>
> So relating this back to the discussion about vpmu for guests, it
> looks like maybe it's testing the performance counters, detecting that
> they're broken, but for some reason not actually disabling the NMI
> watchdog, and keeps on using them?

I'm guessing that the problem is in
arch/x86/kernel/cpu/perf_event.c:check_hw_exists().  It has two
failure modes -- "bios_fail" and "msr_fail".  It does a check
where it tries to write and then read back the perfcounter MSRs to see if
they're functional; if that fails it will go to msr_fail and return
false.  However, *before* it does that check, it does some other
checks which, if they fail, will jump right to bios_fail, skipping the
write/read check entirely.
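
To make the control flow described above concrete, here is a small
userspace model of it.  This is only a sketch, not the kernel source:
struct fake_pmu, fake_wrmsr() and both function names are made up for
illustration, the 0xabcd test value is arbitrary, and I'm assuming that
bios_fail only warns and then reports success (which is what the log
above suggests, since the PMU driver carries on).  The second function
is just one possible reordering to show the difference, not a proposed
patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_COUNTERS    4             /* "generic registers: 4" in the log */
#define EVENTSEL_ENABLE (1ULL << 22)  /* enable bit of an event-select MSR */

/* Hypothetical stand-in for the PMU MSRs as the guest kernel sees them. */
struct fake_pmu {
    uint64_t evntsel[NUM_COUNTERS];   /* event-select registers */
    uint64_t perfctr0;                /* first counter register */
    bool writes_stick;                /* false: writes are silently dropped */
};

static void fake_wrmsr(struct fake_pmu *p, uint64_t val)
{
    if (p->writes_stick)
        p->perfctr0 = val;
}

/* Models the flow described above; true means "PMU considered usable". */
static bool check_as_described(struct fake_pmu *p)
{
    for (int i = 0; i < NUM_COUNTERS; i++) {
        if (p->evntsel[i] & EVENTSEL_ENABLE)
            goto bios_fail;  /* skips later counters and the probe below */
    }

    /* Write a value and read it back to see if the MSR is functional. */
    fake_wrmsr(p, 0xabcd);
    if (p->perfctr0 != 0xabcd)
        goto msr_fail;
    return true;

bios_fail:
    return true;   /* assumption: warn only, driver stays enabled */
msr_fail:
    return false;
}

/* Illustrative variant: the BIOS condition never short-circuits the probe. */
static bool check_always_probing(struct fake_pmu *p)
{
    bool bios_enabled = false;

    for (int i = 0; i < NUM_COUNTERS; i++) {
        if (p->evntsel[i] & EVENTSEL_ENABLE)
            bios_enabled = true;   /* would only trigger the warning */
    }
    (void)bios_enabled;

    fake_wrmsr(p, 0xabcd);
    return p->perfctr0 == 0xabcd;
}
```

With evntsel[0] set to 0x530076 (the value from the log, which has bit
22 set) and writes_stick false, check_as_described() reports the PMU as
usable even though writes never land, while check_always_probing()
reports it as unusable.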

Really the "goto bios_fail" is wrong in all sorts of ways -- e.g., in
the first loop, if it detects that condition early on, it will
entirely miss other MSR checks.  I might just propose a complete
rewrite of that function...

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

