Re: [Xen-devel] [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
On Tue, May 22, 2012 at 11:02:01PM +0200, Andre Przywara wrote:
> On 05/22/2012 07:18 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, May 22, 2012 at 06:07:11PM +0200, Andre Przywara wrote:
> >>Hi,
> >>
> >>while testing some APERF/MPERF semantics I discovered that this
> >>feature is enabled in Xen Dom0, but is not reliable.
> >>The Linux kernel's scheduler uses this feature if it sees the CPUID
> >>bit, leading to costly RDMSR traps (a few 100,000s during a kernel
> >>compile) and bogus values due to VCPU migration during the
> >
> >Can you point me to the Linux scheduler code that does this? Thanks.
>
> arch/x86/kernel/cpu/sched.c contains code to read out and compute
> APERF/MPERF registers. I added a Xen debug-key to dump a usage
> counter added in traps.c and thus could prove that it is actually
> the kernel that accesses these registers.
> As far as I understood this the idea is to learn about boosting and
> down-clocking (P-states) to get a fairer view on the actual
> computing time a process consumed.
Looks like it's looking for this:

X86_FEATURE_APERFMPERF

Perhaps masking that should do it? Something along these lines in enlighten.c:

        cpuid_leaf1_edx_mask =
                ~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
                  (1 << X86_FEATURE_MCA)  |  /* disable MCA */
                  (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
                  (1 << X86_FEATURE_ACC));   /* thermal monitoring */

would be more appropriate?
Or is that attribute on a different leaf?
>
> >>measurement.
> >>The attached patch explicitly disables this CPU capability inside
> >>the Linux kernel, I couldn't measure any APERF/MPERF reads anymore
> >>with the patch applied.
> >>I am not sure if the PVOPS code is the right place to fix this, we
> >>could as well do it in the HV's xen/arch/x86/traps.c:pv_cpuid().
> >>Also when the Dom0 VCPUs are pinned, we could allow this, but I am
> >>not sure if it's worth to do so.
> >>
> >>Awaiting your comments.
> >>
> >>Regards,
> >>Andre.
> >>
> >>P.S. Of course this doesn't fix pure userland software like
> >>cpupower, but I would consider this in the user's responsibility to
> >
> >Which would not work anymore as the cpufreq support is disabled
> >when it boots under Xen.
>
> Do you mean with "anymore" in a future kernel? I tested this on
> 3.4.0 and cpupower monitor worked fine. Right, cpufreq is not
> enabled, but cpupower uses the /dev/cpu/<n>/msr device file to
> directly read the MSRs. So I get this output if run on an idle Dom0:
Ahh. Neat. Will have to play with that.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel