Re: [Xen-devel] Is: drivers/cpufreq/cpufreq-xen.c Was:Re: [PATCH 2 of 2] linux-xencommons: Load processor-passthru

>  > I think what you are suggesting is that to write a driver in
>  > drivers/cpufreq/ that gets either started before the other ones
>  > (if built-in) or if as a module gets loaded from xencommons. That
>  > driver would then make the call to
>  > acpi_processor_preregister_performance(),
>  > acpi_processor_register_performance() and acpi_processor_notify_smm().
>  > It would function as a cpufreq-scaling driver but in reality all
>  > calls to it from cpufreq gov-* drivers would end up with nop.
>  >
>  > Dave, would you be Ok with a driver like that in your tree?
> I joined this thread half-way through, so I'm not sure what the original 
> problem was.
> How is a driver that does nothing better than just masking out the cpufreq 
> capabilities to guests ?
Hey Dave,

The problem statement is three-fold:
 1). Parse and upload ACPI0007 (or PROCESSOR_TYPE) information to the
     hypervisor - aka P-states.
 2). Upload the Cx state information.
 3). Inhibit CPU frequency scaling drivers from loading.

The reason for wanting to solve 1) and 2) is that the Xen hypervisor
is the only one that knows the CPU usage of the different guests and can
make the proper decision about when to put CPUs and packages into the
appropriate states. Unfortunately the hypervisor has no support for
parsing the ACPI DSDT tables, hence it needs help from the initial domain
to provide this information. The reason for 3) is that we do not want the
initial domain to change P-states while the hypervisor is doing so as
well - it causes some rather funny P-state cases.
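
To make 1) a bit more concrete, here is a rough userspace model of the
"parse in the initial domain, upload to the hypervisor" step. It is not
kernel or Xen code: the struct and function names are made up for
illustration, the "hypervisor" is just a struct that records what it was
handed, and the 16-entry limit is an arbitrary assumption. Only the six
per-entry fields follow what ACPI defines for a _PSS entry.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PSTATES 16 /* arbitrary limit, assumption for this sketch */

/* One P-state, with the six fields an ACPI _PSS entry carries. */
struct model_pstate {
    uint32_t core_frequency_mhz;
    uint32_t power_mw;
    uint32_t transition_latency_us;
    uint32_t bus_master_latency_us;
    uint32_t control;
    uint32_t status;
};

/* Hypothetical stand-in for the hypervisor side: it only stores the table. */
struct model_hypervisor {
    unsigned int cpu;
    size_t count;
    struct model_pstate table[MODEL_MAX_PSTATES];
};

/*
 * Hypothetical stand-in for the upload hypercall: copy the P-state table
 * that the initial domain parsed for one CPU. Returns 0 on success and -1
 * on bad arguments.
 */
int model_upload_pstates(struct model_hypervisor *hv, unsigned int cpu,
                         const struct model_pstate *ps, size_t n)
{
    size_t i;

    if (hv == NULL || ps == NULL || n == 0 || n > MODEL_MAX_PSTATES)
        return -1;

    hv->cpu = cpu;
    for (i = 0; i < n; i++)
        hv->table[i] = ps[i];
    hv->count = n;
    return 0;
}
```

The point of the model is only the direction of the data flow: the initial
domain owns the ACPI parsing, and the hypervisor ends up as the sole
holder of the table it acts on.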

So in the past (the old classic XenoLinux patches) there were patches
added to drivers/acpi/processor_* to make the appropriate hypercalls,
and the CPUFREQ drivers were not built for the Xen kernels. Neither of
those is an option for the upstream kernel.

I've been looking at how to leverage the existing wealth of
functionality that the drivers/acpi/processor_* libs provide. The first
couple of versions would harvest the data after the cpufreq scaling
drivers had used it and then upload it. But that would not solve
case 3). So then I went off and made a cpufreq governor that would be
a nop and do 1) and 2).

The last incarnation [see attached] instead uses the
drivers/acpi/processor_* libs to fetch the ACPI information and calls
acpi_processor_notify_smm() to inhibit the cpufreq scaling drivers from
being able to load. It actually works pretty well when it is built-in,
but I am not sure how to make it bullet-proof when
CONFIG_X86_ACPI_CPUFREQ=m.

So my big question is whether there could be a 'cpufreq.off=1' API,
similar to the disable_cpuidle() call that inhibits the cpuidle drivers?
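
Roughly what I have in mind, as a userspace model rather than a patch:
a single global switch that the registration path checks, so that any
scaling driver registering after the switch is set gets turned away. All
names below (model_disable_cpufreq, model_cpufreq_register_driver and so
on) are hypothetical and only mimic the shape of the disable_cpuidle()
approach; the error codes are my assumption of what would be reasonable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical global switch, playing the role of a 'cpufreq.off=1' flag. */
static int model_cpufreq_off;

/* Minimal stand-in for a scaling driver; only the name matters here. */
struct model_cpufreq_driver {
    const char *name;
};

/* The one driver that managed to register, or NULL if none did. */
static const struct model_cpufreq_driver *model_active_driver;

/* Hypothetical counterpart to disable_cpuidle(): flip the switch once. */
void model_disable_cpufreq(void)
{
    model_cpufreq_off = 1;
}

/*
 * Models driver registration. Once the switch is set every scaling
 * driver is refused; otherwise the first driver to register wins.
 */
int model_cpufreq_register_driver(const struct model_cpufreq_driver *drv)
{
    assert(drv != NULL); /* passing NULL is a bug in the caller */

    if (model_cpufreq_off)
        return -ENODEV;
    if (model_active_driver != NULL)
        return -EBUSY;

    model_active_driver = drv;
    return 0;
}
```

In this model the switch only stops drivers that register after it has
been set, so it would have to be flipped early in the Xen setup path.
A scaling driver built as a module registers later and would then simply
fail with -ENODEV.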

