
Re: [Xen-devel] [PATCH 2 of 2] linux-xencommons: Load processor-passthru



>>> Furthermore, given the purpose of the driver, I'm thinking that this
>>> is too late in the boot process anyway, and the driver should rather
>>> be loaded at the point where other CPUFreq drivers would normally
>>> be by a particular distro (which would still be later than when the

In Ubuntu and Fedora the relevant CPUFreq drivers are set to:
CONFIG_X86_POWERNOW_K8=y
CONFIG_X86_ACPI_CPUFREQ=y

(though prior to Fedora 16, POWERNOW_K8 was =m, per
https://bugzilla.redhat.com/show_bug.cgi?id=739159, "powernow-k8
transitions fail in xen dom0")

Looking at how the build is done with =y, the cpufreq drivers are
initialized in the late_initcall stage and in a strict order
(drivers/cpufreq/Makefile):
  # x86 drivers.
  # Link order matters. K8 is preferred to ACPI because of firmware bugs
  # in early K8 systems. ACPI is preferred to all other hardware-specific
  # drivers.
  # speedstep-* is preferred over p4-clockmod.

Both of them (acpi-cpufreq.c and powernow-k8.c) have a symbol
dependency on drivers/acpi/processor.c
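
For context, only one scaling driver can be registered at a time:
cpufreq_register_driver() guards a single global cpufreq_driver
pointer, which is why link order decides the winner when several
drivers are built in. A heavily trimmed paraphrase of the guard (from
memory, not verbatim):

int cpufreq_register_driver(struct cpufreq_driver *driver_data)
{
	unsigned long flags;

	spin_lock_irqsave(&cpufreq_driver_lock, flags);
	if (cpufreq_driver) {
		/* powernow-k8 got here first; acpi-cpufreq backs off. */
		spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
		return -EBUSY;
	}
	cpufreq_driver = driver_data;
	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);

	/* ... subsys registration and per-CPU policy setup follow ... */
	return 0;
}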

>>> respective code gets run on traditional Xenified Linux).
>>
>> My thinking is to keep the changes within the Xen ecosystem.

.. snip..
>> The fix to the tool-stack would be something along:
.. snip..
>> diff -r 917f845f3e83 tools/hotplug/Linux/init.d/xencommons
>> --- a/tools/hotplug/Linux/init.d/xencommons   Thu Feb 23 13:29:00 2012 -0500
>> +++ b/tools/hotplug/Linux/init.d/xencommons   Fri Feb 24 21:42:20 2012 -0500
>> @@ -58,7 +58,7 @@ do_start () {
>>       modprobe xen-gntdev 2>/dev/null
>>       modprobe evtchn 2>/dev/null
>>       modprobe gntdev 2>/dev/null
>> -
>> +     for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo xen > $file; done 2>/dev/null
>
> While not an unreasonable idea (yes, I sort of contradict my other mail
> from a few minutes ago, given your explanation above), this won't work
> with subsequently onlined vCPU-s (and hence, consistent with my other

This udev rule could do it? [completely untested]

ACTION=="add", SUBSYSTEM=="cpu", RUN+="/bin/sh -c '[ ! -e /sys$devpath/online ] || echo xen > /sys$devpath/cpufreq/scaling_governor'"

> mail, having the thing be a governor is likely the wrong choice - it ought
> to be a driver).
>

By driver, I think you mean either a) a new cpufreq scaling driver, or
b) the processor-core driver.

Option b) is what the old XenoLinux tree had, and a simplified version
of it was posted on LKML (https://lkml.org/lkml/2011/11/30/245).
There was no response from Len Brown about it, and from off-list email
conversations I got the hint that Len wasn't particularly thrilled
about it.

For a), this would mean some form of unregistering the existing
cpufreq scaling driver. The reason is that we want to keep using the
generic drivers (acpi-cpufreq and powernow-k8), because they already
do all the filtering and parsing of the ACPI data, instead of
re-implementing that in our own cpufreq-xen-scaling driver. One other
option is to export the powernow-k8 and acpi-cpufreq functions that do
this and call them from the cpufreq-xen-scaling driver, but that
sounds icky.

I think the "unregistering" would be akin to
cpufreq_unregister_driver(), but accepting a NULL argument to mean
"the currently registered driver". However, we would be the only user
of this mechanism and I am not sure what Dave Jones's feelings are
about that (CC-ed here). Here is what I was thinking (this is of
course completely untested/uncompiled):

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 622013f..049fca9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1881,9 +1881,12 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)
 {
        unsigned long flags;

-       if (!cpufreq_driver || (driver != cpufreq_driver))
+       if (!cpufreq_driver)
                return -EINVAL;

+       if (!driver)
+               driver = cpufreq_driver;
+
        pr_debug("unregistering driver %s\n", driver->name);

        subsys_interface_unregister(&cpufreq_interface);
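
As a usage illustration (hypothetical and uncompiled - the driver name
and the omitted callbacks are placeholders): a cpufreq-xen-scaling
driver could then evict whichever generic driver won at late_initcall
time and register itself:

#include <linux/init.h>
#include <linux/cpufreq.h>

/* Callbacks omitted for brevity; cpufreq requires at least .init,
 * .verify, and .target (or .setpolicy) to be filled in. */
static struct cpufreq_driver xen_cpufreq_driver = {
	.name = "xen",
	/* ... */
};

static int __init xen_cpufreq_scaling_init(void)
{
	int err;

	/* With the patch above, NULL now means "whichever scaling
	 * driver is currently registered" (acpi-cpufreq or
	 * powernow-k8 on the distros mentioned earlier). */
	err = cpufreq_unregister_driver(NULL);
	if (err)
		return err;

	return cpufreq_register_driver(&xen_cpufreq_driver);
}
/* late_initcall_sync runs after the plain late_initcalls, so the
 * generic driver has already registered by the time we evict it -
 * one possible answer to the ordering concern below. */
late_initcall_sync(xen_cpufreq_scaling_init);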

There would also be a need to tweak the load order so that the
cpufreq-xen-scaling driver gets loaded _after_ all of the other ones.
And I am afraid of the scenario where a cpufreq-gov-* tries to set a
specific frequency just as we yank the driver out from under it,
ending up with a NULL pointer dereference - though I think
"cpufreq_driver_lock" inhibits that. The other thing that strikes me
as dangerous is that we never call the module_exit of the cpufreq
scaling driver we unregistered.

While I was writing this up I thought about the overall picture.
Right now, the APIs work in this way:

[acpi processors] -> [cpufreq-scaling] <- [cpufreq-gov-*]

And we want to leverage them to do two things:
1) Inhibit cpufreq-gov-* from telling the cpufreq-scaling drivers what
to do [which is a bug right now with Linux dom0 pvops, as
cpufreq-gov-* has no clue about the other domains and might set the
P-states to something incorrect].
2) Upload the power management information to the hypervisor.

Option 1) can be done by ignoring the WRMSR or outb in the hypervisor
(but for that we need 2), to know _which_ ones to inhibit). Or it can
be done, per your suggestion, by introducing a cpufreq scaling driver
that nops all calls and unregisters the other drivers. Or lastly by a
cpufreq-gov-xen governor, which would leave it to the hypervisor to do
it.
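
A minimal sketch of those no-op callbacks (again hypothetical and
untested; a real driver would also need an .init that fills in
policy->cpuinfo from the ACPI data):

/* Accept whatever policy the governors propose ... */
static int xen_nop_verify(struct cpufreq_policy *policy)
{
	return 0;
}

/* ... and swallow frequency requests without touching MSRs or IO
 * ports - actual P-state control stays with the hypervisor. */
static int xen_nop_target(struct cpufreq_policy *policy,
			  unsigned int target_freq,
			  unsigned int relation)
{
	return 0;
}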

Option 2) can be done either by the proposed cpufreq-gov-xen, or by
the earlier patch set I sent called processor-passthru
[https://lwn.net/Articles/483668/], which is a simple driver that
checks when a cpufreq-scaling driver is loaded and then uploads the
power management data. It does nothing with the cpufreq-[scaling|gov]
APIs. Granted, it does need a 'modprobe processor-passthru' in
xencommons to take advantage of the data - and yes, it does deal with
hot-CPU onlining.
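
Very roughly, and with a made-up name for the upload step (the real
code is in the patch set linked above), its shape is:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/cpu.h>

/* Hypothetical placeholder: package the ACPI Px/Cx data that the
 * generic scaling driver already parsed for this CPU and hand it to
 * the hypervisor. */
extern int xen_upload_pm_data(unsigned int cpu);

static int __init processor_passthru_init(void)
{
	unsigned int cpu;

	/* By the time xencommons modprobes this, acpi-cpufreq or
	 * powernow-k8 has done the ACPI parsing - we only forward. */
	for_each_online_cpu(cpu)
		xen_upload_pm_data(cpu);

	/* A CPU hotplug notifier (not shown) covers vCPUs that are
	 * onlined later. */
	return 0;
}
module_init(processor_passthru_init);
MODULE_LICENSE("GPL");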

Thoughts?
