
Re: [Xen-devel] cpufreq implementation for OMAP under xen hypervisor.



Stefano,

I don't know how voltage and frequency are controlled on x86 hardware. Maybe there is a reasonable way to implement cpufreq there with different numbers of pcpus and vcpus. But there is definitely no reasonable way to do it on TI and Marvell ARM SoCs without a 1:1 mapping. Breaking the 1:1 mapping dependency on ARM SoCs would take a really huge amount of work, done in steps. I would be really surprised if there is an ARM SoC that is not affected by the 1:1 dependency.

With best regards,

On Wed, Sep 10, 2014 at 9:41 PM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
The issue with that limitation is that it doesn't scale well on large
systems. You really wouldn't want dom0 to have 18 vcpus on a Xeon E5
because it would badly affect performance. Even on an 8-core SoC it
would be best to assign fewer than 8 vcpus to dom0.


On Wed, 10 Sep 2014, Vitaly Chernooky wrote:
> I've discussed my suggestions intensively here, and it is now clear to me that we should not try to use cpufreq on ARM
> SoCs without a direct 1:1 pcpu:vcpu mapping in dom0. So anyone who wants to break the 1:1 mapping should forget about cpufreq.
> With best regards,
>
>
> On Wed, Sep 10, 2014 at 12:58 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>       On Tue, 9 Sep 2014, Vitaly Chernooky wrote:
>       > Hi All!
>       >
>       > On Fri, Sep 5, 2014 at 12:56 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>       >       On Thu, 4 Sep 2014, Oleksandr Dmytryshyn wrote:
>       >       > Hi to all.
>       >       >
>       >       > I want to implement cpufreq driver in the next way:
>       >       > 1. Cpufreq governor will be implemented in the Xen
>       >       > 2. dom0 will only change cpu frequency and voltage of the physical cpus
>       >       >
>       >       > But there are some nuances:
>       >       > 1. dom0 driver should read an information about operation points
>       >       > (frequencies and voltages) and cpu supply source from the device tree for each
>       >       > physical cpu. In the omap processor case this driver suspects that those settings
>       >       > located in the /cpus/cpu@0/ node. But hypervisor creates an cpu node for each vcpu
>       >       > for kernel dom0 in the device tree and those information is lost in the dom0.
>       >       > 2. What about this case if we will have some physical cpus with different
>       >       > operation points (for example 2 cpus) and we give only one cpu for dom0?
>       >       >
>       >       > How should I transfer all information from the original cpu@xxxxxx@n nodes
>       >       > about all physical cpus to the kernel dom0 driver? Maybe an additional
>       >       > nodes should be created by the hypervisor in the device tree for dom0
>       >       > and named as pcpu@xxxxxxx@n?
>       >
>       >       If we do that, wouldn't we require changes to the core OMAP drivers or
>       >       cpu initialization code in Linux (to parse "pcpu" instead of "cpu"
>       >       nodes)? I don't expect they would be easy to upstream or maintain going
>       >       forward.
>       >
>       >       I am trying to think of an alternative, such as passing the real cpu
>       >       nodes to dom0 but then adding status = "disabled", but I am not sure
>       >       whether Linux checks the status for cpu nodes. In addition this scheme
>       >       wouldn't support the case where dom0 has more vcpus than pcpus on the
>       >       system. Granted it is not very common and might even be detrimental for
>       >       performances, but we should be able to support it.
>       >
>       >
>       > In case where dom0 has more vcpus than pcpus on the
>       > system, the dom0 kernel is the most bug-prone place for pcpu cpufreq governor. So I still believe that separate driver
>       > domain with direct 1:1 vcpu:pcpu mapping is the best place for cpufreq governor. But it also reasonable to run cpufreq
>       > governor as userspace daemon in dom0.
>       >
>       > Also what do you think about PM QoS support? On bare metal cpufreq is tightly integrated with PM QoS and intensively
>       > cooperate in frequency scaling.
>
> Device PM needs to be done in Dom0.
> CPU and platform-level PM architecturally belongs to Xen, but I do
> understand that to do that in Xen we would need to add lots of code to
> the hypervisor. There is no silver bullet here.
>
> A driver domain with 1:1 vcpu:pcpu mapping could work, but what kernel
> are you going to use for that? Linux? Wouldn't Linux be too big for a
> cpufreq driver domain, especially in embedded deployments? I think it
> would need at least 32MB to run.
>
>
> > With best regards,
> >
> >       Ian, what do you think about this?
> >
> >
> >
> >       > Oleksandr Dmytryshyn | Product Engineering and Development
> >       > GlobalLogic
> >       > M +38.067.382.2525
> >       > www.globallogic.com
> >       >
> >       > http://www.globallogic.com/email_disclaimer.txt
> >       >
> >       >
> >       > On Tue, Sep 2, 2014 at 9:46 PM, Andrii Tseglytskyi
> >       > <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> >       > >
> >       > > Hi Stefano,
> >       > >
> >       > > Thank you for explanation.
> >       > > I think this requires more and deeper investigation, but for sure dom0
> >       > > must be able to do this.
> >       > > Let us investigate this.
> >       > >
> >       > > Thank you,
> >       > >
> >       > > Regards,
> >       > > Andrii
> >       > >
> >       > > On Tue, Sep 2, 2014 at 9:39 PM, Stefano Stabellini
> >       > > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >       > > > On Tue, 2 Sep 2014, Andrii Tseglytskyi wrote:
> >       > > >> On Tue, Sep 2, 2014 at 4:00 AM, Stefano Stabellini
> >       > > >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >       > > >> > On Fri, 29 Aug 2014, Andrii Tseglytskyi wrote:
> >       > > >> >> Hi,
> >       > > >> >>
> >       > > >> >> Stefano, Ian,
> >       > > >> >>
> >       > > >> >> Could you please clarify the following point:
> >       > > >> >>
> >       > > >> >> I agree that decision about frequency change should be taken by Xen
> >       > > >> >> hypervisor. But what about hardware frequency changing?
> >       > > >> >> In general when frequency changed to bigger value (for example from 1
> >       > > >> >> GHz to 1.5 GHz) for ARM kernels sequence looks like the following:
> >       > > >> >>
> >       > > >> >> 1) cpufreq governor decides that frequency should be changed. This
> >       > > >> >> decision is taken after analysing of CPU performance data taking in
> >       > > >> >> account governor policy.
> >       > > >> >> 2) cpufreq governor asks cpufreq driver about new frequency.
> >       > > >> >> 3) cpufreq driver compares current and target frequencies and asks
> >       > > >> >> cpufreq regulator about voltage change.
> >       > > >> >> 4) cpufreq regulator send i2c command to standalone microchip, which
> >       > > >> >> is responsible for voltage changing.
> >       > > >> >> 5) cpufreq driver asks clock framework about new frequency for CPU clock
> >       > > >> >> 6) clock framework performs frequency sanity checks, taking in account
> >       > > >> >> clock parents and clock divider settings, and call platform specific
> >       > > >> >> "set_frequency" callback.
> >       > > >> >> 7) platform specific callback performs proper HW registers
> >       > > >> >> configuration for newly selected frequency
> >       > > >> >>
> >       > > >> >> Also there are some special cases - for example for OMAP5+ when
> >       > > >> >> frequency is changed to 1.5 GHz+, two additional HW IPs should be
> >       > > >> >> triggered (ABB and DCC, if someone is familiar with OMAP5+ )
> >       > > >> >>
> >       > > >> >> So, for generic ARM kernel we have 3 entities to change frequency:
> >       > > >> >>
> >       > > >> >> - cpufreq governor
> >       > > >> >> - cpufreq driver
> >       > > >> >> - cpufreq regulator
> >       > > >> >>
> >       > > >> >> + 2 additional IP for OMAP5+
> >       > > >> >> - ABB
> >       > > >> >> - DCC
> >       > > >> >>
> >       > > >> >> Taking in account all above, it looks like it would be better to
> >       > > >> >> implement only Xen cpufreq governor. Xen will take a decision about
> >       > > >> >> new frequency, and kernel dom0 will perform other steps. Dom0 contains
> >       > > >> >> all generic and platform specific frameworks, needed for frequency
> >       > > >> >> changing.
> >       > > >> >>
> >       > > >> >> What do you think ?
> >       > > >> >
> >       > > >> > Keep in mind that the architecture must be able to handle the case where
> >       > > >> > dom0 has only 1 or 2 vcpus on a 4 or 8 cores system with multiple
> >       > > >> > physical cpus.
> >       > > >> > Could dom0 change the frequency of a physical core or a physical cpu is
> >       > > >> > not even running on? If that is not a problem, because cpus and
> >       > > >> > frequency changing are decoupled enough in Linux to allow it, then I am
> >       > > >> > OK with it. But I suspect they are not.
> >       > > >> >
> >       > > >>
> >       > > >> Not sure that I got your point correctly - dom0 will change frequency
> >       > > >> on physical CPU.
> >       > > >> And in case of OMAP - this changing affects on both ARM physical cpus
> >       > > >> - changing is coupled.
> >       > > >> In case of other ARM platforms - changing may be not coupled (I've
> >       > > >> heard that Snapdragon can change cpu freqs independently on each
> >       > > >> physical cpu)
> >       > > >
> >       > > > Let me explain with a concrete example.
> >       > > >
> >       > > > Let's suppose that the platform has 2 physical cpus, each cpu has 4
> >       > > > cores. Let's also supposed that dom0 has only 2 vcpus, currently
> >       > > > running on core0 and core1 of cpu0.
> >       > > >
> >       > > > In this case would dom0 be able to change the frequency of core3 of
> >       > > > cpu1, given that is not even running on it?
> >       > > > If it can be done without any hacks, then we can go ahead with this
> >       > > > approach.
> >       > >
> >       > >
> >       > >
> >       > > --
> >       > >
> >       > > Andrii Tseglytskyi | Embedded Dev
> >       > > GlobalLogic
> >       > > www.globallogic.com
> >       >
> >
> >       _______________________________________________
> >       Xen-devel mailing list
> >       Xen-devel@xxxxxxxxxxxxx
> >       http://lists.xen.org/xen-devel
> >
> >
> >
> >
> > --
> > Vitaly Chernooky | Senior Developer - Product Engineering and Development
> > GlobalLogic
> > P +380.44.4929695 ext.1136 M +380.98.7920568 S cvv_2k
> > www.globallogic.com
> >
> > http://www.globallogic.com/email_disclaimer.txt
> >
> >
>
>
>
>
> --
> Vitaly Chernooky | Senior Developer - Product Engineering and Development
> GlobalLogic
> P +380.44.4929695 ext.1136 M +380.98.7920568 S cvv_2k
> www.globallogic.com
>
> http://www.globallogic.com/email_disclaimer.txt
>
>



--
Vitaly Chernooky | Senior Developer - Product Engineering and Development
GlobalLogic
P +380.44.4929695 ext.1136 M +380.98.7920568 S cvv_2k

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
