
Re: [Xen-devel] [RFC PATCH 00/31] CPUFreq on ARM



On Thu, Nov 16, 2017 at 7:04 PM, Andre Przywara
<andre.przywara@xxxxxxxxxx> wrote:
> Hi,
Hi Andre

Thank you for your comments!

>
> On 16/11/17 14:57, Oleksandr Tyshchenko wrote:
>> On Wed, Nov 15, 2017 at 4:28 PM, Andre Przywara
>> <andre.przywara@xxxxxxxxxx> wrote:
>>> Hi,
>> Hi Andre, Jassi
>>
>> Thank you for your comments!
>>
>>>
>>> On 14/11/17 20:46, Oleksandr Tyshchenko wrote:
>>>> On Tue, Nov 14, 2017 at 12:49 PM, Andre Przywara
>>>> <andre.przywara@xxxxxxxxxx> wrote:
>>>>> Hi,
>>>> Hi Andre
>>>>
>>>>>
>>>>> On 13/11/17 19:40, Oleksandr Tyshchenko wrote:
>>>>>> On Mon, Nov 13, 2017 at 5:21 PM, Andre Przywara
>>>>>> <andre.przywara@xxxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>> Hi Andre,
>>>>>>
>>>>>>>
>>>>>>> thanks very much for your work on this!
>>>>>> Thank you for your comments.
>>>>>>
>>>>>>>
>>>>>>> On 09/11/17 17:09, Oleksandr Tyshchenko wrote:
>>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
>>>>>>>>
>>>>>>>> Hi, all.
>>>>>>>>
>>>>>>>> The purpose of this RFC patch series is to add CPUFreq support to Xen 
>>>>>>>> on ARM.
>>>>>>>> The motivation for hypervisor based CPUFreq is to enable one of the main
>>>>>>>> PM use-cases in a virtualized system powered by the Xen hypervisor. The
>>>>>>>> rationale behind this activity is that CPU virtualization is done by the
>>>>>>>> hypervisor and the guest OS doesn't actually know anything about physical
>>>>>>>> CPUs because it runs on virtual CPUs. It is quite clear that the decision
>>>>>>>> about frequency changes should be taken by the hypervisor, as only it has
>>>>>>>> information about the actual CPU load.
>>>>>>>
>>>>>>> Can you please sketch your usage scenario or workloads here? I can think
>>>>>>> of quite different scenarios (oversubscribed server vs. partitioning
>>>>>>> RTOS guests, for instance). The usefulness of CPUFreq and the trade-offs
>>>>>>> in the design are quite different between those.
>>>>>> We have embedded use-cases in mind. For example, a system with several
>>>>>> domains, where one domain runs the most critical SW and the other
>>>>>> domain(s) are, let's say, for entertainment purposes.
>>>>>> I think CPUFreq is useful wherever power consumption is a concern.
>>>>>
>>>>> Does the SoC you use allow different frequencies for each core? Or is it
>>>>> one frequency for all cores? Most x86 CPU allow different frequencies
>>>>> for each core, AFAIK. Just having the same OPP for the whole SoC might
>>>>> limit the usefulness of this approach in general.
>>>> Good question. All cores in a cluster share the same clock. It is
>>>> impossible to set different frequencies on the cores inside one
>>>> cluster.
>>>>
>>>>>
>>>>>>> In general I doubt that a hypervisor scheduling vCPUs is in a good
>>>>>>> position to make a decision on the proper frequency physical CPUs should
>>>>>>> run with. From all I know it's already hard for an OS kernel to make
>>>>>>> that call. So I would actually expect that guests provide some input,
>>>>>>> for instance by signalling OPP change request up to the hypervisor. This
>>>>>>> could then decide to act on it - or not.
>>>>>> Each running guest sees only part of the picture, but the hypervisor has
>>>>>> the whole picture: it knows everything about the CPUs, measures the CPU
>>>>>> load and is able to choose the required CPU frequency to run at.
>>>>>
>>>>> But based on what data? All Xen sees is a vCPU trapping on MMIO, a
>>>>> hypercall or on WFI, for that matter. It does not know much more about
>>>>> the guest, especially it's rather clueless about what the guest OS
>>>>> actually intended to do.
>>>>> For instance Linux can track the actual utilization of a core by keeping
>>>>> statistics of runnable processes and monitoring their time slice usage.
>>>>> It can see that a certain process exhibits periodic but bursty CPU
>>>>> usage, which may hint that it could run at a lower frequency. Xen does not
>>>>> see this fine-grained information.
>>>>>
>>>>>> I am wondering, does Xen
>>>>>> need additional input from guests to make a decision?
>>>>>
>>>>> I very much believe so. The guest OS is in a much better position to
>>>>> make that call.
>>>>>
>>>>>> BTW, currently a guest domain on ARM doesn't even know how many physical
>>>>>> CPUs the system has or what the OPPs are. When creating a guest
>>>>>> domain, Xen inserts only dummy CPU nodes. CPU info such as clocks,
>>>>>> OPPs, thermal, etc. is not passed to the guest.
>>>>>
>>>>> Sure, because this is what virtualization is about. And I am not asking
>>>>> for unconditionally allowing any guest to change frequency.
>>>>> But there could be certain use cases where this could be considered:
>>>>> Think about your "critical SW" mentioned above, which is probably some
>>>>> RTOS, also possibly running on pinned vCPUs. For that
>>>>> (latency-sensitive) guest it might be well suited to run at a lower
>>>>> frequency for some time, but how should Xen know about this?
>>>>> "Normally" the best strategy to save power is to run as fast as
>>>>> possible, finish all outstanding work, then put the core to sleep.
>>>>> Because not running at all consumes much less energy than running at a
>>>>> reduced frequency. But this may not be suitable for an RTOS.
>>>> Saying "one domain has most critical SW running on" I meant hardware
>>>> domain/driver domain or even other
>>>> domain which perform some important tasks (disk, net, display, camera,
>>>> whatever) which treated by the whole system as critical
>>>> and must never fail. Other domains, for example, it might be Android
>>>> as well, are not critical at all from the system point of view.
>>>> Being honest, I haven't considered yet using CPUFreq in system where
>>>> some RT guest is present.
>>>> I think it is something that should be *thoroughly* investigated and
>>>> then worked out.
>>>
>>> Yes, as mentioned before there are quite different use cases with quite
>>> different requirements when it comes to DVFS.
>>> I believe the best would be to define typical scenarios, then assess the
>>> usefulness of CPUFreq separately for each one of them.
>>> Based on this we then should be able to make a decision.
>>
>> Agree here.
>> Well, let's imagine the following use-case(s); maybe too complex, but it
>> might take place.
>> The ARM SoC is big.LITTLE, with >=1 big core(s) and >=1 little
>> core(s), and one of the following abilities:
>> 1. The big core(s) are DVFS capable (>1 OPP) and the little core(s) aren't
>> (1 OPP), or vice versa.
>> 2. Both types are DVFS capable.
>> The system which runs on this SoC has 3 guests:
>> 1. Thin dom0, which has some storage driver (mmc, sata, whatever) with
>> blkback running.
>> Tasks:
>> - Running VM
>> - Watchdog
>> - vbd support
>> 2. Driver domain (maybe an RT guest: Linux with RT infra or even some
>> RTOS; maybe a non-RT guest).
>> For example, an instrument cluster.
>> Tasks:
>> - Gears
>> - RVC
>> - OpenCL
>> - 3D UI
>> - vdispl, vsnd, vif, vusb, (vbd) support.
>> 3. Entertainment domain.
>> For example, Android.
>> Tasks:
>> - Navi(Maps)
>> - Multimedia(Audio/Video)
>> - Cell
>> - OTA
>> - Third-party apps
>> Also, such a system might be "battery-powered".
>
> All valid points, and demonstrates the variety of use cases. I was
> hoping for more general systems or guest use case, like:
> - oversubscribed server machine, possibly in a migration pool
> - server for isolating system components (web server, mail server,
> application server), possibly not loaded 100% all of the time
> - desktop machine or laptop, isolation for security reasons (Qubes OS)
> - embedded system, mostly partitioning (not oversubscribed, vCPUs pinned)
> - embedded system with at least one "media domain" (video/audio playback)
> - embedded system with at least one realtime domain
> ....
>
>>>
>>>> I am not familiar with RT system requirements, but I suppose (without
>>>> being entirely sure) that either CPUFreq should use a constant
>>>> frequency for all cores the RT system is running on, or the RT system
>>>> parameters should be recalculated each time the CPU frequency is
>>>> changed
>>>> (in which case the guest needs some input from Xen).
>>>>
>>>> Anyway, I got your point about some guest input. Could you please
>>>> describe how you think it should look:
>>>> 1. Xen doesn't have CPUFreq logic at all. It only collects OPP change
>>>> requests from all guests and makes
>>>> a decision based on these requests and maybe some policy for
>>>> prioritizing requests. Then it sends an OPP change request to the SCP.
>>>> 2. Xen has CPUFreq logic. In addition it can collect OPP change
>>>> requests from all guests and make
>>>> a decision based on both its own view and the guest requests. Then it
>>>> sends an OPP change request to the SCP.
>>>
>>> I am leaning towards 1) conceptually. But if there is some kind of
>>> reasonable implementation of 2) already in Xen (for x86), this might be
>>> feasible as well.
>>
>> Sure. Xen has common CPUFreq infra (the core, a set of governors) and
>> two ACPI P-state CPUFreq drivers. Actually this patch series adds an
>> SCPI-based CPUFreq driver which, like the existing drivers, is just for
>> issuing the command to change the CPU frequency.
>> The entity which decides what CPU frequency to set next is already present.
>>
>> I got your point. I think that approach 1 is radically different from
>> what we have in Xen for x86 these days.
>> Anyway, we need to weigh all the pros and cons to decide what direction
>> we want to follow.
>>
>> BTW, I see that existing CPUFreq drivers can read some performance counters
>> to measure performance over a period of time
>
> Is that APERF/MPERF on x86? Which gives you the ratio between idle and
> wall clock time?
Yes, I meant APERF/MPERF.
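
For reference, both MSRs count only while the core is not halted: APERF ticks
at the actual frequency and MPERF at the TSC (reference) frequency, so the
delta ratio over a sampling window gives the average effective frequency
relative to nominal. A rough sketch, for illustration only (not taken from the
Xen code base):

#include <stdint.h>

#define MSR_IA32_MPERF 0xe7
#define MSR_IA32_APERF 0xe8

static inline uint64_t rdmsr64(uint32_t msr)
{
    uint32_t lo, hi;

    asm volatile ( "rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr) );
    return ((uint64_t)hi << 32) | lo;
}

/* Percentage of nominal frequency over the window since (aperf0, mperf0). */
static unsigned int perf_percent(uint64_t aperf0, uint64_t mperf0)
{
    uint64_t da = rdmsr64(MSR_IA32_APERF) - aperf0;
    uint64_t dm = rdmsr64(MSR_IA32_MPERF) - mperf0;

    return dm ? (unsigned int)((da * 100) / dm) : 100;
}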

>
>> and this measured
>> performance can then be used as an additional input for the
>> governor. Do we have something similar on ARM?
>
> Not architecturally, but I guess you can track the arch timer counter
> before entering WFI and when coming back to record the time spent sleeping.
> But I am not sure that sleep time is a good metric to derive the CPU frequency from.
Agree.
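
Just for illustration, the "track the counter around WFI" idea could look like
the sketch below (bare-metal style AArch64 code, one CPU, no interrupt
bookkeeping; not real Xen code):

#include <stdint.h>

/* Read the architected physical counter (convert with CNTFRQ_EL0 if needed). */
static inline uint64_t read_cntpct(void)
{
    uint64_t val;

    asm volatile ( "isb; mrs %0, cntpct_el0" : "=r" (val) );
    return val;
}

static uint64_t idle_ticks;     /* would be per-pCPU in a real implementation */

static void idle_sleep(void)
{
    uint64_t before = read_cntpct();

    asm volatile ( "dsb sy; wfi" ::: "memory" );

    /* Time spent waiting for an interrupt, in timer ticks. */
    idle_ticks += read_cntpct() - before;
}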

>
>> I was thinking about how to actually take guests' OPP
>> change requests into account from the governor's perspective,
>> and these "requests" might be considered as performance counters.
>
> Maybe, maybe it's even simpler. You have a static vCPU frequency
> setting, as given by the administrator from Dom0, either at domain
> creation time or at runtime. Plus you have the guests' requests, which
> may or may not override this.
> So the policies could be:
> - Always run at full speed.
> - Run at full speed, and realise guest CPUFreq requests
> - Run at low speed, and realise guest CPUFreq requests
> - Always run at low speed
I see; it looks like a slightly modified userspace governor.
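
For what it's worth, these four options could map onto a simple per-domain
setting chosen by the administrator; a rough sketch (the names are made up,
this is not an existing Xen interface):

/* Per-domain DVFS policy, chosen by the administrator via the toolstack. */
enum domain_dvfs_policy {
    DVFS_ALWAYS_MAX,        /* always run at full speed */
    DVFS_MAX_GUEST_REQ,     /* default to full speed, honour guest requests */
    DVFS_MIN_GUEST_REQ,     /* default to low speed, honour guest requests */
    DVFS_ALWAYS_MIN,        /* always run at low speed */
};

struct domain_dvfs {
    enum domain_dvfs_policy policy;
    unsigned int requested_opp;     /* last OPP index requested by the guest */
};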

>
> So Xen does not need to throw in its own ideas here. Which would avoid
> some of the hard problems we encountered.
I got your point.
Just one question: why does the existing CPUFreq on x86 have its own logic?
Is there something different on ARM such that having own logic in Xen
doesn't make any sense?

>
>>>> Both variants imply that something like PV CPUFreq should be involved,
>>>> with frontend drivers located in the guests. Am I correct?
>>>
>>> And here the SMC mailbox comes into play again, but with a twist. For
>>> guests we create SCPI, mailbox and shmem DT nodes, and use the SMC
>>> mailbox with: method = "hvc";. Xen's HVC handler then redirects this to
>>> the CPUFreq code.
>>> This would be platform agnostic for the guests, while making all CPUFreq
>>> requests end up in Xen. So there is no need for an extra PV protocol.
>>
>> This idea is indeed interesting.
>>
>> Could you please answer these questions:
>> 1. If I understand correctly, here in Xen we have to emulate all DVFS
>> related commands, i.e. be an SCP for the guests?
>
> Yes, though "emulate all DVFS commands" sounds more complicated than it
> is, it could be as simple as my ATF implementation:
> https://github.com/apritzel/arm-trusted-firmware/commit/2f6f7d1746f72d0fe4da461ab1b3bfddc082636d
Yes, though for "Xen being an SCP for guest" we need a little bit more
than just this patch adds, I guess.
But anyway, we have a good example to start.
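
Just to make "a little bit more" concrete, here is a rough sketch of how Xen
could decode a guest's DVFS request coming in through the hvc-based doorbell
and merely record it for the governor instead of touching the hardware. All
names are made up and the command IDs are assumptions; the authoritative
values are in the SCPI specification.

#include <errno.h>
#include <stdint.h>

/* Assumed SCPI command IDs, for illustration only. */
#define SCPI_CMD_DVFS_SET   0x0a
#define SCPI_CMD_DVFS_GET   0x0b

/* Layout of the per-guest shared-memory command area (simplified). */
struct vscpi_msg {
    uint32_t cmd;               /* command id, sender id, payload size */
    uint32_t status;
    uint8_t  payload[256];
};

/* One virtual DVFS power domain per vCPU, as discussed just below. */
struct vscpi_state {
    unsigned int requested_opp[8];
    unsigned int current_opp[8];
};

static int vscpi_handle(struct vscpi_state *st, struct vscpi_msg *msg)
{
    unsigned int pd = msg->payload[0] & 7;  /* virtual power domain id */

    switch ( msg->cmd & 0x7f )              /* command id in the low bits */
    {
    case SCPI_CMD_DVFS_SET:
        /* Only record the request; the governor decides whether to act. */
        st->requested_opp[pd] = msg->payload[1];
        msg->status = 0;                    /* SCPI success */
        return 0;

    case SCPI_CMD_DVFS_GET:
        /* Report the OPP Xen currently pretends this power domain runs at. */
        msg->payload[0] = st->current_opp[pd];
        msg->status = 0;
        return 0;

    default:
        msg->status = 1;                    /* "command not supported" */
        return -ENOSYS;
    }
}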

>
>> 2. How do we recognize, from a guest's OPP change request, on which
>> physical CPU it wants to change the frequency?
>
> I think that maps to the DVFS power domains. We could offer one power
> domain per vCPU.
Probably, yes. I don't see why this idea wouldn't work.

>
>>     Do we need to pin the guest's vCPUs to the respective pCPUs?
>
> No, I don't see why. Makes the code and the decision when to switch more
> complicated, of course.
Good.

>
>> 3. Linux "SCPI CPUFreq Interface driver" is tied to "ARM big.LITTLE
>> Platforms CPUFreq driver", so will the latter be "happy"
>>     to play with virtual CPUs a particular guests is running on?
>
> I think so. But possibly SCMI provides a better answer to this.
So, additional investigation is needed.

>
>> 4. Together with creating dummy SCPI nodes for the guest, we have to insert
>> a clock specifier into each CPU node
>>     which we expose to the guest (clocks = <&scpi_dvfs 0>;). Correct?
>
> Yes, but that should be easy.
Agree.
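
For illustration, if the guest DT is assembled with libfdt's sequential-write
API (as the ARM domain builder does), adding the specifier could be as simple
as the sketch below; the phandle of the dummy scpi_dvfs node and the per-vCPU
index are placeholders.

#include <libfdt.h>

/*
 * Add "clocks = <&scpi_dvfs N>" to the guest CPU node that is currently
 * open for writing. The phandle value and the index are illustrative.
 */
static int set_vcpu_clock(void *fdt, uint32_t scpi_dvfs_phandle,
                          unsigned int vcpu_id)
{
    fdt32_t cells[2];

    cells[0] = cpu_to_fdt32(scpi_dvfs_phandle);
    cells[1] = cpu_to_fdt32(vcpu_id);   /* e.g. one DVFS domain per vCPU */

    return fdt_property(fdt, "clocks", cells, sizeof(cells));
}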

>
>> 5. Will there be any possible synchronization issues if two guests send
>> OPP change requests at the same time?
>
> No, this is a per-VCPU trap to EL2 and will be handled in the context of a
> VCPU and its domain.
Agree there too.

> How this translates to the actual frequency of a
> physical core is a different question, though. One of the reasons I am a
> bit wary of the usefulness of this exercise: because the downclocked
> physical core might be given to another VCPU in another guest shortly
> afterwards, at which point it might need to be clocked up again - or not.
Oh, I see, some scheduler input might be needed...

>
>>>>> So I think we would need a combined approach:
>>>>> a) Let an administrator (via tools running in Dom0) tell Xen about power
>>>>> management strategies to use for certain guests. An RTOS could be
>>>>> treated differently (lower, but constant frequency) than an
>>>>> "entertainment" guest (varying frequency, based on guest OS input), also
>>>>> differently than some background guest doing logging, OTA update, etc.
>>>>> (constant high frequency, but putting cores to sleep instead as often as
>>>>> possible).
>>>>> b) Allow some guests (based on policy from (a)) to signal CPUFreq change
>>>>> requests to the hypervisor. Xen takes those into account, though it may
>>>>> decide to not act immediately on it, because it is going to schedule
>>>>> another vCPU, for instance.
>>>>> c) Have some way of actually realising certain OPPs. This could be via
>>>>> an SCPI client in Xen, or some other way. Might be an implementation 
>>>>> detail.
>>>>
>>>> Just to clarify that I got the main idea correctly:
>>>> 1. Guests have CPUFreq logic; they send OPP change requests to Xen.
>>>> 2. Xen has CPUFreq logic too, but in addition it can take into account
>>>>    OPP change requests from guests. Xen sends the final OPP change request.
>>>> Is my understanding correct?
>>>
>>> Yes, I think this sounds like the most flexible. Xen's CPUFreq logic
>>> could be quite simple, possibly starting with some static assignment
>>> based on administrator input, e.g. given at guest creation time.
>>> It might not involve further runtime decisions.
>>>
>>>> Also "Different power management strategies to use for certain guests"
>>>> means that it should be
>>>> hard vCPU->pCPU pinning for each guest together with possibility in
>>>> Xen to have different CPUFreq governors
>>>> running at the same time (each governor for each CPU pool)?
>>>
>>> That would need to be worked out, but I suspect that CPU pinning might
>>> be *one* option for a certain class of guests. This would probably be
>>> related to the CPUFreq policy. Without pinning the decision might become
>>> quite involved: If Xen wants to migrate a vCPU to a different pCPU, it
>>> needs to take the different P-states into account, including the cost to
>>> change the OPP. I am not sure the benefit justifies the effort. Some
>>> numbers would help here.
>>
>> I can't even imagine the development effort of adding the ability to have
>> different CPUFreq policies over different CPUs in Xen. Another issue is
>> that if all cores share one OPP,
>> it is not feasible to realize that, I am afraid.
>>
>> Anyway, I think we should go step-by-step.
>> If the community agrees that a CPUFreq feature in Xen on ARM is needed and
>> that the SCPI/SCMI based approach is the right thing to do in general, I
>> would stick to the following, taking into account Andre's suggestions
>> regarding guest input:
>>
>> 1. Xen does have CPUFreq logic. It measures CPU utilization by itself.
>> 2. In addition it can collect OPP change requests from the guests:
>>   - There is some policy describing which guests are allowed to send
>> OPP change requests.
>>   - Of course, the involved guests have CPUFreq enabled. All we need is
>> that these OPP change requests don't lead to any physical changes and
>> are picked up by Xen. We could use Andre's idea here (SCPI CPUFreq +
>> SMC mailbox with hvc method).
>> 3. Xen makes a decision based on the whole-system status it measures
>> periodically and on guest input (OPP change requests) if present.
>> 4. Xen actually issues the command to change the CPU frequency (sends an
>> OPP change request to the SCP).
>>
>> How does it sound?
>
> 0. Decide whether CPUFreq justifies 1.-4. in the first place.
Sure,
> That sounds like a lot of work and code, so we should be sure it's worth it.
>
> I wonder if you could provide some input, ideally measurements on the
> actual power savings CPUFreq provides.
Well, I think I will be able to provide some numbers once the firmware
which runs on the SoC I am using is ready. Currently I have an emulator
without any real frequency/voltage changes.

>
> Does the wish to have CPUFreq purely come from some "tick-the-box"
> exercise? As in: We have it on native Linux, so we need it in Xen?
As I said before, we are interested in purely embedded use-cases
where power consumption is a concern.
If you know how to save power without having CPUFreq involved, I would
appreciate the pointers.

>
> What power savings can we expect from CPUFreq? Can those possible
> savings be transferred into a virtualized environment at all? And do
> those saving justify all the extra code in Xen?
>
> I think those questions need to be answered first, then we can discuss
> about the implementation details.
OK.

>
>>>>>>>> Although these required components (CPUFreq core, governors, etc) already
>>>>>>>> exist in Xen, it is worth mentioning that they are ACPI specific. So, part
>>>>>>>> of the current patch series makes them more generic in order to make
>>>>>>>> CPUFreq usable on architectures without ACPI support.
>>>>>>>
>>>>>>> Have you looked at how this is used on x86 these days? Can you briefly
>>>>>>> describe how this works and how it's used there?
>>>>>> Xen supports the CPUFreq feature on x86 [1]. I don't know how widely it
>>>>>> is used at the moment, but that is another question. There are two
>>>>>> possible modes: Domain0 based CPUFreq and Hypervisor based CPUFreq
>>>>>> [2]. As I understand it, the second option is more popular.
>>>>>> Two different implementations of "Hypervisor based CPUFreq" are
>>>>>> present: the ACPI Processor P-States Driver and the AMD Architectural
>>>>>> P-state Driver. You can find both of them in the
>>>>>> xen/arch/x86/acpi/cpufreq/ directory.
>>>>>>
>>>>>> [1] 
>>>>>> https://wiki.xenproject.org/wiki/Xen_power_management#CPU_P-states_.28cpufreq.29
>>>>>> [2] 
>>>>>> https://wiki.xenproject.org/wiki/Xen_power_management#Hypervisor_based_cpufreq
>>>>>
>>>>> Thanks for the research and the pointers, will look at it later.
>>>>>
>>>>>>>> But the main question we have to answer is about the frequency changing
>>>>>>>> interface in a virtualized system. The frequency changing interface and
>>>>>>>> all dependent components needed for CPUFreq to be functional on ARM are
>>>>>>>> not present in Xen these days. The list of required components is quite
>>>>>>>> big and may change across different ARM SoC vendors. As an example, the
>>>>>>>> following components are involved in DVFS on the Renesas Salvator-X
>>>>>>>> board, which has an R-Car Gen3 SoC installed: the generic clock,
>>>>>>>> regulator and thermal frameworks, the vendor’s CPG, PMIC, AVS and THS
>>>>>>>> drivers, i2c support, etc.
>>>>>>>>
>>>>>>>> We considered a few possible approaches to hypervisor based CPUFreq on
>>>>>>>> ARM and came to the conclusion to base this solution on the ARM System
>>>>>>>> Control and Power Interface (SCPI) protocol [1], which is popular at the
>>>>>>>> moment and already upstreamed to Linux. We chose the SCPI protocol
>>>>>>>> instead of the newer ARM System Control and Management Interface (SCMI)
>>>>>>>> protocol [2] since it is widespread in Linux, there are good examples of
>>>>>>>> how to use it, and its range of capabilities is enough for implementing
>>>>>>>> hypervisor based CPUFreq; what is more, upstream Linux support for SCMI
>>>>>>>> is missing so far. But SCMI could be used as well.
>>>>>>>>
>>>>>>>> Briefly speaking, the SCPI protocol is used between the System Control
>>>>>>>> Processor (SCP) and the Application Processors (AP). The mailbox feature
>>>>>>>> provides a mechanism for inter-processor communication between the SCP
>>>>>>>> and the AP. The main purpose of the SCP is to offload different PM
>>>>>>>> related tasks from the AP, and one of the services the SCP provides is
>>>>>>>> Dynamic Voltage and Frequency Scaling (DVFS), which is what we actually
>>>>>>>> need for CPUFreq. I will describe this approach in detail further down
>>>>>>>> the text.
>>>>>>>>
>>>>>>>> Let me explain a bit more what these possible approaches are:
>>>>>>>>
>>>>>>>> 1. “Xen+hwdom” solution.
>>>>>>>> The GlobalLogic team proposed a split model [3], where a “hwdom-cpufreq”
>>>>>>>> frontend driver in Xen interacts with the “xen-cpufreq” backend driver
>>>>>>>> in Linux hwdom (possibly dom0) in order to scale physical CPUs. This
>>>>>>>> solution hasn’t been accepted by the Xen community yet and it seems it is
>>>>>>>> not going to be accepted without taking into account the still unanswered
>>>>>>>> major questions and proving that the “all-in-Xen” solution, which the Xen
>>>>>>>> community considered the architecturally cleaner option, would be
>>>>>>>> unworkable in practice.
>>>>>>>> The other reasons we decided not to stick to this approach are the complex
>>>>>>>> communication interface between Xen and hwdom (event channel, hypercalls,
>>>>>>>> syscalls, passing CPU info via DT, etc.) and possible synchronization
>>>>>>>> issues with the proposed solution.
>>>>>>>> It is worth mentioning, though, that the beauty of this approach was that
>>>>>>>> there wouldn’t be a need to port a lot of things to Xen: the whole
>>>>>>>> frequency changing interface and all dependent components needed for
>>>>>>>> CPUFreq to be functional were already in place.
>>>>>>>
>>>>>>> Stefano, Julien and I were thinking about this: Wouldn't it be possible
>>>>>>> to come up with some hardware domain, solely dealing with CPUFreq
>>>>>>> changes? This could run a Linux kernel, but no or very little userland.
>>>>>>> All its vCPUs would be pinned to pCPUs and would normally not be
>>>>>>> scheduled by Xen. If Xen wants to change the frequency, it schedules the
>>>>>>> respective vCPU to the right pCPU and passes down the frequency change
>>>>>>> request. Sounds a bit involved, though, and probably doesn't solve the
>>>>>>> problem where this domain needs to share access to hardware with Dom0
>>>>>>> (clocks come to mind).
>>>>>> Yes, another question is how to get this Linux kernel stuff (backend,
>>>>>> top level driver, etc) upstreamed.
>>>>>
>>>>> Well, the idea would be to use already upstream drivers to actually
>>>>> implement OPP changes (via Linux clock and regulator drivers), then use
>>>>> existing interfaces like the userspace governor, for instance, to
>>>>> trigger those. I don't think we need much extra kernel code for that.
>>>> I understand. A backend in userspace sets the desired frequency on request
>>>> from the frontend in Xen.
>>>
>>> Yeah, something like that. It was just an idea, not fully thought
>>> through yet.
>>>
>>>>>>>> Although this approach is not used, I still picked a few already acked
>>>>>>>> patches which make the ACPI specific CPUFreq stuff more generic.
>>>>>>>>
>>>>>>>> 2. “all-in-Xen” solution.
>>>>>>>> This implies that all CPUFreq related stuff should be located in Xen.
>>>>>>>> The community considered this solution the architecturally cleaner option
>>>>>>>> compared to the “Xen+hwdom” one. There is no layering violation compared
>>>>>>>> with the previous approach (letting a guest OS manage one or more physical
>>>>>>>> CPUs is more of a layering violation).
>>>>>>>> This solution looks better but, to be honest, we are not in favor of it
>>>>>>>> either. We expect an enormous development effort to get this support in
>>>>>>>> (the scope of required components looks unrealistic) and to maintain it.
>>>>>>>> So, we decided not to stick to this approach either.
>>>>>>>
>>>>>>> Yes, I even think it's not feasible to implement this. With a modern
>>>>>>> clock implementation there is one driver to control *all* clocks of an
>>>>>>> SoC, so you can't single out the CPU clock easily, for instance. One
>>>>>>> would probably run into synchronisation issues, at best.
>>>>>>>
>>>>>>>> 3. “Xen+SCP(ARM TF)” solution.
>>>>>>>> This is yet another solution based on the ARM SCPI protocol. The generic
>>>>>>>> idea here is that there is firmware which, acting as a server, runs on
>>>>>>>> some dedicated IP core and provides different PM services (DVFS, sensors,
>>>>>>>> etc). On the other side there is a CPUFreq driver in Xen, running on the
>>>>>>>> AP (client), which consumes these services. The CPUFreq driver neither
>>>>>>>> changes the CPU frequency/voltage by itself nor cooperates with Linux in
>>>>>>>> order to do such a job. It just communicates with the SCP directly using
>>>>>>>> the SCPI protocol. As I said before, some mailbox IP integrated into the
>>>>>>>> SoC needs to be used for IPC (a doorbell for triggering the action and a
>>>>>>>> shared memory region for the commands). The CPUFreq driver doesn’t even
>>>>>>>> need to know what should be physically changed for the new frequency to
>>>>>>>> take effect; that is certainly the SCP’s responsibility. All this keeps
>>>>>>>> the CPUFreq infrastructure in Xen on ARM from diving into each supported
>>>>>>>> SoC’s internals and, as a result, having a lot of code.
>>>>>>>>
>>>>>>>> The possible issue here could be the SCP: the problem is that a dedicated
>>>>>>>> IP core may be absent altogether, or may perform tasks other than PM.
>>>>>>>> Fortunately, there is a brilliant solution: teach the firmware running in
>>>>>>>> the EL3 exception level (ARM TF) to perform SCP functions and use SMC
>>>>>>>> calls for communication [4]. It is exactly this transport implementation
>>>>>>>> that I want to bring to Xen first. Such a solution is going to be generic
>>>>>>>> across all ARM platforms that do have firmware running in the EL3
>>>>>>>> exception level and don’t have a candidate for being the SCP.
>>>>>>>
>>>>>>> While I feel flattered that you like that idea as well ;-), you should
>>>>>>> mention that this requires actual firmware providing those services.
>>>>>> Yes, some firmware which provides these services must be present
>>>>>> on the other end.
>>>>>> In the common case it is firmware which runs on the dedicated IP core(s).
>>>>>> And in this particular case it is firmware which runs on the same core(s)
>>>>>> as the hypervisor.
>>>>>>
>>>>>>> I
>>>>>>> am not sure there is actually *any* implementation of this at the
>>>>>>> moment, apart from my PoC code for Allwinner.
>>>>>> Your PoC is a good example for writing the firmware side. So, why not
>>>>>> use it as a base for
>>>>>> other platforms?
>>>>>
>>>>> Sure, but normally firmware is provided by the vendor. And until more
>>>>> vendors actually implement this, it's a bit weird to ask Xen users to
>>>>> install this hand-crafted home-brew firmware to use this feature.
>>>>> For a particular embedded use case like yours this might be feasible,
>>>>> though.
>>>> Agree. It is exactly for ARM SoCs with security extensions enabled,
>>>> but where an SCP isn't available.
>>>> And such SoCs do exist.
>>>
>>> Sure, also it depends on the accessibility of firmware. Some SoCs only
>>> run signed firmware, or there is no source code for crucial firmware
>>> components (SoC setup, DRAM init), so changing the firmware might not be
>>> an option.
>>
>> Agree.
>>
>>>
>>>>>>> And from a Xen point of view I am not sure we are in the position to
>>>>>>> force users to use this firmware. This may be feasible in a classic
>>>>>>> embedded scenario, where both firmware and software are provided by the
>>>>>>> same entity, but that should be clearly noted as a restriction.
>>>>>> Agree.
>>>>>>
>>>>>>>
>>>>>>>> Here we have a completely synchronous case because of the nature of SMC
>>>>>>>> calls. The SMC triggered mailbox driver emulates a mailbox which signals
>>>>>>>> transmitted data via the Secure Monitor Call (SMC) instruction [5]. The
>>>>>>>> mailbox receiver is implemented in firmware and synchronously returns
>>>>>>>> data when it returns execution to the non-secure world again. This allows
>>>>>>>> us both to trigger a request and to transfer execution to the firmware
>>>>>>>> code in a safe and architected way, like PSCI requests.
>>>>>>>> As you can see, this method is free from synchronization issues. What is
>>>>>>>> more, it is architecturally cleaner than the split model “Xen+hwdom”
>>>>>>>> solution. From the security point of view, I hope, everything will be
>>>>>>>> much more correct since ARM TF, which we want to see in charge of
>>>>>>>> controlling CPU frequency/voltage, is a trusted SW layer. Moreover, ARM
>>>>>>>> TF is responsible for enabling/disabling CPUs (PSCI) and nobody complains
>>>>>>>> about that, so let it do DVFS too.
>>>>>>>
>>>>>>> It should be noted that this synchronous nature of the communication can
>>>>>>> actually be a problem: a DVFS request usually involves regulator and PLL
>>>>>>> changes, which could take some time to settle in. Blocking all of this
>>>>>>> time (milliseconds?) in EL3 (probably busy-waiting) might not be 
>>>>>>> desirable.
>>>>>> Agree. I haven't measured the time yet to say how long it is, since I
>>>>>> don't have working firmware at the moment, just an emulator,
>>>>>> but, yes, it will definitely take some time. The whole system won't be
>>>>>> blocked, only the CPU which performs the SMC call.
>>>>>> But if we ask hwdom to change the frequency, won't we wait too? Or if Xen
>>>>>> manages the PLL/regulator by itself, it will wait anyway?
>>>>>
>>>>> Normally this is done asynchronously. For instance the OS programs the
>>>>> regulator to change the voltage, then does other things until the
>>>>> regulator signals the change has been realised. Then it re-programs the
>>>>> PLL, again executing other code, eventually being interrupted by a
>>>>> completion interrupt (or by periodically polling a bit). If we need to
>>>>> spend all of this time in EL3, the HV is blocked on this. This might or
>>>>> might not be a problem, but it should be noted.
>>>> Agree.
>>>>
>>>>>
>>>>>>>> I have to admit that I have checked this solution only due to the lack of
>>>>>>>> a candidate for being the SCP. But I hope that other ARM SoCs where a
>>>>>>>> dedicated SCP is present (the asynchronous case) will work too, with some
>>>>>>>> limitations: the mailbox IPs for these ARM SoCs must have TX/RX-done
>>>>>>>> irqs. I have described in the corresponding patches why this limitation
>>>>>>>> is present.
>>>>>>>>
>>>>>>>> To be honest, I have Renesas R-Car Gen3 SoCs in mind as our nearest
>>>>>>>> target, but I would like to make this solution as generic as possible. I
>>>>>>>> don’t treat the proposed solution as universally generic, but I hope it
>>>>>>>> may be suitable for other ARM SoCs which meet these requirements. Anyway,
>>>>>>>> having something which works but doesn’t cover all cases is better than
>>>>>>>> having nothing.
>>>>>>>>
>>>>>>>> I would like to note that the patches are in POC state and I post them
>>>>>>>> just to illustrate in more detail what I am talking about. The patch
>>>>>>>> series consists of the following parts:
>>>>>>>> 1. GL’s patches which make the ACPI specific CPUFreq stuff more generic.
>>>>>>>> Although these patches have already been acked by the Xen community and
>>>>>>>> the CPUFreq code base hasn’t changed in the last few years, I dropped all
>>>>>>>> A-b tags.
>>>>>>>> 2. A bunch of device-tree helpers and macros.
>>>>>>>> 3. The directly ported SCPI protocol, mailbox infrastructure and the ARM
>>>>>>>> SMC triggered mailbox driver. All components except the mailbox driver
>>>>>>>> are in mainline Linux.
>>>>>>>
>>>>>>> Why do you actually need this mailbox framework? Actually I just
>>>>>>> proposed the SMC driver the make it fit into the Linux framework. All we
>>>>>>> actually need for SCPI is to write a simple command into some memory and
>>>>>>> "press a button". I don't see a need to import the whole Linux
>>>>>>> framework, especially as our mailbox usage is actually just a corner
>>>>>>> case of the mailbox's capability (namely a "single-bit" doorbell).
>>>>>>> The SMC use case is trivial to implement, and I believe using the Juno
>>>>>>> mailbox is similarly simple, for instance.
>>>>>> I did a direct port of the SCPI protocol. I think it is something that
>>>>>> should be retained as much as possible.
>>>>>
>>>>> But the actual protocol is really simple. And we just need a subset of
>>>>> it, namely to query and trigger OPPs.
>>>> Yes. I think, that "Sensors service" is needed as well. I think that
>>>> CPUFreq is not completed without thermal feedback.
>>>
>>> Personally I think this should be handled by the SCPI firmware: if the
>>> requested OPP would violate thermal constraint, the firmware would just
>>> not set it. Also (secure) temperature alarm interrupts could lower the OPP.
>>> Doing this in firmware means it would just need to be implemented once,
>>> and I consider this system critical, so firmware is conceptually the
>>> better place for this code.
>>
>> Sounds reasonable to me.
>>
>>>
>>>>>> The protocol relies on the mailbox feature, so I ported the mailbox too.
>>>>>> I think it would be much easier for me to just add the
>>>>>> handling of a few required commands, issuing an SMC call without any
>>>>>> mailbox infrastructure involved.
>>>>>> But I want to show what is going on and where these things come
>>>>>> from.
>>>>>
>>>>> I appreciate that, but I think we already have enough "bloated" Linux +
>>>>> glue code in Xen. And in particular the Linux mailbox framework is much
>>>>> more powerful than we need for SCPI, so we have a lot of unneeded
>>>>> functionality.
>>>>> If we just want to support CPUfreq using SCPI via SMC/Juno MHU/Rockchip
>>>>> mailbox, we can get away with a *much* simpler solution.
>>>>
>>>> Agree, but I am afraid that simplifying things now might lead to some
>>>> difficulties when there is a need
>>>> to integrate a slightly different mailbox IP. Also, we need to
>>>> recheck whether SCMI, which we might want to support as well,
>>>> has a similar mailbox interface.
>>>>
>>>>> - We would need to port mailbox drivers one-by-one anyway, so we could
>>>>> as well implement the simple "press-the-button" subset for each mailbox
>>>>> separately. The interface between the SCPI code and the mailbox is
>>>>> probably just "signal_mailbox()". For SMC it's trivial, and for the Juno
>>>>> MHU it's also simple, I guess ([1], chapter 3.6).
>>>>> - The SCPI message assembly is easy as well.
>>>>> - The only other code needed is some DT parsing code to be compatible
>>>>> with the existing DTs describing the SCPI implementation. We would claim
>>>>> to have a mailbox driver for those compatibles, but cheat a bit since we
>>>>> only use it for SCPI and just need the single bit subset of the mailbox.
>>>> Yes, I think we can optimize in such a way.
>>>>
>>>> Just to clarify:
>>>> Is the proposed "signal_mailbox" intended for both actions, sending the
>>>> request and receiving the response?
>>>> So when it returns we will have either a response or a timeout error, or
>>>> will some callback be needed anyway?
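
For the SMC case the "press-the-button" subset could be as small as the
sketch below; the function ID, header layout and command ID are placeholders
I made up, not values from the SCPI spec or any existing code. Since the SMC
returns only after the firmware has handled the command, the status is
available right away and no callback is needed.

#include <stdint.h>

#define SMC_SCPI_DOORBELL  0x82000002U   /* illustrative SiP function id */

struct scpi_shmem {                      /* shared-memory command area */
    uint32_t cmd;                        /* command id | payload size */
    uint32_t status;
    uint8_t  payload[0x100];
};

/* "Press the button": trap to EL3, which handles the command synchronously. */
static void signal_mailbox(void)
{
    register uint64_t x0 asm("x0") = SMC_SCPI_DOORBELL;

    asm volatile ( "smc #0" : "+r" (x0) :: "x1", "x2", "x3", "memory" );
}

/* Assemble and send an SCPI "Set DVFS" request; 0x0a is an assumed id. */
static int scpi_set_dvfs(struct scpi_shmem *sh, uint8_t power_domain,
                         uint8_t opp_index)
{
    sh->payload[0] = power_domain;
    sh->payload[1] = opp_index;
    sh->cmd = 0x0a | (2U << 16);         /* command id | payload size */

    signal_mailbox();

    return sh->status == 0 ? 0 : -1;     /* firmware status, valid on return */
}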
>>>>
>>>> I don't have any objections regarding optimizations; we need to
>>>> decide which mailboxes we should stick to (can support) and in what
>>>> form we should keep
>>>> all this stuff.
>>>> Also, while making a decision, we need to keep in mind the advantages of
>>>> the "directly ported code":
>>>> - The "directly ported code" (SCPI + mailbox) has had a thorough review by
>>>> the Linux community, and the Xen community
>>>>   may rely on their review.
>>>> - As the "directly ported code" wasn't changed heavily, I believe it would
>>>> be easy to backport fixes/features to Xen.
>>>
>>> I understand that, but as I wrote in the other mail: This is a lean
>>> hypervisor, not a driver and subsystem dump site. The security aspect of
>>>  just having much less code is crucial here.
>>>
>>>> So, let's decide.
>>>>
>>>>>
>>>>>> What is more, I don't want to restrict the usage of this CPUFreq to only
>>>>>> covering the single scenario where the
>>>>>> firmware which provides the DVFS service is in ARM TF. I hope that this
>>>>>> solution will be suitable for ARM SoCs where a standalone SCP
>>>>>> is present and a real mailbox IP, which is asynchronous in nature, is used
>>>>>> for IPC. Of course, this mailbox must have TX/RX-done irqs.
>>>>>> This is a limitation at the moment.
>>>>>
>>>>> Sure, see above and the document [1] below.
>>>> Thank you for the link. It seems that with the MHU we have to poll for
>>>> last_tx_done (for which a deasserted interrupt line in a status register is
>>>> the condition)
>>>> after pressing the button. Or did I miss something?
>>>
>>> It depends on whether we care. We could just treat this request in a
>>> fire-and-forget manner. I am not sure in how far Xen really needs to
>>> know the actual OPP used and when it's ready.
>>
>> I got your point.
>>
>> There is a "get" callback for CPUFreq drivers, where the CPUFreq core
>> expects to get current frequency.
>> Current frequency is also needed for initial condition, we might guess
>> it, but why if SCPI does allow to retrieve it.
>
> Well, that means you can read it if you want to know. That doesn't mean
> that an implementation needs to poll the current state to see if it has
> been realized already.
> If a system asks for a lower frequency, it might just express the
> possibility to run at this speed, not necessarily a hard requirement to
> actually do so.
>
>> Personally I think that although the "fire-and-forget" manner has an
>> advantage (the code is much simpler), we will never know what is going on
>> in case of errors;
>> there are, I think, a few reasons for the firmware not to process a request.
>> I agree that we could try not to wait for the real TX-done condition
>> at all for asynchronous mailboxes if we are not going to queue
>> requests.
>> It is quite clear that if we already got a response, the request
>> has successfully reached the other end,
>> but if we got a timeout error, something bad has happened and we
>> could treat it as a global connection error, for example.
>> But responses are something we should handle.
>
> I think we might care when we want to change it *again* or when a user
> actually asks for the current frequency. But even this might not tell
> you the truth.
> Think of your x86 laptop: It might get boosted without the OS knowing,
> or thermal throttling might actually limit the frequency. Mine at least
> does that all of the time.

I understand that.

To be clear, I was not asking to poll the current state to see if the
CPU frequency has been *physically* changed.
I was just worried that the "fire-and-forget" manner wouldn't allow us to see
any responses from the other side.

Anyway, it is open for discussion.
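
As an aside, a synchronous response also keeps the CPUFreq core's "get" hook
trivial. A fragmentary sketch, assuming the hook keeps the get(cpu) shape of
the existing x86 drivers; scpi_dvfs_get_idx(), cpu_to_power_domain() and
opp_table are hypothetical helpers, not existing code:

/* Report the current frequency (kHz) for the CPUFreq core's "get" hook. */
static unsigned int scpi_cpufreq_get(unsigned int cpu)
{
    int idx = scpi_dvfs_get_idx(cpu_to_power_domain(cpu)); /* SCPI Get DVFS */

    if ( idx < 0 )
        return 0;   /* no response: treat as an error, frequency unknown */

    return opp_table[cpu][idx].freq_khz;
}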

>
> Cheers,
> Andre.
>
>> So I believe we will be able to handle the MHU, as well as the Rockchip
>> and other mailbox IPs which do have an RX-done irq.

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

