[RFC] DVFS and Thermal management subsystem proposal
# Synopsis

This document describes the design of thermal-based CPU throttling in
virtualized environments. The goal is to provide a generic thermal management
subsystem that works with the existing cpufreq subsystem in Xen and can be
used on various architectures and hardware.

# Cpufreq subsystem in XEN

## Brief overview

      Governors
+--------------------+
| +----------------+ |  struct cpufreq_governor {
| |    ondemand    | |      .name
| +----------------+ |      .governor
| +----------------+ |      .handle_option
| |   powersave    | |  }
| +----------------+ |
| +----------------+ |                               +----------------------+
| |  performance   | |->cpufreq_register_governor()  | +-------------------+|
| +----------------+ |                               | |  cpufreq_dev_drv  ||
| +----------------+ |    cpufreq_register_driver()->| +-------------------+|
| |   userspace    | |                               | +-------------------+|
| +----------------+ |                               | |        ...        ||
| +----------------+ |                               | +-------------------+|
| |      ...       | |  struct cpufreq_driver {      +----------------------+
| +----------------+ |      .init                    +----------------------+
+--------------------+      .verify                  |       Hardware       |
                            .setpolicy               +----------------------+
                            .update
                            .target
                            .get
                            .getavg
                            .exit
                        }

The cpufreq subsystem consists of two parts:
1) A cpufreq governor, which is registered with the
   cpufreq_register_governor() call;
2) A cpufreq driver, which provides access to the hardware and is registered
   with the cpufreq_register_driver() call.

## Hardware drivers

We have implemented two cpufreq hardware drivers (see Appendix 1 and
Appendix 2) to support the Rcar-3 and i.MX8 boards. These drivers are designed
to work with the thermal throttling subsystem and will be part of the
contribution package.
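As a rough illustration of how a hardware driver plugs into the subsystem, the
self-contained C sketch below models a driver that fills in a few of the
callbacks from the overview diagram and registers itself. Everything prefixed
with demo_ is a simplified stand-in invented for this example; these are not
the actual Xen types, and demo_register_driver() only mimics the behaviour
expected from cpufreq_register_driver(), including the rule that a single
driver is active on a system. The frequency limits are made-up numbers.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified policy passed to the driver callbacks. */
struct demo_policy {
    unsigned int cpu;
    unsigned int min_khz;
    unsigned int max_khz;
    unsigned int cur_khz;
};

/* Subset of the callbacks listed in the overview diagram (illustrative). */
struct demo_cpufreq_driver {
    const char *name;
    int (*init)(struct demo_policy *policy);
    int (*target)(struct demo_policy *policy, unsigned int target_khz);
};

static const struct demo_cpufreq_driver *demo_active_driver;

/* Accepts exactly one driver; rejects incomplete or duplicate registrations. */
int demo_register_driver(const struct demo_cpufreq_driver *drv)
{
    if (drv == NULL || drv->init == NULL || drv->target == NULL)
        return -EINVAL;
    if (demo_active_driver != NULL)
        return -EBUSY;
    demo_active_driver = drv;
    return 0;
}

/* Example board driver: the limits below are assumptions, not real OPPs. */
static int demo_board_init(struct demo_policy *policy)
{
    policy->min_khz = 500000;
    policy->max_khz = 1500000;
    policy->cur_khz = policy->max_khz;
    return 0;
}

static int demo_board_target(struct demo_policy *policy,
                             unsigned int target_khz)
{
    /* Clamp the request to the limits reported by init(). */
    if (target_khz < policy->min_khz)
        target_khz = policy->min_khz;
    if (target_khz > policy->max_khz)
        target_khz = policy->max_khz;
    /* A real driver would program the hardware or call firmware here. */
    policy->cur_khz = target_khz;
    return 0;
}

const struct demo_cpufreq_driver demo_board_driver = {
    .name = "demo-board",
    .init = demo_board_init,
    .target = demo_board_target,
};
```

A platform would call demo_register_driver(&demo_board_driver) once during
boot. A second registration attempt returns -EBUSY in this sketch.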
## Configuration options

The cpufreq subsystem is enabled with the following config parameter:
+-----------------------------------------------------------------------------+
CONFIG_HAS_CPUFREQ=y
+-----------------------------------------------------------------------------+
The cpufreq device driver is platform specific and can be selected at compile
time by setting the config parameter:
+-----------------------------------------------------------------------------+
CONFIG_CPUFREQ_XXX
+-----------------------------------------------------------------------------+
where XXX is the platform name.

Additional configuration is also possible, either through device-tree nodes or
through ACPI. The current implementation supports only device-tree
configuration.

The device-tree configuration is defined by the cpufreq driver implementation
and mostly reuses the device-tree bindings from the Linux kernel. The Linux
kernel defines common and platform-specific cpufreq bindings. See
[0] /Documentation/devicetree/bindings/cpufreq and
[0] /Documentation/devicetree/bindings/opp for details. Some examples can be
found in Appendix 1 and Appendix 2.

The cpufreq driver is initialized at Xen start based on the configuration
parameters. Only one cpufreq device driver can be enabled on a system. The
cpufreq hardware driver to use is selected by probing, based on device-tree
nodes or ACPI configuration.

The default governor can be set in the Xen boot arguments (xen-bootargs) using
the following format:
+-----------------------------------------------------------------------------+
cpufreq=xen:ondemand
+-----------------------------------------------------------------------------+
xl.cfg (the guest configuration file) supports the following configuration
option: guestpm. It defines the PM policy for the given guest.
For example:
+-----------------------------------------------------------------------------+
guestpm = "0-7"
+-----------------------------------------------------------------------------+
The guestpm = "0-7" line allows the guest to choose OPP levels from 0 to 7 out
of 15. Higher OPP levels will be ignored by the hypervisor.

# XEN Dynamic Thermal management design

## Synopsis

This section introduces the design of Dynamic Thermal Management for the Xen
hypervisor. The feature is an enhancement of the Xen DVFS feature. It allows
the system administrator to configure different thermal governors, which
perform CPU throttling based on the CPU core temperature and the thermal
configuration.

## Top level design

+-----------------------------------------------+
|                      XEN                      |
|           +-------------------+               |
|           |      Thermal      |               |
|    +----->|      Governor     |               |
|    |      +---------|---------+               |
|    |                |                         |
|    |                +-------+                 |
|    |                        |                 |
| +------------------+ +------------------+     |
| |     Thermal      | |     Cpufreq      |     |
| |     Driver       | |                  |     |
| +------------------+ +------------------+     |
|                  ^      |                     |
+-----------------------------------------------+
                   |      |
                 +--------v--------+
                 |                 |
                 |    Hardware     |
                 |                 |
                 +-----------------+

## Thermal management subsystem design in XEN

+------------------+
| +--------------+ |
| |  powersave   | |                 struct thermal_governor {
| +--------------+ |                     .name
| +--------------+ |                     .governor
| |   stepwise   | |<------------+       .handle_option
| +--------------+ |             |   }
| +--------------+ |             |
| |     ...      | |             |
| +--------------+ |             |
+------------------+             v
          +----------------->register_thermal_governor()
          |
+---------v--------+   Polling temperature
|   dyn_thermal    |<--------+             +--------------------+
+------------------+         +------------>| polling_handler()  |
                                           +--------------------+

+-------------------------------+   register_thermal_driver()
| __cpufreq_driver_target(HIGH) |            +
+-------------------------------+                          struct thermal_driver {
Set HIGH priority to the            +------------------+       .name
target policy. So this              |  thermal_driver  |       .get_trips
configuration will override         +------------------+       .get_temp
cpufreq governor                                               .set_alarm_temp
                                    +------------------+   }
                                    | thermal_sensors  |
                                    +------------------+

The dynamic thermal feature consists of two entities: a thermal governor and a
thermal driver.

A thermal governor is registered with register_thermal_governor() and provides
the following interface:
+-----------------------------------------------------------------------------+
struct thermal_governor {
    .name = "name",
    .governor = gov_dbs,
    .handle_option = handle_opt,
    .temp_handler = t_handler,
}
+-----------------------------------------------------------------------------+
Here the .governor callback processes commands (start/stop/event). The event
command is needed if the hardware driver supports setting a temperature alarm.
The governor is also responsible for polling the temperature and for
throttling by setting the cpufreq policy. The cpufreq policy is set with HIGH
priority to override commands from the cpufreq governor. Commands from the
cpufreq governor are ignored while throttling is in progress.

The thermal driver provides access to the hardware and exposes an interface to
its information. The thermal driver is responsible for the configuration and
provides this configuration to the governor. We plan to support the Rcar-3 and
i.MX8 boards (see Appendix 3 and Appendix 4).
+-----------------------------------------------------------------------------+
struct thermal_driver {
    .name
    .get_trips
    .get_temp
    .set_alarm_temp
}
+-----------------------------------------------------------------------------+

## Governors

The Linux kernel has an entity called a thermal governor, which is responsible
for the system behaviour when critical temperatures are reached. The following
governors are going to be implemented in Xen:

### Powersave governor

Sets the minimal CPU frequency when the passive trip temperature is reached.
Reboots the board when the critical temperature is reached.
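A minimal sketch of the powersave decision, together with the rule that
thermal throttling overrides the cpufreq governor, could look like the C code
below. The demo_ names, the enum and the helper functions are hypothetical and
exist only for this illustration. The proposal itself does not define them,
and temperatures are assumed to be plain degrees Celsius.

```c
#include <stdbool.h>

/* Hypothetical actions a powersave thermal governor may request. */
enum demo_action {
    DEMO_ACTION_NONE,         /* temperature is below the passive trip */
    DEMO_ACTION_THROTTLE_MIN, /* force the minimal CPU frequency */
    DEMO_ACTION_REBOOT        /* critical trip reached */
};

/* Trip points as they would be reported by a thermal driver (assumed). */
struct demo_trips {
    int passive_c;
    int critical_c;
};

/* Powersave policy: minimal frequency at passive, reboot at critical. */
enum demo_action demo_powersave_decide(const struct demo_trips *trips,
                                       int temp_c)
{
    if (temp_c >= trips->critical_c)
        return DEMO_ACTION_REBOOT;
    if (temp_c >= trips->passive_c)
        return DEMO_ACTION_THROTTLE_MIN;
    return DEMO_ACTION_NONE;
}

/*
 * While throttling is in progress the thermal request wins and the
 * cpufreq governor request is ignored; otherwise the cpufreq governor
 * request is used unchanged.
 */
unsigned int demo_effective_khz(bool throttling,
                                unsigned int thermal_khz,
                                unsigned int governor_khz)
{
    return throttling ? thermal_khz : governor_khz;
}
```

With trips of 107 (passive) and 117 (critical), a reading of 85 leaves the
cpufreq governor in control, 107 forces the minimal frequency, and 117
requests a reboot.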
### Fair-share governor

Uses three parameters to calculate the throttle state:
P1: the maximum throttle state;
P2: percentage[I]/100, which shows how effective the device is;
P3: cur_trip_level/max_no_of_trips.
New CPU state = P3 * P2 * P1

### Step-wise governor

Moves the throttle state up one step at a time while the temperature is
rising, and down otherwise.

### User-space governor

Notifies guests when a trip temperature is reached by setting a flag in
xenhypfs.

### Thermal governor configuration

Thermal governors are enabled with the following Xen config parameters:
+-----------------------------------------------------------------------------+
CONFIG_HAS_THERMAL=y
CONFIG_GOV_THERMAL_FAIR_SHARE=y
CONFIG_GOV_THERMAL_STEP_WISE=y
CONFIG_GOV_THERMAL_POWERSAVE=y
CONFIG_GOV_THERMAL_USERSPACE=y
+-----------------------------------------------------------------------------+
CONFIG_HAS_THERMAL enables Dynamic Thermal Management. The other parameters
enable the individual thermal governors. If no governor is set on the command
line, the default is STEP_WISE, or the first governor in the list if STEP_WISE
is not enabled.

In the current implementation, the thermal driver is probed from device-tree
nodes. ACPI configuration is not part of the current implementation. The
thermal device driver defines its device-tree configuration format based on
the thermal device-tree bindings from the Linux kernel source code. See
[0] /Documentation/devicetree/bindings/thermal for details.
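Returning briefly to the governor descriptions above, the fair-share formula
and the step-wise rule can be written as small integer functions. This is only
a sketch under stated assumptions: the demo_ function names and parameter
names are invented here, P2 is passed as a whole percentage, and the result is
rounded down because integer division is used.

```c
/*
 * Fair-share: new state = P3 * P2 * P1, where
 *   P1 = max_state,
 *   P2 = weight_percent / 100,
 *   P3 = cur_trip / max_trips.
 * All multiplications are done before the single division so that
 * integer arithmetic loses as little precision as possible.
 */
unsigned int demo_fair_share_state(unsigned int max_state,
                                   unsigned int weight_percent,
                                   unsigned int cur_trip,
                                   unsigned int max_trips)
{
    unsigned long long num;

    if (max_trips == 0)
        return 0; /* no trips configured: nothing to throttle */

    num = (unsigned long long)max_state * weight_percent * cur_trip;
    return (unsigned int)(num / (100ULL * max_trips));
}

/*
 * Step-wise: one step up while the temperature is rising, one step
 * down otherwise, always staying within [0, max_state].
 */
unsigned int demo_step_wise_state(unsigned int cur_state,
                                  unsigned int max_state,
                                  int temp_rising)
{
    if (temp_rising)
        return cur_state < max_state ? cur_state + 1 : max_state;
    return cur_state > 0 ? cur_state - 1 : 0;
}
```

For example, with a maximum throttle state of 10, a weight of 50 percent and
the first of two trips active, the fair-share result is 10 * 50 * 1 /
(100 * 2) = 2 after rounding down.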
The thermal governor can be selected in the Xen boot arguments (xen-bootargs)
by adding the following parameter:
+-----------------------------------------------------------------------------+
thermal=xen:stepwise
+-----------------------------------------------------------------------------+
The xenhypfs utility can be used to query the current thermal state:
+-----------------------------------------------------------------------------+
>xenhypfs ls /thermal/
thermal_governor avail_governors trips throttle current_temp
>xenhypfs cat /thermal/thermal_governor
stepwise
>xenhypfs cat /thermal/avail_governors
stepwise powersave userspace
>xenhypfs cat /thermal/trips/
107(passive) 117(critical)
>xenhypfs cat /thermal/throttle
0
>xenhypfs cat /thermal/current_temp/0
85(cluster 0)
>xenhypfs cat /thermal/current_temp/1
87(cluster 1)
+-----------------------------------------------------------------------------+
The thermal governor can be changed with the following command:
+-----------------------------------------------------------------------------+
>xenhypfs write /thermal/thermal_governor powersave
+-----------------------------------------------------------------------------+

## Summary

The proposed feature will provide a smarter way to do throttling in case of a
thermal alarm in Xen.

# Appendix 1. Rcar-3 cpufreq driver

The solution for the Rcar Gen3 platform consists of the following software
components:
* ARM Trusted Firmware, which acts as the SCP module.
* XEN Hypervisor, which hosts the set of cpufreq governors.

ARM Trusted Firmware implements the SCMI protocol with SMCs as the mailbox
interface. ARM TF is capable of controlling the performance state of both the
Cortex A57 cluster and the Cortex A53 cluster. The active governor decides
which cluster should be altered and configures performance by setting the OPP
level in hardware. The hardware driver accesses ATF via the SCMI protocol and
sets the requested performance level.
# Appendix 2. i.MX8 cpufreq driver

The solution for i.MX8 is similar to the Rcar-3 one, as it involves the same
components:
* ARM Trusted Firmware, which provides the SCP protocol.
* XEN Hypervisor.
* SC Firmware, which alters the performance level.

The i.MX8 cpufreq driver uses the SCFW interface to access the CPU clusters:
A53 and A72. ARM TF is used to control the CPU frequency of both clusters
using SMC messages. Because of board implementation limitations, the SCFW
interface can't be used to control CPU performance, only to read the existing
performance state. The cpufreq device driver uses device-tree bindings to
receive the opp-tables configuration. See
[0] /Documentation/devicetree/bindings/opp for details.

# Appendix 3. Rcar-3 thermal driver

The solution for the Rcar Gen3 platform allows the thermal subsystem to access
the hardware and read sensor values. The driver is configured from the device
tree. The hardware is able to generate an IRQ when the critical temperature is
reached. The thermal driver handles this IRQ and sends an event to the thermal
governor. The thermal device driver uses the rcar-gen3-thermal bindings for
its configuration. See
[0] /Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.yaml for
details. The owner of the processed nodes is set to DOMID_XEN, so those nodes
are not passed to the guests.

# Appendix 4. i.MX8 thermal driver

The solution for the i.MX8 board allows the thermal subsystem to read thermal
sensors using the SCFW interface to access the hardware. The current
implementation follows the implementation of imx_sc_thermal in the Linux
kernel. Alarm events are sent based on polling in a separate thread (using the
timer mechanism). Polling timeouts are set in the device-tree node. The i.MX8
thermal device driver uses the imx-thermal device-tree bindings for its
configuration. See
[0] /Documentation/devicetree/bindings/thermal/imx-thermal.yaml for details.
The owner of the processed node is set to DOMID_XEN, so it won't be passed to
the guest.

# Links

[0] https://elixir.bootlin.com/linux/latest/source