Thermal Microconference Notes
Welcome to Linux Plumbers Conference 2015
The structure will be short introductions to an issue or topic followed by a discussion with the audience.
A limit of 3 slides per presentation is enforced to ensure focus and allocate enough time for discussions.
Please use this etherpad to take notes. Microconf leaders will be giving a TWO MINUTE summary of their microconference during the Friday afternoon closing session.
Please remember there is no video this year, so your notes are the only record of your microconference.
Miniconf leaders: Please remember to take note of the approximate number of attendees in your session(s).
62 attendees - 16 active - 52 in the room at the end
SCHEDULE
Thermal subsystem introduction
Contacts:
IRC: ##thermal @ freenode
linux-pm at vger.kernel.org
edubezval at gmail.com
rui.zhang at intel.com
Trip point reactive
- control long term average
Averaging sensor readings helps reduce overshooting
One of the historical goals was to allow userspace control for thermal
Fundamentally disconnected from lm-sensors. Has that been addressed?
There exists ability to expose thermal zones as lm-sensors
Some lm-sensors register thermal zones
Discussion with hwmon maintainer - bunch of ARM platforms with lm-sensors
iio - another framework to handle sensors like hwmon
should be architected not glued - the registration of temperature sensors with thermal zones
Thermal docbook: https://git.kernel.org/cgit/linux/kernel/git/evalenti/linux.git/log/?h=thermal-docbook
Thermal behaviour assertions
https://sinkap.github.io/bart/
https://github.com/ARM-software/trappy/
Gets ftrace output from the execution of a workload and parses it into structures that can be analyzed using python. There's a grammar with which you can specify assertions and certain behaviors that you are testing for. You can use this library to write tests and use it for regression testing. iPython notebooks can be used during development for in-depth inspection.
also: thermal-monitor from Srinivas
Intel QA uses a web-based solution. It can be quite intrusive as it runs a web server on the device.
Steady state or can handle changing profile?
Does it work on specific thermal zone or can it be used for a combination of zones? Can be used for multiple thermal zones and with multiple pivots
How does it cope with variations in ambient condition? You'd expect controlled environment for your thermal testing.
Have you considered using fake workloads using emulated temperature? Used it but not for real modelling of temperature.
We could replay the temperature in a trace using the emulated temperature facilities. Instead of running real workloads, you could assert based on a replay of the temperature of a previous known-good run.
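A minimal sketch of the replay idea, assuming a kernel built with CONFIG_THERMAL_EMULATION so that each thermal zone exposes a writable emul_temp attribute in millidegrees Celsius. The function name, the zone_dir parameter and the sample list are hypothetical and only illustrate the discussion.

```python
import time
from pathlib import Path


def replay_temperatures(zone_dir, samples_mc, interval_s=0.0):
    """Write each recorded temperature (millidegrees C) to <zone_dir>/emul_temp.

    zone_dir is assumed to be a thermal zone directory such as
    /sys/class/thermal/thermal_zone0 on a kernel with thermal emulation
    enabled. Returns the number of samples written.
    """
    emul = Path(zone_dir) / "emul_temp"
    written = 0
    for temp_mc in samples_mc:
        # One write per sample; the zone then reports this emulated value.
        emul.write_text(f"{int(temp_mc)}\n")
        written += 1
        if interval_s > 0:
            time.sleep(interval_s)
    return written
```

On a real device this needs root, and the samples would come from the trace of a previous known-good run.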
The tool allows asserting on temperature range for period of time.
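As a rough illustration of that kind of assertion (this is not the API of the tool itself), the sketch below checks how long a zone stayed inside a temperature range. The sample format, a list of (timestamp in seconds, temperature in millidegrees C) pairs, and both function names are assumptions.

```python
def time_in_range(samples, low_mc, high_mc):
    """Return the seconds during which the temperature was within [low_mc, high_mc].

    samples is a list of (timestamp_s, temp_mc) pairs sorted by time; each
    reading is assumed to hold until the next sample arrives.
    """
    total = 0.0
    for (t0, temp), (t1, _) in zip(samples, samples[1:]):
        if low_mc <= temp <= high_mc:
            total += t1 - t0
    return total


def assert_temperature_range(samples, low_mc, high_mc, min_fraction=1.0):
    """Fail unless the zone stayed in range for at least min_fraction of the run."""
    duration = samples[-1][0] - samples[0][0]
    in_range = time_in_range(samples, low_mc, high_mc)
    assert in_range >= min_fraction * duration, (
        f"in range for {in_range:.1f}s of {duration:.1f}s"
    )
```

A regression test could then call assert_temperature_range(samples, 40000, 60000, min_fraction=0.75) on the samples extracted from a trace.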
Dependencies may be a difficulty or a barrier to putting it inside the kernel tree under <kernel>tools/thermal/
Generalizing to scheduler or any other data frame.
Adding other event types: they need to follow the property = value format.
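To show why the property = value format matters, here is a small parser sketch for one line of ftrace text output. The regular expression, the function name and the example line layout are assumptions made for illustration, not the parser of the tool discussed above.

```python
import re

# Matches "event_name: key=value key=value ..." at the end of an ftrace text line.
_EVENT_RE = re.compile(r"(?P<name>\w+):\s+(?P<body>(?:\w+=\S+\s*)+)$")


def parse_event(line):
    """Split one ftrace text line into (event_name, {property: value}).

    Returns None when the line does not end in property=value pairs.
    """
    match = _EVENT_RE.search(line.strip())
    if match is None:
        return None
    fields = {}
    for token in match.group("body").split():
        key, _, value = token.partition("=")
        # Keep numbers as ints so assertions can compare them directly.
        try:
            fields[key] = int(value)
        except ValueError:
            fields[key] = value
    return match.group("name"), fields
```

Any event that sticks to this format can be turned into a table of named columns without event-specific parsing code.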
Sensor API
DATA on power model: ?
- Thermal sensors to devices
How to get temperature for particular sensors?
For the power model, you may need to request the temperature for a specific device.
What are the use cases?
Thermal zone for a device: in a phone form factor, you want to make sure the skin doesn't heat over 60C. You could have a skin temperature sensor and a thermal zone. You have power models in cpu_cooling for CPUs, but also for other devices.
Power allocator governor estimates the sustainable power based on current temperature and an initial estimate. It gets the power required by the cooling devices and then allocates the sustainable power to requesting devices.
Some explanation about the power allocator: it is based on dynamic and static power. The cooling device that requests more power is considered more important. It is also possible to distribute importance by means of weights.
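The allocation described above can be sketched as a weighted proportional split. This is a simplified illustration, not the governor's code: it ignores per-device maximum power and any redistribution of leftover budget, and the function name, units and device names are assumptions.

```python
def allocate_power(budget_mw, requests_mw, weights=None):
    """Split budget_mw among devices in proportion to weight * requested power.

    requests_mw maps a device name to its requested power; weights maps the
    same names to a relative importance (default 1.0 for every device).
    """
    if weights is None:
        weights = {name: 1.0 for name in requests_mw}
    weighted = {name: requests_mw[name] * weights[name] for name in requests_mw}
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in requests_mw}
    return {name: budget_mw * w / total for name, w in weighted.items()}
```

With a 3000 mW budget and requests of 4000 mW (cpu) and 2000 mW (gpu), the cpu is granted 2000 mW and the gpu 1000 mW; giving the gpu a weight of 2 evens the split to 1500 mW each.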
Thermal relationship table in ACPI. Similar can be used here as well. Does better than this proposal as it also gives a weight which can be used to do linear combinations?
Device Tree representation could feed the system.
Use symlinks to represent the topology. from device to thermal zones and from thermal zone to devices.
How can this be expressed via sysfs? Links between devices and thermal zones in sysfs
PCI atom cards.
- Combining sensors
Why a thermal zone with multiple sensors cannot be multiple thermalzones?
- package and skin temperature cases
Stacking thermal zones, like what we do with the power domains.
Aggregating. Timed avg, maximum, linear extrapolation.
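The three aggregation options mentioned above could look roughly like the sketch below. The function names and the sample format, (timestamp in seconds, temperature in millidegrees C) pairs sorted by time, are assumptions for illustration.

```python
def aggregate_max(readings_mc):
    """Report the hottest of several sensor readings."""
    return max(readings_mc)


def aggregate_time_average(samples):
    """Time-weighted average; each reading is held until the next sample."""
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        return float(samples[-1][1])
    weighted = sum(temp * (t1 - t0) for (t0, temp), (t1, _) in zip(samples, samples[1:]))
    return weighted / duration


def extrapolate_linear(samples, t_future):
    """Project the temperature at t_future from the slope of the last two samples."""
    (t0, temp0), (t1, temp1) = samples[-2], samples[-1]
    slope = (temp1 - temp0) / (t1 - t0)
    return temp1 + slope * (t_future - t1)
```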
Active relationship sensor (ART) and Passive relationship sensor (PRT)
_TRT (thermal relationship table)
The tables can be accessed from /dev/{nodes} created by int34 tables
Makes sense to have a separate thermal zone where a thermal zone triggers certain action.
Linux thermal sysfs enhancements
iTux (used in Android), thermald (used in Linux distributions like Ubuntu), DPTF.
Extract the maximum performance.
User space intelligence.
Read temperature via sysfs attributes.
Polling at 1s! Use uevents to "push" thermal conditions to userspace: a given temperature has been exceeded.
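For reference, the polling approach amounts to something like the sketch below, which reads the type and temp attributes of every thermal zone. The function name and the base parameter are hypothetical; a userspace daemon would call this once per polling period.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: (zone_type, temp_millidegrees_c)} for each thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            temps[zone.name] = (zone_type, int((zone / "temp").read_text()))
        except (OSError, ValueError):
            # Skip zones whose sensor cannot be read right now.
            continue
    return temps
```

Every call opens, reads and parses two files per zone, which is the cost that an event-based interface tries to avoid.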
Bottlenecks in communication with thermal?
Complexities in accessing temperature with 1s polling
uevents add overhead because of string parsing.
Aggregate not only temperature events but device events, like PMIC, or memory.
Proactive solution.
Firmware based solutions in Intel based platform.
The netlink interface got deprecated.
sysfs v2.0 has proven to be very hard to upstream. We would have to rewrite all the existing sensors and probably userspace tools as well. Srinivas proposes to just abandon it.
Thermal to HWMON is not in use.
Suggesting to use IIO subsystem. RFC series sent to the list: http://lkml.kernel.org/r/1439855577-17114-1-git-send-email-srinivas.pandruvada@linux.intel.com
advantages of IIO: ring buffer, configuration, event based.
IIO needs a parent, in this case it would be thermal zone
A call to thermal_device_update() would feed the ring buffer.
CPU load goes down from 2-5% to 0%.
Flexibility on configuring
Multiple users: how to cover different triggers? Does IIO support multiple users?
Test sources inside IIO.
Sysfs 2.0 redesign is not really needed.
Propagating thermal constraints to the scheduler
Currently the thermal framework acts independently from the scheduler. However, some of the decisions from the cpu cooling device influence cpufreq and ultimately the scheduler.
This talk builds on top of Energy Aware Scheduling (EAS) and the power allocator governor.
Does the power model need to be in the cooling device? We remove the power model from the cooling device and have it in a central location so that it can be used both by the scheduler and thermal cooling devices.
Does the scheduler community like the idea of power models in the scheduler? We don't know yet. It will be discussed in the EAS microconference later today and tomorrow. Do we need a general power model infrastructure in the kernel?
What is the power model? It is a V^2 * F * <load_factor> based power model for the dynamic power consumption, where load_factor = time_cpu_executing_load / total_time. There's an additional callback function for static power (leakage). Leakage computation needs temperature as input.
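A small sketch of that model follows. The coeff scaling factor, the function names and the signature of the static power callback are assumptions; the notes only give the V^2 * F * load_factor shape and say that leakage depends on temperature.

```python
def load_factor(busy_time, total_time):
    """Fraction of the sampling window the CPU spent executing load."""
    return busy_time / total_time if total_time > 0 else 0.0


def dynamic_power(voltage_v, freq_hz, load, coeff=1.0):
    """Dynamic power as coeff * V^2 * F * load_factor (coeff is an assumed constant)."""
    return coeff * voltage_v ** 2 * freq_hz * load


def total_power(voltage_v, freq_hz, load, temp_mc, static_power_fn, coeff=1.0):
    """Dynamic power plus leakage from a platform callback that takes temperature."""
    return dynamic_power(voltage_v, freq_hz, load, coeff) + static_power_fn(voltage_v, temp_mc)
```

Keeping the leakage part behind a callback mirrors the point above that static power is platform specific and needs the current temperature.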
The power allocator governor determines the current power budget based on its PID history and the available TDP threshold. This power budget is then distributed amongst the cooling devices proportional to their power demands. The power demand is calculated as the sum of the static and dynamic power as defined above.
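A simplified sketch of how a PID controller can turn temperature history into a power budget. The class name, the gains and the clamping to [0, max_mw] are illustrative assumptions, not the governor's actual implementation or tuning.

```python
class PidBudget:
    """Toy PID controller that maps temperature error to a power budget in mW."""

    def __init__(self, sustainable_mw, max_mw, control_temp_mc, k_p, k_i, k_d=0.0):
        self.sustainable_mw = sustainable_mw
        self.max_mw = max_mw
        self.control_temp_mc = control_temp_mc
        self.k_p, self.k_i, self.k_d = k_p, k_i, k_d
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, temp_mc):
        """Return the budget for this period given the current temperature."""
        err = self.control_temp_mc - temp_mc  # positive while below the target
        self.integral += err
        p = self.k_p * err
        i = self.k_i * self.integral
        d = self.k_d * (err - self.prev_err)
        self.prev_err = err
        budget = self.sustainable_mw + p + i + d
        # Never hand out a negative budget or more than the devices can draw.
        return min(max(budget, 0.0), self.max_mw)
```

The returned budget is what would then be split among the cooling devices in proportion to their demands.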
Requirement of the awareness of capacities in the system is not only limited to heterogeneous architectures. For example, a multi DVFS domain system requires frequency invariance in the scheduler in order to achieve the correct behaviour. Patches for the same are in the process of upstreaming and have partially been merged.
Performance estimation may not be linear with frequency. Memory may also have an impact.
What policy would be required to control the maximum frequency?
Use the thermal limit as notion of capacity.
Capacity may be represented as lack of idle time.
Might not be interesting for Multi SoC SMP systems.
Proximity to memory may be more important than the amount of Watts needed to light up the processors.
What are the workloads and data involved? Need to spot the benefits of closing the loop between scheduler and thermal.
The power allocator will choose a power for the cpus. Instead of turning that into a frequency clamp, communicate it to the scheduler to make a more intelligent decision.
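One way to picture the difference: instead of only clamping the frequency, the power budget could be translated into a capacity value the scheduler understands. The sketch below assumes a hypothetical table of (frequency in kHz, power in mW) operating points and a capacity scale of 1024, and it assumes performance scales linearly with frequency, which the discussion above notes may not hold.

```python
SCHED_CAPACITY_SCALE = 1024  # assumed full-capacity value for illustration


def capped_capacity(opps, budget_mw):
    """Map a power budget to (max_freq_khz, capacity) using an OPP table.

    opps is a list of (freq_khz, power_mw) pairs sorted by ascending frequency.
    Falls back to the lowest operating point when nothing fits the budget.
    """
    max_freq = opps[-1][0]
    allowed = [freq for freq, power in opps if power <= budget_mw]
    freq = allowed[-1] if allowed else opps[0][0]
    return freq, freq * SCHED_CAPACITY_SCALE // max_freq
```

The first element is the classic frequency clamp; the second is the thermally limited capacity that could be reported to the scheduler instead.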
How is the prototyping?