101 files changed, 2835 insertions, 1746 deletions
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index f523e5a3ac33..713cab1d5f12 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -273,3 +273,15 @@ Description:
 
 		This output is useful for system wakeup diagnostics of spurious
 		wakeup interrupts.
+
+What:		/sys/power/pm_debug_messages
+Date:		July 2017
+Contact:	Rafael J. Wysocki <rjw@rjwysocki.net>
+Description:
+		The /sys/power/pm_debug_messages file controls the printing
+		of debug messages from the system suspend/hiberbation
+		infrastructure to the kernel log.
+
+		Writing a "1" to this file enables the debug messages and
+		writing a "0" (default) to it disables them. Reads from
+		this file return the current value.
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 7af83a92d2d6..47153e64dfb5 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -479,14 +479,6 @@ This governor exposes the following tunables:
 
 		# echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
 
-
-``min_sampling_rate``
-	The minimum value of ``sampling_rate``.
-
-	Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and
-	:c:data:`tick_nohz_active` are both set or to 20 times the value of
-	:c:data:`jiffies` in microseconds otherwise.
-
 ``up_threshold``
 	If the estimated CPU load is above this value (in percent), the governor
 	will set the frequency to the maximum value allowed for the policy.
diff --git a/Documentation/admin-guide/pm/index.rst b/Documentation/admin-guide/pm/index.rst
index 7f148f76f432..49237ac73442 100644
--- a/Documentation/admin-guide/pm/index.rst
+++ b/Documentation/admin-guide/pm/index.rst
@@ -5,12 +5,6 @@ Power Management
 .. toctree::
    :maxdepth: 2
 
-   cpufreq
-   intel_pstate
-
-.. only:: subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
+   strategies
+   system-wide
+   working-state
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index 1d6249825efc..d2b6fda3d67b 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -167,35 +167,17 @@ is set.
 ``powersave``
 .............
 
-Without HWP, this P-state selection algorithm generally depends on the
-processor model and/or the system profile setting in the ACPI tables and there
-are two variants of it.
-
-One of them is used with processors from the Atom line and (regardless of the
-processor model) on platforms with the system profile in the ACPI tables set to
-"mobile" (laptops mostly), "tablet", "appliance PC", "desktop", or
-"workstation". It is also used with processors supporting the HWP feature if
-that feature has not been enabled (that is, with the ``intel_pstate=no_hwp``
-argument in the kernel command line). It is similar to the algorithm
+Without HWP, this P-state selection algorithm is similar to the algorithm
 implemented by the generic ``schedutil`` scaling governor except that the
 utilization metric used by it is based on numbers coming from feedback
 registers of the CPU. It generally selects P-states proportional to the
-current CPU utilization, so it is referred to as the "proportional" algorithm.
-
-The second variant of the ``powersave`` P-state selection algorithm, used in all
-of the other cases (generally, on processors from the Core line, so it is
-referred to as the "Core" algorithm), is based on the values read from the APERF
-and MPERF feedback registers and the previously requested target P-state.
-It does not really take CPU utilization into account explicitly, but as a rule
-it causes the CPU P-state to ramp up very quickly in response to increased
-utilization which is generally desirable in server environments.
-
-Regardless of the variant, this algorithm is run by the driver's utilization
-update callback for the given CPU when it is invoked by the CPU scheduler, but
-not more often than every 10 ms (that can be tweaked via ``debugfs`` in `this
-particular case <Tuning Interface in debugfs_>`_). Like in the ``performance``
-case, the hardware configuration is not touched if the new P-state turns out to
-be the same as the current one.
+current CPU utilization.
+
+This algorithm is run by the driver's utilization update callback for the
+given CPU when it is invoked by the CPU scheduler, but not more often than
+every 10 ms. Like in the ``performance`` case, the hardware configuration
+is not touched if the new P-state turns out to be the same as the current
+one.
 
 This is the default P-state selection algorithm if the
 :c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
@@ -720,34 +702,7 @@ P-state is called, the ``ftrace`` filter can be set to to
   gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
   <idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func
 
-Tuning Interface in ``debugfs``
--------------------------------
-
-The ``powersave`` algorithm provided by ``intel_pstate`` for `the Core line of
-processors in the active mode <powersave_>`_ is based on a `PID controller`_
-whose parameters were chosen to address a number of different use cases at the
-same time. However, it still is possible to fine-tune it to a specific workload
-and the ``debugfs`` interface under ``/sys/kernel/debug/pstate_snb/`` is
-provided for this purpose. [Note that the ``pstate_snb`` directory will be
-present only if the specific P-state selection algorithm matching the interface
-in it actually is in use.]
-
-The following files present in that directory can be used to modify the PID
-controller parameters at run time:
-
-| ``deadband``
-| ``d_gain_pct``
-| ``i_gain_pct``
-| ``p_gain_pct``
-| ``sample_rate_ms``
-| ``setpoint``
-
-Note, however, that achieving desirable results this way generally requires
-expert-level understanding of the power vs performance tradeoff, so extra care
-is recommended when attempting to do that.
-
 .. _LCEU2015: http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
 .. _SDM: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html
 .. _ACPI specification: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
-.. _PID controller: https://en.wikipedia.org/wiki/PID_controller
diff --git a/Documentation/admin-guide/pm/sleep-states.rst b/Documentation/admin-guide/pm/sleep-states.rst
new file mode 100644
index 000000000000..1e5c0f00cb2f
--- /dev/null
+++ b/Documentation/admin-guide/pm/sleep-states.rst
@@ -0,0 +1,245 @@
+===================
+System Sleep States
+===================
+
+::
+
+ Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+
+Sleep states are global low-power states of the entire system in which user
+space code cannot be executed and the overall system activity is significantly
+reduced.
+
+
+Sleep States That Can Be Supported
+==================================
+
+Depending on its configuration and the capabilities of the platform it runs on,
+the Linux kernel can support up to four system sleep states, includig
+hibernation and up to three variants of system suspend. The sleep states that
+can be supported by the kernel are listed below.
+
+.. _s2idle:
+
+Suspend-to-Idle
+---------------
+
+This is a generic, pure software, light-weight variant of system suspend (also
+referred to as S2I or S2Idle). It allows more energy to be saved relative to
+runtime idle by freezing user space, suspending the timekeeping and putting all
+I/O devices into low-power states (possibly lower-power than available in the
+working state), such that the processors can spend time in their deepest idle
+states while the system is suspended.
+
+The system is woken up from this state by in-band interrupts, so theoretically
+any devices that can cause interrupts to be generated in the working state can
+also be set up as wakeup devices for S2Idle.
+
+This state can be used on platforms without support for :ref:`standby <standby>`
+or :ref:`suspend-to-RAM <s2ram>`, or it can be used in addition to any of the
+deeper system suspend variants to provide reduced resume latency. It is always
+supported if the :c:macro:`CONFIG_SUSPEND` kernel configuration option is set.
+
+.. _standby:
+
+Standby
+-------
+
+This state, if supported, offers moderate, but real, energy savings, while
+providing a relatively straightforward transition back to the working state. No
+operating state is lost (the system core logic retains power), so the system can
+go back to where it left off easily enough.
+
+In addition to freezing user space, suspending the timekeeping and putting all
+I/O devices into low-power states, which is done for :ref:`suspend-to-idle
+<s2idle>` too, nonboot CPUs are taken offline and all low-level system functions
+are suspended during transitions into this state. For this reason, it should
+allow more energy to be saved relative to :ref:`suspend-to-idle <s2idle>`, but
+the resume latency will generally be greater than for that state.
+
+The set of devices that can wake up the system from this state usually is
+reduced relative to :ref:`suspend-to-idle <s2idle>` and it may be necessary to
+rely on the platform for setting up the wakeup functionality as appropriate.
+
+This state is supported if the :c:macro:`CONFIG_SUSPEND` kernel configuration
+option is set and the support for it is registered by the platform with the
+core system suspend subsystem. On ACPI-based systems this state is mapped to
+the S1 system state defined by ACPI.
+
+.. _s2ram:
+
+Suspend-to-RAM
+--------------
+
+This state (also referred to as STR or S2RAM), if supported, offers significant
+energy savings as everything in the system is put into a low-power state, except
+for memory, which should be placed into the self-refresh mode to retain its
+contents. All of the steps carried out when entering :ref:`standby <standby>`
+are also carried out during transitions to S2RAM. Additional operations may
+take place depending on the platform capabilities. In particular, on ACPI-based
+systems the kernel passes control to the platform firmware (BIOS) as the last
+step during S2RAM transitions and that usually results in powering down some
+more low-level components that are not directly controlled by the kernel.
+
+The state of devices and CPUs is saved and held in memory. All devices are
+suspended and put into low-power states. In many cases, all peripheral buses
+lose power when entering S2RAM, so devices must be able to handle the transition
+back to the "on" state.
+
+On ACPI-based systems S2RAM requires some minimal boot-strapping code in the |
