31 files changed, 775 insertions, 481 deletions
diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index 10fde58d0869..aec2cd2aaea7 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -347,81 +347,8 @@ for tickless systems. It follows the same basic strategy as the ``menu`` `one
 <menu-gov_>`_: it always tries to find the deepest idle state suitable for the
 given conditions. However, it applies a different approach to that problem.
 
-First, it does not use sleep length correction factors, but instead it attempts
-to correlate the observed idle duration values with the available idle states
-and use that information to pick up the idle state that is most likely to
-"match" the upcoming CPU idle interval. Second, it does not take the tasks
-that were running on the given CPU in the past and are waiting on some I/O
-operations to complete now at all (there is no guarantee that they will run on
-the same CPU when they become runnable again) and the pattern detection code in
-it avoids taking timer wakeups into account. It also only uses idle duration
-values less than the current time till the closest timer (with the scheduler
-tick excluded) for that purpose.
-
-Like in the ``menu`` governor `case <menu-gov_>`_, the first step is to obtain
-the *sleep length*, which is the time until the closest timer event with the
-assumption that the scheduler tick will be stopped (that also is the upper bound
-on the time until the next CPU wakeup). That value is then used to preselect an
-idle state on the basis of three metrics maintained for each idle state provided
-by the ``CPUIdle`` driver: ``hits``, ``misses`` and ``early_hits``.
-
-The ``hits`` and ``misses`` metrics measure the likelihood that a given idle
-state will "match" the observed (post-wakeup) idle duration if it "matches" the
-sleep length. They both are subject to decay (after a CPU wakeup) every time
-the target residency of the idle state corresponding to them is less than or
-equal to the sleep length and the target residency of the next idle state is
-greater than the sleep length (that is, when the idle state corresponding to
-them "matches" the sleep length). The ``hits`` metric is increased if the
-former condition is satisfied and the target residency of the given idle state
-is less than or equal to the observed idle duration and the target residency of
-the next idle state is greater than the observed idle duration at the same time
-(that is, it is increased when the given idle state "matches" both the sleep
-length and the observed idle duration). In turn, the ``misses`` metric is
-increased when the given idle state "matches" the sleep length only and the
-observed idle duration is too short for its target residency.
-
-The ``early_hits`` metric measures the likelihood that a given idle state will
-"match" the observed (post-wakeup) idle duration if it does not "match" the
-sleep length. It is subject to decay on every CPU wakeup and it is increased
-when the idle state corresponding to it "matches" the observed (post-wakeup)
-idle duration and the target residency of the next idle state is less than or
-equal to the sleep length (i.e. the idle state "matching" the sleep length is
-deeper than the given one).
-
-The governor walks the list of idle states provided by the ``CPUIdle`` driver
-and finds the last (deepest) one with the target residency less than or equal
-to the sleep length. Then, the ``hits`` and ``misses`` metrics of that idle
-state are compared with each other and it is preselected if the ``hits`` one is
-greater (which means that that idle state is likely to "match" the observed idle
-duration after CPU wakeup). If the ``misses`` one is greater, the governor
-preselects the shallower idle state with the maximum ``early_hits`` metric
-(or if there are multiple shallower idle states with equal ``early_hits``
-metric which also is the maximum, the shallowest of them will be preselected).
-[If there is a wakeup latency constraint coming from the `PM QoS framework
-<cpu-pm-qos_>`_ which is hit before reaching the deepest idle state with the
-target residency within the sleep length, the deepest idle state with the exit
-latency within the constraint is preselected without consulting the ``hits``,
-``misses`` and ``early_hits`` metrics.]
-
-Next, the governor takes several idle duration values observed most recently
-into consideration and if at least a half of them are greater than or equal to
-the target residency of the preselected idle state, that idle state becomes the
-final candidate to ask for. Otherwise, the average of the most recent idle
-duration values below the target residency of the preselected idle state is
-computed and the governor walks the idle states shallower than the preselected
-one and finds the deepest of them with the target residency within that average.
-That idle state is then taken as the final candidate to ask for.
-
-Still, at this point the governor may need to refine the idle state selection if
-it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_. That
-generally happens if the target residency of the idle state selected so far is
-less than the tick period and the tick has not been stopped already (in a
-previous iteration of the idle loop). Then, like in the ``menu`` governor
-`case <menu-gov_>`_, the sleep length used in the previous computations may not
-reflect the real time until the closest timer event and if it really is greater
-than that time, a shallower state with a suitable target residency may need to
-be selected.
-
+.. kernel-doc:: drivers/cpuidle/governors/teo.c
+   :doc: teo-description
 
 .. _idle-states-representation:
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index 7a7d4b041eac..d5043cd8d2f5 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -365,6 +365,9 @@ argument is passed to the kernel in the command line.
 	inclusive) including both turbo and non-turbo P-states (see
 	`Turbo P-states Support`_).
 
+	This attribute is present only if the value exposed by it is the same
+	for all of the CPUs in the system.
+
 	The value of this attribute is not affected by the ``no_turbo``
 	setting described `below <no_turbo_attr_>`_.
 
@@ -374,6 +377,9 @@ argument is passed to the kernel in the command line.
 	Ratio of the `turbo range <turbo_>`_ size to the size of the entire
 	range of supported P-states, in percent.
 
+	This attribute is present only if the value exposed by it is the same
+	for all of the CPUs in the system.
+
 	This attribute is read-only.
 
 .. _no_turbo_attr:
diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra30-actmon.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra30-actmon.txt
deleted file mode 100644
index 897eedfa2bc8..000000000000
--- a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra30-actmon.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-NVIDIA Tegra Activity Monitor
-
-The activity monitor block collects statistics about the behaviour of other
-components in the system. This information can be used to derive the rate at
-which the external memory needs to be clocked in order to serve all requests
-from the monitored clients.
-
-Required properties:
-- compatible: should be "nvidia,tegra<chip>-actmon"
-- reg: offset and length of the register set for the device
-- interrupts: standard interrupt property
-- clocks: Must contain a phandle and clock specifier pair for each entry in
-clock-names. See ../../clock/clock-bindings.txt for details.
-- clock-names: Must include the following entries:
-  - actmon
-  - emc
-- resets: Must contain an entry for each entry in reset-names. See
-../../reset/reset.txt for details.
-- reset-names: Must include the following entries:
-  - actmon
-- operating-points-v2: See ../bindings/opp/opp.txt for details.
-- interconnects: Should contain entries for memory clients sitting on
-  MC->EMC memory interconnect path.
-- interconnect-names: Should include name of the interconnect path for each
-                      interconnect entry. Consult TRM documentation for
-                      information about available memory clients, see MEMORY
-                      CONTROLLER section.
-
-For each opp entry in 'operating-points-v2' table:
-- opp-supported-hw: bitfield indicating SoC speedo ID mask
-- opp-peak-kBps: peak bandwidth of the memory channel
-
-Example:
-	dfs_opp_table: opp-table {
-		compatible = "operating-points-v2";
-
-		opp@12750000 {
-			opp-hz = /bits/ 64 <12750000>;
-			opp-supported-hw = <0x000F>;
-			opp-peak-kBps = <51000>;
-		};
-		...
-	};
-
-	actmon@6000c800 {
-		compatible = "nvidia,tegra124-actmon";
-		reg = <0x0 0x6000c800 0x0 0x400>;
-		interrupts = <GIC_SPI 45 IRQ_TYPE_LEVEL_HIGH>;
-		clocks = <&tegra_car TEGRA124_CLK_ACTMON>,
-			 <&tegra_car TEGRA124_CLK_EMC>;
-		clock-names = "actmon", "emc";
-		resets = <&tegra_car 119>;
-		reset-names = "actmon";
-		operating-points-v2 = <&dfs_opp_table>;
-		interconnects = <&mc TEGRA124_MC_MPCORER &emc>;
-		interconnect-names = "cpu";
-	};
diff --git a/Documentation/devicetree/bindings/devfreq/nvidia,tegra30-actmon.yaml b/Documentation/devicetree/bindings/devfreq/nvidia,tegra30-actmon.yaml
new file mode 100644
index 000000000000..e3379d106728
--- /dev/null
+++ b/Documentation/devicetree/bindings/devfreq/nvidia,tegra30-actmon.yaml
@@ -0,0 +1,126 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/devfreq/nvidia,tegra30-actmon.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NVIDIA Tegra30 Activity Monitor
+
+maintainers:
+  - Dmitry Osipenko <digetx@gmail.com>
+  - Jon Hunter <jonathanh@nvidia.com>
+  - Thierry Reding <thierry.reding@gmail.com>
+
+description: |
+  The activity monitor block collects statistics about the behaviour of other
+  components in the system. This information can be used to derive the rate at
+  which the external memory needs to be clocked in order to serve all requests
+  from the monitored clients.
+
+properties:
+  compatible:
+    enum:
+      - nvidia,tegra30-actmon
+      - nvidia,tegra114-actmon
+      - nvidia,tegra124-actmon
+      - nvidia,tegra210-actmon
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    maxItems: 2
+
+  clock-names:
+    items:
+      - const: actmon
+      - const: emc
+
+  resets:
+    maxItems: 1
+
+  reset-names:
+    items:
+      - const: actmon
+
+  interrupts:
+    maxItems: 1
+
+  interconnects:
+    minItems: 1
+    maxItems: 12
+
+  interconnect-names:
+    minItems: 1
+    maxItems: 12
+    description:
+      Should include name of the interconnect path for each interconnect
+      entry. Consult TRM documentation for information about available
+      memory clients, see MEMORY CONTROLLER and ACTIVITY MONITOR sections.
+
+  operating-points-v2:
+    description:
+      Should contain freqs and voltages and opp-supported-hw property, which
+      is a bitfield indicating SoC speedo ID mask.
+
+  "#cooling-cells":
+    const: 2
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - clock-names
+  - resets
+  - reset-names
+  - interrupts
+  - interconnects
+  - interconnect-names
+  - operating-points-v2
+  - "#cooling-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/memory/tegra30-mc.h>
+
+    mc: memory-controller@7000f000 {
+        compatible = "nvidia,tegra30-mc";
+        reg = <0x7000f000 0x400>;
+        clocks = <&clk 32>;
+        clock-names = "mc";
+
+        interrupts = <0 77 4>;
+
+        #iommu-cells = <1>;
+        #reset-cells = <1>;
+        #interconnect-cells = <1>;
+    };
+
+    emc: external-memory-controller@7000f400 {
+        compatible = "nvidia,tegra30-emc";
+        reg = <0x7000f400 0x400>;
+        interrupts = <0 78 4>;
+        clocks = <&clk 57>;
+
+        nvidia,memory-controller = <&mc>;
+        operating-points-v2 = <&dvfs_opp_table>;
+        power-domains = <&domain>;
+
+        #interconnect-cells = <0>;
+    };
+
+    actmon@6000c800 {
+        compatible = "nvidia,tegra30-actmon";
+        reg = <0x6000c800 0x400>;
+        interrupts = <0 45 4>;
+        clocks = <&clk 119>, <&clk 57>;
+        clock-names = "actmon", "emc";
+        resets = <&rst 119>;
+        reset-names = "actmon";
+        operating-points-v2 = <&dvfs_opp_table>;
+        interconnects = <&mc TEGRA30_MC_MPCORER &emc>;
+        interconnect-names = "cpu-read";
+        #cooling-cells = <2>;
+    };
diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst
index 18ae21bf7f92..d6bf84f061f4 100644
--- a/Documentation/power/runtime_pm.rst
+++ b/Documentation/power/runtime_pm.rst
@@ -378,7 +378,11 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
 
   `int pm_runtime_get_sync(struct device *dev);`
     - increment the device's usage counter, run pm_runtime_resume(dev) and
-      return its result
+      return its result;
+      note that it does not drop the device's usage counter on errors, so
+      consider using pm_runtime_resume_and_get() instead of it, especially
+      if its return value is checked by the caller, as this is likely to
+      result in cleaner code.
 
   `int pm_runtime_get_if_in_use(struct device *dev);`
     - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
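[Editorial aside: a minimal sketch contrasting the two helpers discussed in the hunk above. The foo_start()/foo_try_start() functions and their device are made up for illustration; only pm_runtime_get_sync(), pm_runtime_put_noidle() and pm_runtime_resume_and_get() are real API.]

```c
#include <linux/pm_runtime.h>

/* Hypothetical driver path using pm_runtime_get_sync(). */
static int foo_start(struct device *dev)
{
	int ret = pm_runtime_get_sync(dev);

	if (ret < 0) {
		/* get_sync() left the usage counter incremented even on error. */
		pm_runtime_put_noidle(dev);
		return ret;
	}
	/* ... use the device ... */
	return 0;
}

/* Same path with pm_runtime_resume_and_get(): no manual counter fixup. */
static int foo_try_start(struct device *dev)
{
	int ret = pm_runtime_resume_and_get(dev);

	if (ret < 0)
		return ret;
	/* ... use the device ... */
	return 0;
}
```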
@@ -827,6 +831,15 @@
 or driver about runtime power changes. Instead, the driver for the device's
 parent must take responsibility for telling the device's driver when the
 parent's power state changes.
 
+Note that in some cases it may not be desirable for subsystems/drivers to call
+pm_runtime_no_callbacks() for their devices. This could be because a subset of
+the runtime PM callbacks needs to be implemented, because a platform-dependent
+PM domain could get attached to the device, or because the device is power
+managed through a supplier device link. For these reasons, and to avoid
+boilerplate code in subsystems/drivers, the PM core allows runtime PM callbacks
+to be unassigned. More precisely, if a callback pointer is NULL, the PM core
+will act as though there was a callback and it returned 0.
+
 9. Autosuspend, or automatically-delayed suspends
 =================================================
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index b6a782c31613..ab0b740cc0f1 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -379,6 +379,44 @@ err:
 	return ret;
 }
 
+static int genpd_set_performance_state(struct device *dev, unsigned int state)
+{
+	struct generic_pm_domain *genpd = dev_to_genpd(dev);
+	struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
+	unsigned int prev_state;
+	int ret;
+
+	prev_state = gpd_data->performance_state;
+	if (prev_state == state)
+		return 0;
+
+	gpd_data->performance_state = state;
+	state = _genpd_reeval_performance_state(genpd, state);
+
+	ret = _genpd_set_performance_state(genpd, state, 0);
+	if (ret)
+		gpd_data->performance_state = prev_state;
+
+	return ret;
+}
+
+static int genpd_drop_performance_state(struct device *dev)
+{
+	unsigned int prev_state = dev_gpd_data(dev)->performance_state;
+
+	if (!genpd_set_performance_state(dev, 0))
+		return prev_state;
+
+	return 0;
+}
+
+static void genpd_restore_performance_state(struct device *dev,
+					    unsigned int state)
+{
+	if (state)
+		genpd_set_performance_state(dev, state);
+}
+
 /**
  * dev_pm_genpd_set_performance_state- Set performance state of device's power
  *   domain.
@@ -397,8 +435,6 @@ err:
 int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state)
 {
 	struct generic_pm_domain *genpd;
-	struct generic_pm_domain_data *gpd_data;
-	unsigned int prev;
 	int ret;
 
 	genpd = dev_to_genpd_safe(dev);
@@ -410,16 +446,7 @@ int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state)
 		return -EINVAL;
 
 	genpd_lock(genpd);
-
-	gpd_data = to_gpd_data(dev->power.subsys_data->domain_data);
-	prev = gpd_data->performance_state;
-	gpd_data->performance_state = state;
-
-	state = _genpd_reeval_performance_state(genpd, state);
-	ret = _genpd_set_performance_state(genpd, state, 0);
-	if (ret)
-		gpd_data->performance_state = prev;
-
+	ret = genpd_set_performance_state(dev, state);
 	genpd_unlock(genpd);
 
 	return ret;
@@ -572,6 +599,7 @@ static void genpd_queue_power_off_work(struct generic_pm_domain *genpd)
  * RPM status of the releated device is in an intermediate state, not yet turned
  * into RPM_SUSPENDED. This means genpd_power_off() must allow one device to not
  * be RPM_SUSPENDED, while it tries to power off the PM domain.
+ * @depth: nesting count for lockdep.
  *
  * If all of the @genpd's devices have been suspended and all of its subdomains
 * have been powered down, remove power from @genpd.
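[Editorial aside: a hedged sketch of how a consumer driver might rely on the helpers added above. foo_probe() and the performance-state value 42 are invented; dev_pm_genpd_set_performance_state() is the real genpd API.]

```c
#include <linux/device.h>
#include <linux/pm_domain.h>
#include <linux/pm_runtime.h>

/*
 * Hypothetical consumer driver: it votes for a performance state once.
 * With the hunks below, genpd_runtime_suspend() drops that vote (saving
 * it in gpd_data->rpm_pstate) and genpd_runtime_resume() re-applies it,
 * so the driver no longer has to zero and restore the state by hand
 * around every runtime PM transition.
 */
static int foo_probe(struct device *dev)
{
	int ret = dev_pm_genpd_set_performance_state(dev, 42);

	if (ret)
		return ret;

	pm_runtime_enable(dev);
	return 0;
}
```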
@@ -832,7 +860,8 @@ static int genpd_runtime_suspend(struct device *dev)
 {
 	struct generic_pm_domain *genpd;
 	bool (*suspend_ok)(struct device *__dev);
-	struct gpd_timing_data *td = &dev_gpd_data(dev)->td;
+	struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
+	struct gpd_timing_data *td = &gpd_data->td;
 	bool runtime_pm = pm_runtime_enabled(dev);
 	ktime_t time_start;
 	s64 elapsed_ns;
@@ -889,6 +918,7 @@ static int genpd_runtime_suspend(struct device *dev)
 		return 0;
 
 	genpd_lock(genpd);
+	gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 	genpd_power_off(genpd, true, 0);
 	genpd_unlock(genpd);
 
@@ -906,7 +936,8 @@ static int genpd_runtime_suspend(struct device *dev)
 static int genpd_runtime_resume(struct device *dev)
 {
 	struct generic_pm_domain *genpd;
-	struct gpd_timing_data *td = &dev_gpd_data(dev)->td;
+	struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
+	struct gpd_timing_data *td = &gpd_data->td;
 	bool runtime_pm = pm_runtime_enabled(dev);
 	ktime_t time_start;
 	s64 elapsed_ns;
@@ -930,6 +961,8 @@ static int genpd_runtime_resume(struct device *dev)
 
 	genpd_lock(genpd);
 	ret = genpd_power_on(genpd, 0);
+	if (!ret)
+		genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
 	genpd_unlock(genpd);
 
 	if (ret)
@@ -968,6 +1001,7 @@ err_stop:
 err_poweroff:
 	if (!pm_runtime_is_irq_safe(dev) || genpd_is_irq_safe(genpd)) {
 		genpd_lock(genpd);
+		gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 		genpd_power_off(genpd, true, 0);
 		genpd_unlock(genpd);
 	}
@@ -2505,7 +2539,7 @@ EXPORT_SYMBOL_GPL(of_genpd_remove_subdomain);
 
 /**
  * of_genpd_remove_last - Remove the last PM domain registered for a provider
- * @provider: Pointer to device structure associated with provider
+ * @np: Pointer to device node associated with provider
  *
  * Find the last PM domain that was added by a particular provider and
 * remove this PM domain from the list of PM domains. The provider is
diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
index c6c218758f0b..cd08c5885190 100644
--- a/drivers/base/power/domain_governor.c
+++ b/drivers/base/power/domain_governor.c
@@ -252,6 +252,7 @@ static bool __default_power_down_ok(struct dev_pm_domain *pd,
 /**
  * _default_power_down_ok - Default generic PM domain power off governor routine.
  * @pd: PM domain to check.
+ * @now: current ktime.
  *
  * This routine must be executed under the PM domain's lock.
  */
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index b570848d23e0..8a66eaf731e4 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -345,7 +345,7 @@ static void rpm_suspend_suppliers(struct device *dev)
 static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
 	__releases(&dev->power.lock) __acquires(&dev->power.lock)
 {
-	int retval, idx;
+	int retval = 0, idx;
 	bool use_links = dev->power.links_count > 0;
 
 	if (dev->power.irq_safe) {
@@ -373,7 +373,8 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
 		}
 	}
 
-	retval = cb(dev);
+	if (cb)
+		retval = cb(dev);
 
 	if (dev->power.irq_safe) {
 		spin_lock(&dev->power.lock);
@@ -446,7 +447,10 @@ static int rpm_idle(struct device *dev, int rpmflags)
 	/* Pending requests need to be canceled. */
 	dev->power.request = RPM_REQ_NONE;
 
-	if (dev->power.no_callbacks)
+	callback = RPM_GET_CALLBACK(dev, runtime_idle);
+
+	/* If no callback assume success. */
+	if (!callback || dev->power.no_callbacks)
 		goto out;
 
 	/* Carry out an asynchronous or a synchronous idle notification. */
@@ -462,10 +466,7 @@ static int rpm_idle(struct device *dev, int rpmflags)
 
 	dev->power.idle_notification = true;
 
-	callback = RPM_GET_CALLBACK(dev, runtime_idle);
-
-	if (callback)
-		retval = __rpm_callback(callback, dev);
+	retval = __rpm_callback(callback, dev);
 
 	dev->power.idle_notification = false;
 	wake_up_all(&dev->power.wait_queue);
@@ -484,9 +485,6 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
 {
 	int retval;
 
-	if (!cb)
-		return -ENOSYS;
-
 	if (dev->power.memalloc_noio) {
 		unsigned int noio_flag;
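[Editorial aside: a hedged sketch of what the NULL-callback handling above permits. The foo_* driver is hypothetical; SET_RUNTIME_PM_OPS() and struct dev_pm_ops are the real kernel API.]

```c
#include <linux/pm.h>
#include <linux/pm_runtime.h>

static int foo_runtime_suspend(struct device *dev)
{
	/* ... put the hypothetical device into low power ... */
	return 0;
}

static int foo_runtime_resume(struct device *dev)
{
	/* ... bring it back up ... */
	return 0;
}

/*
 * No ->runtime_idle() is provided: with this change the PM core treats
 * the NULL pointer as a callback that returned 0, so neither a dummy
 * stub nor pm_runtime_no_callbacks() is needed for the missing hook.
 */
static const struct dev_pm_ops foo_pm_ops = {
	SET_RUNTIME_PM_OPS(foo_runtime_suspend, foo_runtime_resume, NULL)
};
```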
diff --git a/drivers/base/power/wakeirq.c b/drivers/base/power/wakeirq.c
index 8e021082dba8..3bad3266a2ad 100644
--- a/drivers/base/power/wakeirq.c
+++ b/drivers/base/power/wakeirq.c
@@ -182,7 +182,6 @@ int dev_pm_set_dedicated_wake_irq(struct device *dev, int irq)
 
 	wirq->dev = dev;
 	wirq->irq = irq;
-	irq_set_status_flags(irq, IRQ_NOAUTOEN);
 
 	/* Prevent deferred spurious wakeirqs with disable_irq_nosync() */
 	irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY);
@@ -192,7 +191,8 @@ int dev_pm_set_dedicated_wake_irq(struct device *dev, int irq)
 	 * so we use a threaded irq.
 	 */
 	err = request_threaded_irq(irq, NULL, handle_threaded_wake_irq,
-				   IRQF_ONESHOT, wirq->name, wirq);
+				   IRQF_ONESHOT | IRQF_NO_AUTOEN,
+				   wirq->name, wirq);
 	if (err)
 		goto err_free_name;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 802abc925b2a..cbab834c37a0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1367,9 +1367,14 @@ static int cpufreq_online(unsigned int cpu)
 			goto out_free_policy;
 		}
 
+		/*
+		 * The initialization has succeeded and the policy is online.
+		 * If there is a problem with its frequency table, take it
+		 * offline and drop it.
+		 */
 		ret = cpufreq_table_validate_and_sort(policy);
 		if (ret)
-			goto out_exit_policy;
+			goto out_offline_policy;
 
 		/* related_cpus should at least include policy->cpus. */
 		cpumask_copy(policy->related_cpus, policy->cpus);
@@ -1515,6 +1520,10 @@ out_destroy_policy:
 
 	up_write(&policy->rwsem);
 
+out_offline_policy:
+	if (cpufreq_driver->offline)
+		cpufreq_driver->offline(policy);
+
 out_exit_policy:
 	if (cpufreq_driver->exit)
 		cpufreq_driver->exit(policy);
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index da717f7cd9a9..1570d6f3e75d 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -211,7 +211,7 @@ void cpufreq_stats_free_table(struct cpufreq_policy *policy)
 
 void cpufreq_stats_create_table(struct cpufreq_policy *policy)
 {
-	unsigned int i = 0, count = 0, ret = -ENOMEM;
+	unsigned int i = 0, count;
 	struct cpufreq_stats *stats;
 	unsigned int alloc_size;
 	struct cpufreq_frequency_table *pos;
@@ -253,8 +253,7 @@ void cpufreq_stats_create_table(struct cpufreq_policy *policy)
 	stats->last_index = freq_table_get_index(stats, policy->cur);
 	policy->stats = stats;
 
-	ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
-	if (!ret)
+	if (!sysfs_create_group(&policy->kobj, &stats_attr_group))
 		return;
 
 	/* We failed, release resources */
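[Editorial aside: a standalone sketch of the goto-unwind pattern behind the new out_offline_policy label above. All functions here are stand-ins, not kernel code; each label undoes only the steps that completed before the failure, in reverse order, which is why "offline" falls through to "exit".]

```c
static int policy_init(void)     { return 0; }
static int policy_online(void)   { return 0; }
static int table_validate(void)  { return -1; /* pretend this fails */ }
static void policy_offline(void) { }
static void policy_exit(void)    { }

static int bring_up(void)
{
	int ret = policy_init();

	if (ret)
		return ret;

	ret = policy_online();
	if (ret)
		goto out_exit;		/* only init needs undoing */

	ret = table_validate();
	if (ret)
		goto out_offline;	/* online succeeded: offline first */

	return 0;

out_offline:
	policy_offline();
out_exit:
	policy_exit();
	return ret;
}
```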
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0e69dffd5a76..6012964df51b 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -121,9 +121,10 @@ struct sample {
  * @max_pstate_physical:This is physical Max P state for a processor
  *			This can be higher than the max_pstate which can
  *			be limited by platform thermal design power limits
- * @scaling:		Scaling factor to convert frequency to cpufreq
- *			frequency units
+ * @perf_ctl_scaling:	PERF_CTL P-state to frequency scaling factor
+ * @scaling:		Scaling factor between performance and frequency
  * @turbo_pstate:	Max Turbo P state possible for this platform
+ * @min_freq:		@min_pstate frequency in cpufreq units
  * @max_freq:		@max_pstate frequency in cpufreq units
  * @turbo_freq:		@turbo_pstate frequency in cpufreq units
  *
@@ -134,8 +135,10 @@ struct pstate_data {
 	int min_pstate;
 	int max_pstate;
 	int max_pstate_physical;
+	int perf_ctl_scaling;
 	int scaling;
 	int turbo_pstate;
+	unsigned int min_freq;
 	unsigned int max_freq;
 	unsigned int turbo_freq;
 };
@@ -366,7 +369,7 @@ static void intel_pstate_set_itmt_prio(int cpu)
 	}
 }
 
-static int intel_pstate_get_cppc_guranteed(int cpu)
+static int intel_pstate_get_cppc_guaranteed(int cpu)
 {
 	struct cppc_perf_caps cppc_perf;
 	int ret;
@@ -382,7 +385,7 @@ static int intel_pstate_get_cppc_guranteed(int cpu)
 }
 
 #else /* CONFIG_ACPI_CPPC_LIB */
-static void intel_pstate_set_itmt_prio(int cpu)
+static inline void intel_pstate_set_itmt_prio(int cpu)
 {
 }
 #endif /* CONFIG_ACPI_CPPC_LIB */
@@ -467,6 +470,20 @@ static void intel_pstate_exit_perf_limits(struct cpufreq_policy *policy)
 
 	acpi_processor_unregister_performance(policy->cpu);
 }
+
+static bool intel_pstate_cppc_perf_valid(u32 perf, struct cppc_perf_caps *caps)
+{
+	return perf && perf <= caps->highest_perf && perf >= caps->lowest_perf;
+}
+
+static bool intel_pstate_cppc_perf_caps(struct cpudata *cpu,
+					struct cppc_perf_caps *caps)
+{
+	if (cppc_get_perf_caps(cpu->cpu, caps))
+		return false;
+
+	return caps->highest_perf && caps->lowest_perf <= caps->highest_perf;
+}
 #else /* CONFIG_ACPI */
 static inline void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
 {
@@ -483,12 +500,146 @@ static inline bool intel_pstate_acpi_pm_profile_server(void)
 #endif /* CONFIG_ACPI */
 
 #ifndef CONFIG_ACPI_CPPC_LIB
-static int intel_pstate_get_cppc_guranteed(int cpu)
+static inline int intel_pstate_get_cppc_guaranteed(int cpu)
 {
 	return -ENOTSUPP;
 }
 #endif /* CONFIG_ACPI_CPPC_LIB */
 
+static void intel_pstate_hybrid_hwp_perf_ctl_parity(struct cpudata *cpu)
+{
+	pr_debug("CPU%d: Using PERF_CTL scaling for HWP\n", cpu->cpu);
+
+	cpu->pstate.scaling = cpu->pstate.perf_ctl_scaling;
+}
+
+/**
+ * intel_pstate_hybrid_hwp_calibrate - Calibrate HWP performance levels.
+ * @cpu: Target CPU.
+ *
+ * On hybrid processors, HWP may expose more performance levels than there are
+ * P-states accessible through the PERF_CTL interface. If that happens, the
+ * scaling factor between HWP performance levels and CPU frequency will be less
+ * than the scaling factor between P-state values and CPU frequency.
+ *
+ * In that case, the scaling factor between HWP performance levels and CPU
+ * frequency needs to be determined, which can be done with the help of the
+ * observation that certain HWP performance levels should correspond to certain
+ * P-states, like for example the HWP highest performance should correspond
+ * to the maximum turbo P-state of the CPU.
+ */
+static void intel_pstate_hybrid_hwp_calibrate(struct cpudata *cpu)
+{
+	int perf_ctl_max_phys = cpu->pstate.max_pstate_physical;
+	int perf_ctl_scaling = cpu->pstate.perf_ctl_scaling;
+	int perf_ctl_turbo = pstate_funcs.get_turbo();
+	int turbo_freq = perf_ctl_turbo * perf_ctl_scaling;
+	int perf_ctl_max = pstate_funcs.get_max();
+	int max_freq = perf_ctl_max * perf_ctl_scaling;
+	int scaling = INT_MAX;
+	int freq;
+
+	pr_debug("CPU%d: perf_ctl_max_phys = %d\n", cpu->cpu, perf_ctl_max_phys);
+	pr_debug("CPU%d: perf_ctl_max = %d\n", cpu->cpu, perf_ctl_max);
+	pr_debug("CPU%d: perf_ctl_turbo = %d\n", cpu->cpu, perf_ctl_turbo);
+	pr_debug("CPU%d: perf_ctl_scaling = %d\n", cpu->cpu, perf_ctl_scaling);
+
+	pr_debug("CPU%d: HWP_CAP guaranteed = %d\n", cpu->cpu, cpu->pstate.max_pstate);
+	pr_debug("CPU%d: HWP_CAP highest = %d\n", cpu->cpu, cpu->pstate.turbo_pstate);
+
+#ifdef CONFIG_ACPI
+	if (IS_ENABLED(CONFIG_ACPI_CPPC_LIB)) {
+		struct cppc_perf_caps caps;
+
+		if (intel_pstate_cppc_perf_caps(cpu, &caps)) {
+			if (intel_pstate_cppc_perf_valid(caps.nominal_perf, &caps)) {
+				pr_debug("CPU%d: Using CPPC nominal\n", cpu->cpu);
+
+				/*