summaryrefslogtreecommitdiff
path: root/drivers/cpufreq
<
AgeCommit message (Collapse)AuthorFilesLines
2021-10-06cpufreq: schedutil: Destroy mutex before kobject_put() frees the memoryJames Morse1-1/+1
[ Upstream commit cdef1196608892b9a46caa5f2b64095a7f0be60c ] Since commit e5c6b312ce3c ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables") kobject_put() has kfree()d the attr_set before gov_attr_set_put() returns. kobject_put() isn't the last user of attr_set in gov_attr_set_put(), the subsequent mutex_destroy() triggers a use-after-free: | BUG: KASAN: use-after-free in mutex_is_locked+0x20/0x60 | Read of size 8 at addr ffff000800ca4250 by task cpuhp/2/20 | | CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 5.15.0-rc1 #12369 | Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development | Platform, BIOS EDK II Jul 30 2018 | Call trace: | dump_backtrace+0x0/0x380 | show_stack+0x1c/0x30 | dump_stack_lvl+0x8c/0xb8 | print_address_description.constprop.0+0x74/0x2b8 | kasan_report+0x1f4/0x210 | kasan_check_range+0xfc/0x1a4 | __kasan_check_read+0x38/0x60 | mutex_is_locked+0x20/0x60 | mutex_destroy+0x80/0x100 | gov_attr_set_put+0xfc/0x150 | sugov_exit+0x78/0x190 | cpufreq_offline.isra.0+0x2c0/0x660 | cpuhp_cpufreq_offline+0x14/0x24 | cpuhp_invoke_callback+0x430/0x6d0 | cpuhp_thread_fun+0x1b0/0x624 | smpboot_thread_fn+0x5e0/0xa6c | kthread+0x3a0/0x450 | ret_from_fork+0x10/0x20 Swap the order of the calls. Fixes: e5c6b312ce3c ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables") Cc: 4.7+ <stable@vger.kernel.org> # 4.7+ Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30cpufreq: intel_pstate: Override parameters if HWP forced by BIOSDoug Smythies1-8/+14
[ Upstream commit d9a7e9df731670acdc69e81748941ad338f47fab ] If HWP has been already been enabled by BIOS, it may be necessary to override some kernel command line parameters. Once it has been enabled it requires a reset to be disabled. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18cpufreq: powernv: Fix init_chip_info initialization in numa=offPratik R. Sampat1-2/+14
commit f34ee9cb2c5ac5af426fee6fa4591a34d187e696 upstream. In the numa=off kernel command-line configuration init_chip_info() loops around the number of chips and attempts to copy the cpumask of that node which is NULL for all iterations after the first chip. Hence, store the cpu mask for each chip instead of derving cpumask from node while populating the "chips" struct array and copy that to the chips[i].mask Fixes: 053819e0bf84 ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level") Cc: stable@vger.kernel.org # v4.3+ Reported-by: Shirisha Ganta <shirisha.ganta1@ibm.com> Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> [mpe: Rename goto label to out_free_chip_cpu_mask] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210728120500.87549-2-psampat@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdevThara Gopinath1-0/+1
[ Upstream commit 5d79e5ce5489b489cbc4c327305be9dfca0fc9ce ] The Qualcomm sm8150 platform uses the qcom-cpufreq-hw driver, so add it to the cpufreq-dt-platdev driver's blocklist. Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variantMarek Behún1-1/+5
[ Upstream commit 484f2b7c61b9ae58cc00c5127bcbcd9177af8dfe ] The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when the SOC boots, the WTMI firmware sets clocks and AVS values that work correctly with 1.2 GHz CPU frequency, but random crashes occur once cpufreq driver starts scaling. We do not know currently what is the reason: - it may be that the voltage value for L0 for 1.2 GHz variant provided by the vendor in the OTP is simply incorrect when scaling is used, - it may be that some delay is needed somewhere, - it may be something else. The most sane solution now seems to be to simply forbid the cpufreq driver on 1.2 GHz variant. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14cpufreq: Make cpufreq_online() call driver->offline() on errorsRafael J. Wysocki1-1/+10
[ Upstream commit 3b7180573c250eb6e2a7eec54ae91f27472332ea ] In the CPU removal path the ->offline() callback provided by the driver is always invoked before ->exit(), but in the cpufreq_online() error path it is not, so ->exit() is expected to somehow know the context in which it has been called and act accordingly. That is less than straightforward, so make cpufreq_online() invoke the driver's ->offline() callback, if present, on errors before ->exit() too. This only potentially affects intel_pstate. Fixes: 91a12e91dc39 ("cpufreq: Allow light-weight tear down and bring up of CPUs") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19cpufreq: intel_pstate: Use HWP if enabled by platform firmwareRafael J. Wysocki1-1/+13
commit e5af36b2adb858e982d78d41d7363d05d951a19a upstream. It turns out that there are systems where HWP is enabled during initialization by the platform firmware (BIOS), but HWP EPP support is not advertised. After commit 7aa1031223bc ("cpufreq: intel_pstate: Avoid enabling HWP if EPP is not supported") intel_pstate refuses to use HWP on those systems, but the fallback PERF_CTL interface does not work on them either because of enabled HWP, and once enabled, HWP cannot be disabled. Consequently, the users of those systems cannot control CPU performance scaling. Address this issue by making intel_pstate use HWP unconditionally if it is enabled already when the driver starts. Fixes: 7aa1031223bc ("cpufreq: intel_pstate: Avoid enabling HWP if EPP is not supported") Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 5.9+ <stable@vger.kernel.org> # 5.9+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14cpufreq: armada-37xx: Fix determining base CPU frequencyPali Rohár1-1/+1
[ Upstream commit 8bad3bf23cbc40abe1d24cec08a114df6facf858 ] When current CPU load is not L0 then loading armada-37xx-cpufreq.ko driver fails with following error: # modprobe armada-37xx-cpufreq [ 502.702097] Unsupported CPU frequency 250 MHz This issue was partially fixed by commit 8db82563451f ("cpufreq: armada-37xx: fix frequency calculation for opp"), but only for calculating CPU frequency for opp. Fix this also for determination of base CPU frequency. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com> Tested-by: Philip Soares <philips@netisense.com> Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14cpufreq: armada-37xx: Fix driver cleanup when registration failedPali Rohár1-1/+1
[ Upstream commit 92963903a8e11b9576eb7249f8e81eefa93b6f96 ] Commit 8db82563451f ("cpufreq: armada-37xx: fix frequency calculation for opp") changed calculation of frequency passed to the dev_pm_opp_add() function call. But the code for dev_pm_opp_remove() function call was not updated, so the driver cleanup phase does not work when registration fails. This fixes the issue by using the same frequency in both calls. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com> Tested-by: Philip Soares <philips@netisense.com> Fixes: 8db82563451f ("cpufreq: armada-37xx: fix frequency calculation for opp") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14cpufreq: armada-37xx: Fix the AVS value for load L1Pali Rohár1-0/+37
[ Upstream commit d118ac2062b5b8331c8768ac81e016617e0996ee ] The original CPU voltage value for load L1 is too low for Armada 37xx SoC when base CPU frequency is 1000 or 1200 MHz. It leads to instabilities where CPU gets stuck soon after dynamic voltage scaling from load L1 to L0. Update the CPU voltage value for load L1 accordingly when base frequency is 1000 or 1200 MHz. The minimal L1 value for base CPU frequency 1000 MHz is updated from the original 1.05V to 1.108V and for 1200 MHz is updated to 1.155V. This minimal L1 value is used only in the case when it is lower than value for L0. This change fixes CPU instability issues on 1 GHz and 1.2 GHz variants of Espressobin and 1 GHz Turris Mox. Marvell previously for 1 GHz variant of Espressobin provided a patch [1] suitable only for their Marvell Linux kernel 4.4 fork which workarounded this issue. Patch forced CPU voltage value to 1.108V in all loads. But such change does not fix CPU instability issues on 1.2 GHz variants of Armada 3720 SoC. During testing we come to the conclusion that using 1.108V as minimal value for L1 load makes 1 GHz variants of Espressobin and Turris Mox boards stable. And similarly 1.155V for 1.2 GHz variant of Espressobin. These two values 1.108V and 1.155V are documented in Armada 3700 Hardware Specifications as typical initial CPU voltage values. Discussion about this issue is also at the Armbian forum [2]. [1] - https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/dc33b62c90696afb6adc7dbcc4ebbd48bedec269 [2] - https://forum.armbian.com/topic/10429-how-to-make-espressobin-v7-stable/ Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com> Tested-by: Philip Soares <philips@netisense.com> Fixes: 1c3528232f4b ("cpufreq: armada-37xx: Add AVS support") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14cpufreq: armada-37xx: Fix setting TBG parent for load levelsMarek Behún1-12/+23
[ Upstream commit 22592df194e31baf371906cc720da38fa0ab68f5 ] With CPU frequency determining software [1] we have discovered that after this driver does one CPU frequency change, the base frequency of the CPU is set to the frequency of TBG-A-P clock, instead of the TBG that is parent to the CPU. This can be reproduced on EspressoBIN and Turris MOX: cd /sys/devices/system/cpu/cpufreq/policy0 echo powersave >scaling_governor echo performance >scaling_governor Running the mhz tool before this driver is loaded reports 1000 MHz, and after loading the driver and executing commands above the tool reports 800 MHz. The change of TBG clock selector is supposed to happen in function armada37xx_cpufreq_dvfs_setup. Before the function returns, it does this: parent = clk_get_parent(clk); clk_set_parent(clk, parent); The armada-37xx-periph clock driver has the .set_parent method implemented correctly for this, so if the method was actually called, this would work. But since the introduction of the common clock framework in commit b2476490ef11 ("clk: introduce the common clock..."), the clk_set_parent function checks whether the parent is actually changing, and if the requested new parent is same as the old parent (which is obviously the case for the code above), the .set_parent method is not called at all. This patch fixes this issue by filling the correct TBG clock selector directly in the armada37xx_cpufreq_dvfs_setup during the filling of other registers at the same address. But the determination of CPU TBG index cannot be done via the common clock framework, therefore we need to access the North Bridge Peripheral Clock registers directly in this driver. [1] https://github.com/wtarreau/mhz Signed-off-by: Marek Behún <kabel@kernel.org> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Tested-by: Pali Rohár <pali@kernel.org> Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com> Tested-by: Philip Soares <philips@netisense.com> Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdevSudeep Holla1-0/+2
[ Upstream commit fbb31cb805fd3574d3be7defc06a7fd2fd9af7d2 ] Add "arm,vexpress" to cpufreq-dt-platdev blacklist since the actual scaling is handled by the firmware cpufreq drivers(scpi, scmi and vexpress-spc). Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()Wei Yongjun1-2/+2
[ Upstream commit 536eb97abeba857126ad055de5923fa592acef25 ] In case of error, the function ioremap() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Fixes: 67fc209b527d ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17cpufreq: qcom-hw: fix dereferencing freed memory 'data'Shawn Guo1-1/+1
[ Upstream commit 02fc409540303801994d076fcdb7064bd634dbf3 ] Commit 67fc209b527d ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks") introduces an issue of dereferencing freed memory 'data'. Fix it. Fixes: 67fc209b527d ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if ↵Chen Yu1-2/+3
available commit 6f67e060083a84a4cc364eab6ae40c717165fb0c upstream. Currently, when turbo is disabled (either by BIOS or by the user), the intel_pstate driver reads the max non-turbo frequency from the package-wide MSR_PLATFORM_INFO(0xce) register. However, on asymmetric platforms it is possible in theory that small and big core with HWP enabled might have different max non-turbo CPU frequency, because MSR_HWP_CAPABILITIES is per-CPU scope according to Intel Software Developer Manual. The turbo max freq is already per-CPU in current code, so make similar change to the max non-turbo frequency as well. Reported-by: Wendy Wang <wendy.wang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Subject and changelog edits ] Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: a45ee4d4e13b: cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argumentRafael J. Wysocki1-8/+8
commit a45ee4d4e13b0e35a8ec7ea0bf9267243d57b302 upstream. All of the callers of intel_pstate_get_hwp_max() access the struct cpudata object that corresponds to the given CPU already and the function itself needs to access that object (in order to update hwp_cap_cached), so modify the code to pass a struct cpudata pointer to it instead of the CPU number. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooksShawn Guo1-8/+32
commit 67fc209b527d023db4d087c68e44e9790aa089ef upstream. Commit f17b3e44320b ("cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code") introduces a regression on platforms using the driver, by failing to initialise a policy, when one is created post hotplug. When all the CPUs of a policy are hoptplugged out, the call to .exit() and later to devm_iounmap() does not release the memory region that was requested during devm_platform_ioremap_resource(). Therefore, a subsequent call to .init() will result in the following error, which will prevent a new policy to be initialised: [ 3395.915416] CPU4: shutdown [ 3395.938185] psci: CPU4 killed (polled 0 ms) [ 3399.071424] CPU5: shutdown [ 3399.094316] psci: CPU5 killed (polled 0 ms) [ 3402.139358] CPU6: shutdown [ 3402.161705] psci: CPU6 killed (polled 0 ms) [ 3404.742939] CPU7: shutdown [ 3404.765592] psci: CPU7 killed (polled 0 ms) [ 3411.492274] Detected VIPT I-cache on CPU4 [ 3411.492337] GICv3: CPU4: found redistributor 400 region 0:0x0000000017ae0000 [ 3411.492448] CPU4: Booted secondary processor 0x0000000400 [0x516f802d] [ 3411.503654] qcom-cpufreq-hw 17d43000.cpufreq: can't request region for resource [mem 0x17d45800-0x17d46bff] With that being said, the original code was tricky and skipping memory region request intentionally to hide this issue. The true cause is that those devm_xxx() device managed functions shouldn't be used for cpufreq init/exit hooks, because &pdev->dev is alive across the hooks and will not trigger auto resource free-up. Let's drop the use of device managed functions and manually allocate/free resources, so that the issue can be fixed properly. Cc: v5.10+ <stable@vger.kernel.org> # v5.10+ Fixes: f17b3e44320b ("cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code") Suggested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is knownRafael J. Wysocki2-47/+23
commit 538b0188da4653b9f4511a114f014354fb6fb7a5 upstream. Commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") attempted to address a performance issue involving acpi-cpufreq, the schedutil governor and scale-invariance on x86 by extending the frequency tables created by acpi-cpufreq to cover the entire range of "turbo" (or "boost") frequencies, but that caused frequencies reported via /proc/cpuinfo and the scaling_cur_freq attribute in sysfs to change which may confuse users and monitoring tools. For this reason, revert the part of commit 3c55e94c0ade adding the extra entry to the frequency table and use the observation that in principle cpuinfo.max_freq need not be equal to the maximum frequency listed in the frequency table for the given policy. Namely, modify cpufreq_frequency_table_cpuinfo() to allow cpufreq drivers to set their own cpuinfo.max_freq above that frequency and change acpi-cpufreq to set cpuinfo.max_freq to the maximum boost frequency found via CPPC. This should be sufficient to let all of the cpufreq subsystem know the real maximum frequency of the CPU without changing frequency reporting. Link: https://bugzilla.kernel.org/show_bug.cgi?id=211305 Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") Reported-by: Matt McDonald <gardotd426@gmail.com> Tested-by: Matt McDonald <gardotd426@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Tested-by: Michael Larabel <Michael@phoronix.com> Cc: 5.11+ <stable@vger.kernel.org> # 5.11+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()Christophe JAILLET1-2/+1
[ Upstream commit 3657f729b6fb5f2c0bf693742de2dcd49c572aa1 ] If 'cpufreq_unregister_driver()' fails, just WARN and continue, so that other resources are freed. Fixes: de322e085995 ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> [ Viresh: Updated Subject ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04cpufreq: brcmstb-avs-cpufreq: Free resources in error pathChristophe JAILLET1-5/+16
[ Upstream commit 05f456286fd489558c72a4711d22a5612c965685 ] If 'cpufreq_register_driver()' fails, we must release the resources allocated in 'brcm_avs_prepare_init()' as already done in the remove function. To do that, introduce a new function 'brcm_avs_prepare_uninit()' in order to avoid code duplication. This also makes the code more readable (IMHO). Fixes: de322e085995 ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> [ Viresh: Updated Subject ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-17cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not thereRafael J. Wysocki1-0/+8
commit d11a1d08a082a7dc0ada423d2b2e26e9b6f2525c upstream. If the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale-invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver, the scale-invariant utilization falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, which causes the schedutil governor to select a frequency below cpuinfo.max_freq. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies which prevents "boost" frequencies from being used in some workloads. While this issue is related to scale-invariance, it may be amplified by commit db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle which made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. If CPPC is available, it can be used to address this issue by extending the frequency tables created by acpi_cpufreq to cover the entire available frequency range (including "boost" frequencies) for each CPU, but if CPPC is not there, acpi_cpufreq has no idea what the maximum "boost" frequency is and the frequency tables created by it cannot be extended in a meaningful way, so in that case make it ask the arch scale-invariance code to to use the "nominal" performance level for CPU utilization scaling in order to avoid the issue at hand. Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17cpufreq: ACPI: Extend frequency tables to cover boost frequenciesRafael J. Wysocki1-12/+95
commit 3c55e94c0adea4a5389c4b80f6ae9927dd6a4501 upstream. A severe performance regression on AMD EPYC processors when using the schedutil scaling governor was discovered by Phoronix.com and attributed to the following commits: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") The source of the problem is that the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale- invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver. This effectively causes the scale-invariant utilization to fall below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so the schedutil governor selects a frequency below cpuinfo.max_freq then. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies. However, if the cpuinfo.max_freq value coming from acpi_cpufreq was higher, the schedutil governor would select higher frequencies which in turn would allow acpi_cpufreq to set more adequate performance levels and to get to the "boost" range of CPU frequencies more often. This issue affects any systems where acpi_cpufreq is used and the "boost" (or "turbo") frequencies are enabled, not just AMD EPYC. Moreover, commit db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. To address this issue, extend the frequency table constructed by acpi_cpufreq for each CPU to cover the entire range of available frequencies (including the "boost" ones) if CPPC is available and indicates that "boost" (or "turbo") frequencies are enabled. That causes cpuinfo.max_freq to become the maximum "boost" frequency of the given CPU (instead of the maximum frequency returned by the ACPI _PSS object that corresponds to the "nominal" performance level). Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1 Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/ Reported-by: Michael Larabel <Michael@phoronix.com> Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Tested-by: Michael Larabel <Michael@phoronix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()Colin Ian King1-6/+3
commit 943bdd0cecad06da8392a33093230e30e501eccc upstream. Currently there is an unlikely case where cpufreq_cpu_get() returns a NULL policy and this will cause a NULL pointer dereference later on. Fix this by passing the policy to transition_frequency_fidvid() from the caller and hence eliminating the need for the cpufreq_cpu_get() and cpufreq_cpu_put(). Thanks to Viresh Kumar for suggesting the fix. Addresses-Coverity: ("Dereference null return") Fixes: b43a7ffbf33b ("cpufreq: Notify all policy->cpus in cpufreq_notify_transition()") Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30cpufreq: intel_pstate: Use most recent guaranteed performance valuesRafael J. Wysocki1-3/+13
commit e40ad84c26b4deeee46666492ec66b9a534b8e59 upstream. When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is changed later, user space may want to take advantage of this increased guaranteed performance. HWP_CAP.GUARANTEED is not a static value. It can be adjusted by an out-of-band agent or during an Intel Speed Select performance level change. The HWP_CAP.MAX is still the maximum achievable performance with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still change as long as it remains less than or equal to HWP_CAP.MAX. When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency attribute shows the most recent guaranteed frequency value. This attribute can be used by user space software to update the scaling min/max limits of the CPU. Currently, the ->setpolicy() callback already uses the latest HWP_CAP values when setting HWP_REQ, but the ->verify() callback will restrict the user settings to the to old guaranteed performance value which prevents user space from making use of the extra CPU capacity theoretically available to it after increasing HWP_CAP.GUARANTEED. To address this, read HWP_CAP in intel_pstate_verify_cpu_policy() to obtain the maximum P-state that can be used and use that to confine the policy max limit instead of using the cached and possibly stale pstate.max_freq value for this purpose. For consistency, update intel_pstate_update_perf_limits() to use the maximum available P-state returned by intel_pstate_get_hwp_max() to compute the maximum frequency instead of using the return value of intel_pstate_get_max_freq() which, again, may be stale. This issue is a side-effect of fixing the scaling frequency limits in commit eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") which corrected the setting of the reduced scaling frequency values, but caused stale HWP_CAP.GUARANTEED to be used in the case at hand. Fixes: eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.8+ <stable@vger.kernel.org> # 5.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30cpufreq: imx: fix NVMEM_IMX_OCOTP dependencyArnd Bergmann1-1/+1
[ Upstream commit fc928b901dc68481ba3e524860a641fe13e25dfe ] A driver should not 'select' drivers from another subsystem. If NVMEM is disabled, this one results in a warning: WARNING: unmet direct dependencies detected for NVMEM_IMX_OCOTP Depends on [n]: NVMEM [=n] && (ARCH_MXC [=y] || COMPILE_TEST [=y]) && HAS_IOMEM [=y] Selected by [y]: - ARM_IMX6Q_CPUFREQ [=y] && CPU_FREQ [=y] && (ARM || ARM64 [=y]) && ARCH_MXC [=y] && REGULATOR_ANATOP [=y] Change the 'select' to 'depends on' to prevent it from going wrong, and allow compile-testing without that driver, since it is only a runtime dependency. Fixes: 2782ef34ed23 ("cpufreq: imx: Select NVMEM_IMX_OCOTP") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: vexpress-spc: Add missing MODULE_ALIASPali Rohár1-0/+1
[ Upstream commit d15183991c2d53d7cecf27a1555c91b702cef1ea ] This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 47ac9aa165540 ("cpufreq: arm_big_little: add vexpress SPC interface driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: scpi: Add missing MODULE_ALIASPali Rohár1-0/+1
[ Upstream commit c0382d049d2def37b81e907a8b22661a4a4a6eb5 ] This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 8def31034d033 ("cpufreq: arm_big_little: add SCPI interface driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: loongson1: Add missing MODULE_ALIASPali Rohár1-0/+1
[ Upstream commit b9acab091842ca8b288882798bb809f7abf5408a ] This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: a0a22cf14472f ("cpufreq: Loongson1: Add cpufreq driver for Loongson1B") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: sun50i: Add missing MODULE_DEVICE_TABLEPali Rohár1-0/+1
[ Upstream commit af2096f285077e3339eb835ad06c50bdd59f01b5 ] This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: f328584f7bff8 ("cpufreq: Add sun50i nvmem based CPU scaling driver") Reviewed-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: st: Add missing MODULE_DEVICE_TABLEPali Rohár1-0/+7
[ Upstream commit 183747ab52654eb406fc6b5bfb40806b75d31811 ] This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: ab0ea257fc58d ("cpufreq: st: Provide runtime initialised driver for ST's platforms") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: qcom: Add missing MODULE_DEVICE_TABLEPali Rohár1-0/+1
[ Upstream commit a5a6031663bc1dd0a10babd49d1bcb3153a8327f ] This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 46e2856b8e188 ("cpufreq: Add Kryo CPU scaling driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: mediatek: Add missing MODULE_DEVICE_TABLEPali Rohár1-0/+1
[ Upstream commit af6eca06501118af3e2ad46eee8edab20624b74e ] This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 501c574f4e3a5 ("cpufreq: mediatek: Add support of cpufreq to MT2701/MT7623 SoC") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: highbank: Add missing MODULE_DEVICE_TABLEPali Rohár1-0/+7
[ Upstream commit 9433777a6e0aae27468d3434b75cd51bb88ff711 ] This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 6754f556103be ("cpufreq / highbank: add support for highbank cpufreq") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cpufreq: ap806: Add missing MODULE_DEVICE_TABLEPali Rohár1-0/+6
[ Upstream commit 925a5bcefe105f2790ecbdc252eb2315573f309d ] This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: f525a670533d9 ("cpufreq: ap806: add cpufreq driver for Armada 8K") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-23Merge branch 'cpufreq/arm/fixes' of ↵Rafael J. Wysocki1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull SCMI cpufreq driver fix for 5.10-rc6 from Viresh Kumar: "This fixes a build issues with SCMI cpufreq driver in the !CONFIG_COMMON_CLK case." * 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK
2020-11-23cpufreq: scmi: Fix build for !CONFIG_COMMON_CLKSudeep Holla1-1/+3
Commit 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider") registers a dummy clock provider using devm_of_clk_add_hw_provider. These *_hw_provider functions are defined only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig dependency, but since we plan to move away from the clock dependency for scmi cpufreq, it is preferrable to avoid that. Let us just conditionally compile out the offending call to devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside of the #ifdef block to avoid build warning. Fixes: 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider") Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-17Merge branch 'cpufreq/arm/fixes' of ↵Rafael J. Wysocki2-12/+27
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull cpufreq-arm fixes for 5.10-rc5 from Viresh Kumar: "- tegra186: Fix ->get() callback. - arm/scmi: Add dummy clock provider to fix failure." * 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: scmi: Fix OPP addition failure with a dummy clock provider cpufreq: tegra186: Fix get frequency callback
2020-11-17cpufreq: scmi: Fix OPP addition failure with a dummy clock providerSudeep Holla1-0/+6
Commit dd461cd9183f ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER") handles -EPROBE_DEFER for the clock/interconnects within _allocate_opp_table() which is called from dev_pm_opp_add and it now propagates the error back to the caller. SCMI performance domain re-used clock bindings to keep it simple. However with the above mentioned change, if clock property is present in a device node, opps fails to get added with below errors until clk_get succeeds. cpu0: failed to add opp 450000000Hz cpu0: failed to add opps to the device ....(errors on cpu1-cpu4) cpu5: failed to add opp 450000000Hz cpu5: failed to add opps to the device So, in order to fix the issue, we need to register dummy clock provider. With the dummy clock provider, clk_get returns NULL(no errors!), then opp core proceeds to add OPPs for the CPUs. Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Fixes: dd461cd9183f ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER") Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-17cpufreq: tegra186: Fix get frequency callbackJon Hunter1-12/+21
Commit b89c01c96051 ("cpufreq: tegra186: Fix initial frequency") implemented the CPUFREQ 'get' callback to determine the current operating frequency for each CPU. This implementation used a simple looked up to determine the current operating frequency. The problem with this is that frequency table for different Tegra186 devices may vary and so the default boot frequency for Tegra186 device may or may not be present in the frequency table. If the default boot frequency is not present in the frequency table, this causes the function tegra186_cpufreq_get() to return 0 and in turn causes cpufreq_online() to fail which prevents CPUFREQ from working. Fix this by always calculating the CPU frequency based upon the current 'ndiv' setting for the CPU. Note that the CPU frequency for Tegra186 is calculated by reading the current 'ndiv' setting, multiplying by the CPU reference clock and dividing by a constant divisor. Fixes: b89c01c96051 ("cpufreq: tegra186: Fix initial frequency") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-10cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into accountRafael J. Wysocki1-7/+9
Make intel_pstate take the new CPUFREQ_GOV_STRICT_TARGET governor flag into account when it operates in the passive mode with HWP enabled, so as to fix the "powersave" governor behavior in that case (currently, HWP is allowed to scale the performance all the way up to the policy max limit when the "powersave" governor is used, but it should be constrained to the policy min limit then). Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 9a2a9ebc0a75 cpufreq: Introduce governor flags Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 218f66870181 cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: ea9364bbadf1 cpufreq: Add strict_target to struct cpufreq_policy
2020-11-10cpufreq: Add strict_target to struct cpufreq_policyRafael J. Wysocki1-0/+2
Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is set for the current governor to struct cpufreq_policy, so that the drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to access the governor object during every frequency transition. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-10cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGETRafael J. Wysocki2-0/+2
Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the governors that want the target frequency to be set exactly to the given value without leaving any room for adjustments on the hardware side and set this flag for the powersave and performance governors. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-10cpufreq: Introduce governor flagsRafael J. Wysocki2-2/+2
A new cpufreq governor flag will be added subsequently, so replace the bool dynamic_switching fleid in struct cpufreq_governor with a flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for the "dynamic switching" governors instead of it. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-29cpufreq: Introduce cpufreq_driver_test_flags()Rafael J. Wysocki1-0/+12
Add a helper function to test the flags of the cpufreq driver in use againt a given flags mask. In particular, this will be needed to test the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil governor. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>