linux.git - Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Age	Commit message	Author	Files	Lines
2025-03-28	drm/amdgpu/pm: wire up hwmon fan speed for smu 14.0.2	Alex Deucher	1	-0/+35
	commit 5ca0040ecfe8ba0dee9df1f559e8d7587f12bf89 upstream. Add callbacks for fan speed fetching. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4034 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 90df6db62fa78a8ab0b705ec38db99c7973b95d6) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-28	drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2	Tomasz Pakuła	1	-41/+18
	commit d9d4cb224e4140f51847642aa5a4a5c3eb998af0 upstream.
	Currently, it seems like the code was carried over from RDNA3 because it assumes two possible values to set. RDNA4, instead of having:
	0: min SCLK
	1: max SCLK
	only has
	0: SCLK offset
	This change makes it so it only reports the current offset value instead of showing possible min/max values and their indices. Moreover, it now only accepts the offset as a value, without the index.
	Additionally, the lower bound was printed as %u by mistake.
	Old:
	OD_SCLK_OFFSET:
	0: -500Mhz
	1: 1000Mhz
	OD_MCLK:
	0: 97Mhz
	1: 1259MHz
	OD_VDDGFX_OFFSET:
	0mV
	OD_RANGE:
	SCLK_OFFSET: -500Mhz 1000Mhz
	MCLK: 97Mhz 1500Mhz
	VDDGFX_OFFSET: -200mv 0mv
	New:
	OD_SCLK_OFFSET:
	0Mhz
	OD_MCLK:
	0: 97Mhz
	1: 1259MHz
	OD_VDDGFX_OFFSET:
	0mV
	OD_RANGE:
	SCLK_OFFSET: -500Mhz 1000Mhz
	MCLK: 97Mhz 1500Mhz
	VDDGFX_OFFSET: -200mv 0mv
	Setting this offset:
	Old: "s 1 <offset>"
	New: "s <offset>"
	Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4036 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1cfeb60e6e8837b1de5eb4e17df7cf31f4442144) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
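The change in accepted input, from "s 1 <offset>" to "s <offset>", can be illustrated with a small userspace parser. This is only a sketch: `parse_sclk_offset_cmd` is a hypothetical name, and the real parsing happens inside the kernel driver's sysfs handler, which is not reproduced here.

```c
#include <stdio.h>

/*
 * Hypothetical userspace parser for the new command form "s <offset>".
 * Not the driver's code; it only illustrates the accepted syntax.
 * Returns 0 and stores the offset on success, -1 on malformed input.
 */
static int parse_sclk_offset_cmd(const char *cmd, long *offset_mhz)
{
    long value;
    int consumed = 0;

    /* Expect the literal 's', one signed integer, then nothing else. */
    if (sscanf(cmd, " s %ld %n", &value, &consumed) != 1)
        return -1;

    /* Trailing tokens mean the old "s <index> <offset>" form was used. */
    if (cmd[consumed] != '\0')
        return -1;

    *offset_mhz = value;
    return 0;
}
```

With this sketch, "s -500" is accepted, while the old form "s 1 100" is rejected because a second number follows the first.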
2025-03-28	drm/amd/pm: add unique_id for gfx12	Harish Kasiviswanathan	1	-0/+2
	commit 19b53f96856b5316ee1fd6ca485af0889e001677 upstream. Expose unique_id for gfx12 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 16fbc18cb07470cd33fb5f37ad181b51583e6dc0) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13	drm/amd/pm: always allow ih interrupt from fw	Kenneth Feng	1	-11/+1
	commit da552bda987420e877500fdd90bd0172e3bf412b upstream. always allow ih interrupt from fw on smu v14 based on the interface requirement Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a3199eba46c54324193607d9114a1e321292d7a1) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07	amdgpu/pm/legacy: fix suspend/resume issues	chr[]	3	-14/+45
	commit 91dcc66b34beb72dde8412421bdc1b4cd40e4fb8 upstream.
	resume and irq handler happily races in set_power_state()
	* amdgpu_legacy_dpm_compute_clocks() needs lock
	* protect irq work handler
	* fix dpm_enabled usage
	v2: fix clang build, integrate Lijo's comments (Alex)
	Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2524 Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> # on Oland PRO Signed-off-by: chr[] <chris@rudorff.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ee3dc9e204d271c9c7a8d4d38a0bce4745d33e71) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21	drm/amdgpu: avoid buffer overflow attach in smu_sys_set_pp_table()	Jiang Liu	1	-1/+2
	commit 1abb2648698bf10783d2236a6b4a7ca5e8021699 upstream. If a malicious user provides a small pptable through sysfs and then a bigger pptable, it may cause a buffer overflow in smu_sys_set_pp_table(). Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
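The small-then-bigger scenario can be sketched in userspace C. The names below (`pp_table_store`, `store_set_table`) are hypothetical and the kernel allocators are replaced by `malloc`/`free`; the point is only that a cached buffer must be replaced when the incoming table is larger than its capacity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's cached table state. */
struct pp_table_store {
    void  *buf;   /* cached copy of the user-supplied table */
    size_t size;  /* capacity of buf in bytes */
};

/*
 * Copy a user-supplied table into the cache, reallocating when the
 * cached buffer is missing or too small. Reusing a smaller buffer
 * without the size check is the overflow pattern described above.
 * Returns 0 on success, -1 on bad input or allocation failure.
 */
static int store_set_table(struct pp_table_store *s, const void *src, size_t size)
{
    if (!src || size == 0)
        return -1;

    if (!s->buf || s->size < size) {
        void *bigger = malloc(size);

        if (!bigger)
            return -1;
        free(s->buf);
        s->buf = bigger;
        s->size = size;
    }

    memcpy(s->buf, src, size);
    return 0;
}
```

Writing an 8-byte table and then a 64-byte table through `store_set_table` grows the cache before the second copy, so the `memcpy` never exceeds the buffer.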
2025-02-17	drm/amd/pm: Mark MM activity as unsupported	Lijo Lazar	1	-1/+0
	commit 819bf6662b93a5a8b0c396d2c7e7fab6264c9808 upstream. Aldebaran doesn't support querying MM activity percentage. Keep the field as 0xFFs to mark it as unsupported. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08	drm/amdgpu: Fix potential NULL pointer dereference in atomctrl_get_smc_sclk_range_table	Ivan Stepchenko	1	-0/+2
	[ Upstream commit 357445e28ff004d7f10967aa93ddb4bffa5c3688 ] The function atomctrl_get_smc_sclk_range_table() does not check the return value of smu_atom_get_data_table(). If smu_atom_get_data_table() fails to retrieve SMU_Info table, it returns NULL which is later dereferenced. Found by Linux Verification Center (linuxtesting.org) with SVACE. In practice this should never happen as this code only gets called on polaris chips and the vbios data table will always be present on those chips. Fixes: a23eefa2f461 ("drm/amd/powerplay: enable dpm for baffin.") Signed-off-by: Ivan Stepchenko <sid@itb.spb.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08	drm/amd/pm: Fix an error handling path in vega10_enable_se_edc_force_stall_config()	Christophe JAILLET	1	-2/+3
	[ Upstream commit a3300782d5375e280ba7040f323d01960bfe3396 ] In case of error after an amdgpu_gfx_rlc_enter_safe_mode() call, it is not balanced by a corresponding amdgpu_gfx_rlc_exit_safe_mode() call. Add the missing call. Fixes: 9b7b8154cdb8 ("drm/amd/powerplay: added didt support for vega10") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
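The balancing problem can be shown with a generic sketch. Everything here is hypothetical (`enter_safe_mode`, `exit_safe_mode`, `program_registers`, `enable_config`); it does not reproduce the vega10 code, only the common C idiom of routing every path after the "enter" call through a single exit label.

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the hardware's safe-mode state. */
static int safe_mode_depth;

static void enter_safe_mode(void) { safe_mode_depth++; }
static void exit_safe_mode(void)  { safe_mode_depth--; }

/* Hypothetical programming step that can fail. */
static int program_registers(bool should_fail)
{
    return should_fail ? -1 : 0;
}

/*
 * Every path that follows enter_safe_mode() goes through the single
 * exit label, so enter/exit stay balanced on failure as well.
 */
static int enable_config(bool should_fail)
{
    int ret;

    enter_safe_mode();

    ret = program_registers(should_fail);
    if (ret)
        goto out; /* error: skip remaining work, still leave safe mode */

    /* ... further configuration steps would go here ... */

out:
    exit_safe_mode();
    return ret;
}
```

After either a successful or a failing call to `enable_config`, `safe_mode_depth` is back at zero.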
2025-01-10	drm/amdgpu/smu13: update powersave optimizations	Alex Deucher	1	-5/+6
	Only apply when compute profile is selected. This is the only supported configuration. Selecting other profiles can lead to performance degradation. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d477e39532d725b1cdb3c8005c689c74ffbf3b94) Cc: stable@vger.kernel.org # 6.12.x
2025-01-06	drm/amd/pm: fix BUG: scheduling while atomic	Kun Liu	4	-6/+10
	A "scheduling while atomic" BUG is triggered in the interrupt handler for the AC/DC mode switch, as shown in the following backtrace.
	Call Trace:
	<IRQ>
	dump_stack_lvl
	__schedule_bug
	__schedule
	schedule
	schedule_preempt_disabled
	__mutex_lock
	smu_cmn_send_smc_msg_with_param
	smu_v13_0_irq_process
	amdgpu_irq_dispatch
	amdgpu_ih_process
	amdgpu_irq_handler
	__handle_irq_event_percpu
	handle_irq_event
	handle_edge_irq
	__common_interrupt
	common_interrupt
	</IRQ>
	<TASK>
	asm_common_interrupt
	Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Kun Liu <Kun.Liu2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 03cc84b102d1a832e8dfc59344346dedcebcdf42) Cc: stable@vger.kernel.org
2024-12-18	drm/amdgpu/smu14.0.2: fix IP version check	Alex Deucher	1	-1/+1
	Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8f2cd1067afe68372a1723e05e19b68ed187676a) Cc: stable@vger.kernel.org
2024-12-05	drm/amd/pm: Set SMU v13.0.7 default workload type	Kenneth Feng	1	-0/+1
	Set the default workload type to bootup type on smu v13.0.7. This is because of a constraint on smu v13.0.7: gfx activity has an even higher set point in 3D fullscreen mode than in bootup mode. As a result, 3D fullscreen mode performs worse than bootup mode for lightweight and medium workloads. For heavy workloads, the performance is the same in 3D fullscreen mode and bootup mode. v2: set the default workload in ASIC specific file Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-12-05	drm/amd/pm: Initialize power profile mode	Lijo Lazar	1	-7/+17
	Refactor such that individual SMU IP versions can choose the startup power profile mode. If no preference, then use the generic default power profile selection logic. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-12-02	drm/amd/pm: fix and simplify workload handling	Alex Deucher	13	-517/+741
	smu->workload_mask is IP specific and should not be messed with in the common code. The mask bits vary across SMU versions. Move all handling of smu->workload_mask into the backends and simplify the code. Store the user's preference in smu->power_profile_mode which will be reflected in sysfs. For internal driver profile switches for KFD or VCN, just update the workload mask so that the user's preference is retained. Remove all of the extra now unused workload related elements in the smu structure.
	v2: use refcounts for workload profiles
	v3: rework based on feedback from Lijo
	v4: fix the refcount on failure, drop backend mask
	v5: rework custom handling
	v6: handle failure cleanup with custom profile
	v7: Update documentation
	Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: stable@vger.kernel.org # 6.11.x
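The idea of using refcounts for workload profiles can be sketched as follows. This is an assumption-laden illustration, not the driver's implementation: `PROFILE_COUNT`, `profile_state`, `profile_get`, `profile_put` and `profile_mask` are hypothetical names, and deriving the mask from non-zero refcounts is one plausible reading of the commit message.

```c
#include <stdint.h>

#define PROFILE_COUNT 8 /* hypothetical number of profiles */

/* Hypothetical per-device state: one reference count per profile. */
struct profile_state {
    unsigned int refcount[PROFILE_COUNT];
};

/* Take a reference on a profile; returns -1 for an invalid index. */
static int profile_get(struct profile_state *st, unsigned int profile)
{
    if (profile >= PROFILE_COUNT)
        return -1;
    st->refcount[profile]++;
    return 0;
}

/* Drop a reference; returns -1 if the profile holds no reference. */
static int profile_put(struct profile_state *st, unsigned int profile)
{
    if (profile >= PROFILE_COUNT || st->refcount[profile] == 0)
        return -1;
    st->refcount[profile]--;
    return 0;
}

/* Build a mask with one bit per profile that is still referenced. */
static uint32_t profile_mask(const struct profile_state *st)
{
    uint32_t mask = 0;

    for (unsigned int i = 0; i < PROFILE_COUNT; i++)
        if (st->refcount[i])
            mask |= UINT32_C(1) << i;
    return mask;
}
```

With counts per profile, two independent requesters of the same profile do not cancel each other: the bit stays set until the last reference is dropped.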
2024-12-02	Revert "drm/amd/pm: correct the workload setting"	Alex Deucher	12	-84/+36
	This reverts commit 74e1006430a5377228e49310f6d915628609929e. This causes a regression in the workload selection. A more extensive fix is being worked on. For now, revert. This came back after a merge in 6.13-rc1, so revert again. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Fixes: 74e1006430a5 ("drm/amd/pm: correct the workload setting") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 44f392fbf628a7ff2d8bb8e83ca1851261f81a6f)
2024-11-21	drm/amd/pm: Remove arcturus min power limit	Lijo Lazar	1	-1/+5
	As per power team, there is no need to impose a lower bound on arcturus power limit. Any unreasonable limit set will result in frequent throttling. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-11-21	drm/amd/pm: skip setting the power source on smu v14.0.2/3	Kenneth Feng	1	-1/+0
	skip setting power source on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-11-21	drm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3	Kenneth Feng	1	-3/+23
	disable pcie speed switching on Intel platform for smu v14.0.2/3 based on Intel's requirement. v2: align the setting with smu v13. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-11-20	drm/amdgpu/pm: add gen5 display to the user on smu v14.0.2/3	Kenneth Feng	4	-6/+12
	add gen5 display to the user on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-11-20	drm/amd/pm: remove redundant tools_size check	Bhavin Sharma	1	-13/+11
	The check for tools_size being non-zero is redundant as tools_size is explicitly set to a non-zero value (0x19000). Removing the if condition simplifies the code without altering functionality. Signed-off-by: Bhavin Sharma <bhavin.sharma@siliconsignals.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20	drm/amd/pm: update current_socclk and current_uclk in gpu_metrics on smu v13.0.7	Umio Yasuno	1	-0/+2
	These were missed before. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3751 Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-11-20	drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0	Asad Kamal	1	-1/+3
	Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for SMU_v_13_0_6 v2: Get link status register value for each soc from separate function (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20	drm/amd/pm: Add gpu_metrics_v1_7	Asad Kamal	2	-4/+7
	Add new gpu_metrics_v1_7 to acquire xgmi link status, application counter and max vram bandwidth v2: Use gpu_metrics_v1_7 for SMU_v_13_0_6 (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu: Support vcn and jpeg error info parsing	Stanley.Yang	1	-0/+24
	Add vcn and jpeg error count parsing. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08	drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0	Tim Huang	1	-2/+3
	Currently, the pp_dpm_mclk values are reported in descending order on SMU IP v14.0.0/1/4. Adjust to ascending order for consistency with other clock interfaces. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05	drm/amd/pm: add zero RPM stop temperature OD setting support for SMU13	Wolfgang Müller	6	-2/+178
	Together with the feature to enable or disable zero RPM in the last commit, it also makes sense to expose the OD setting determining under which temperature the fan should stop if zero RPM is enabled. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05	drm/amd/pm: add zero RPM OD setting support for SMU13	Wolfgang Müller	6	-2/+175
	Whilst we have support for setting fan curves there is no support for disabling the zero RPM feature. Since the relevant bits are already present in the OverDriveTable, hook them up to a sysfs setting so users can influence this behaviour. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3489 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04	drm/amd/pm: correct the workload setting	Kenneth Feng	12	-36/+84
	Correct the workload setting so that the driver's setting is not mixed with the end user's. Update the workload mask accordingly.
	v2: changes as below:
	1. the end user cannot erase the workload from the driver except the default workload.
	2. always show the real highest-priority workload to the end user.
	3. the real workload mask is the combination of the driver workload mask and the end user workload mask.
	v3: apply this to the other ASICs as well.
	v4: simplify the code
	v5: refine the code based on the review comments.
	Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04	drm/amd/pm: always pick the pptable from IFWI	Kenneth Feng	1	-64/+1
	always pick the pptable from IFWI on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04	drm/amd/pm: add inst to dpm_set_vcn_enable	Boyuan Zhang	13	-13/+31
	Add an instance parameter to the existing function dpm_set_vcn_enable() for future implementation. Re-write all pptable functions accordingly. v2: Remove duplicated dpm_set_vcn_enable() functions in v1. Instead, adding instance parameter to existing functions. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28	drm/amdgpu/smu13: fix profile reporting	Alex Deucher	1	-3/+3
	The following 3 commits landed in parallel:
	commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting")
	commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization")
	commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D")
	While everything is set correctly, this caused the profile to be reported incorrectly because both the powersave and fullscreen3d bits were set in the mask and when the driver prints the profile, it looks for the first bit set.
	Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
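Why "looks for the first bit set" misreports when two bits are set can be seen in a small sketch. The bit positions and the helper name `first_profile_in_mask` are hypothetical; the real bit assignments vary per SMU version.

```c
#include <stdint.h>

/* Hypothetical bit positions; real values differ per SMU version. */
enum {
    SKETCH_PROFILE_FULLSCREEN3D = 1,
    SKETCH_PROFILE_POWERSAVE    = 2
};

/* Return the index of the lowest set bit, or -1 if the mask is empty. */
static int first_profile_in_mask(uint32_t mask)
{
    for (int bit = 0; bit < 32; bit++)
        if (mask & (UINT32_C(1) << bit))
            return bit;
    return -1;
}
```

When the mask carries both bits, the lookup always returns the lower one, regardless of which profile the user actually selected. That is the reporting mismatch the commit describes.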
2024-10-28	drm/amd/pm: Vangogh: Fix kernel memory out of bounds write	Tvrtko Ursulin	1	-1/+3
	KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
	[ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
	[ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
	...
	[ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
	[ 33.861816] Tainted: [W]=WARN
	[ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
	[ 33.861822] Call Trace:
	[ 33.861826] <TASK>
	[ 33.861829] dump_stack_lvl+0x66/0x90
	[ 33.861838] print_report+0xce/0x620
	[ 33.861853] kasan_report+0xda/0x110
	[ 33.862794] kasan_check_range+0xfd/0x1a0
	[ 33.862799] __asan_memset+0x23/0x40
	[ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
	[ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
	[ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
	[ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
	[ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
	[ 33.867135] dev_attr_show+0x43/0xc0
	[ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0
	[ 33.867155] seq_read_iter+0x3f8/0x1140
	[ 33.867173] vfs_read+0x76c/0xc50
	[ 33.867198] ksys_read+0xfb/0x1d0
	[ 33.867214] do_syscall_64+0x90/0x160
	...
	[ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
	[ 33.867358] kasan_save_stack+0x33/0x50
	[ 33.867364] kasan_save_track+0x17/0x60
	[ 33.867367] __kasan_kmalloc+0x87/0x90
	[ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
	[ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu]
	[ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu]
	[ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
	[ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
	[ 33.869608] local_pci_probe+0xda/0x180
	[ 33.869614] pci_device_probe+0x43f/0x6b0
	Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets a 168-byte block. The root cause appears to be that when GPU metrics tables for v2_4 parts were added, the table was not enlarged to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write.
	v2: * Drop impossible v3_0 case. (Mario)
	Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Evan Quan <evan.quan@amd.com> Cc: Wenyou Yang <WenYou.Yang@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
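The "allocate the largest possible table" approach can be sketched in userspace C. The struct names and layouts below are hypothetical; only the two sizes, 152 and 168 bytes, come from the report above, and `calloc` stands in for the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts; only the sizes mirror the report above. */
struct metrics_small { unsigned char raw[152]; };
struct metrics_v2_4  { unsigned char raw[168]; };

#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Size the allocation for the largest layout the part may use. */
static const size_t metrics_alloc_size =
    SKETCH_MAX(sizeof(struct metrics_small), sizeof(struct metrics_v2_4));

static void *alloc_metrics_table(void)
{
    return calloc(1, metrics_alloc_size);
}

/*
 * Fill the table for the v2_4 layout. This is in bounds only because
 * the buffer was sized for the largest layout, not the smaller one.
 */
static void init_metrics_v2_4(void *table)
{
    memset(table, 0xff, sizeof(struct metrics_v2_4));
}
```

Had the buffer been sized from `struct metrics_small` alone, the same `memset` would write 16 bytes past the end, which matches the 152 versus 168 byte mismatch in the report.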
2024-10-22	drm/amdgpu: handle default profile on on devices without fullscreen 3D	Alex Deucher	1	-1/+10
	Some devices do not support fullscreen 3D. v2: Make the check generic. Fixes: 336568de918e ("drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com>
2024-10-22	drm/amdgpu: Clean the functions pointer set as NULL	Sunil Khatri	3	-6/+0
	We don't need to set unneeded function pointers to NULL, as global structure members are set to zero, or NULL for pointers, by default. Cc: Leo Liu <leo.liu@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
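The C rule this cleanup relies on can be shown in a few lines: members omitted from an initializer are zero-initialized, which means NULL for pointers. The struct and function names below are hypothetical and much smaller than the driver's real callback tables.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down callback table. */
struct ip_funcs_sketch {
    int (*hw_init)(void);
    int (*soft_reset)(void);
    int (*wait_for_idle)(void);
};

static int sketch_hw_init(void) { return 0; }

/* Only hw_init is named; the omitted members are implicitly NULL. */
static const struct ip_funcs_sketch sketch_funcs = {
    .hw_init = sketch_hw_init,
};

/* Callers treat a NULL callback as "nothing to do". */
static int call_soft_reset(const struct ip_funcs_sketch *funcs)
{
    if (!funcs->soft_reset)
        return 0;
    return funcs->soft_reset();
}
```

Explicit `.soft_reset = NULL` lines would add nothing here, provided callers check the pointer before calling it, as `call_soft_reset` does.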
2024-10-22	drm/amdgpu: clean the dummy soft_reset functions	Sunil Khatri	3	-18/+0
	Remove the dummy soft_reset functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22	drm/amdgpu: clean the dummy wait_for_idle functions	Sunil Khatri	2	-13/+0
	Remove the dummy wait_for_idle functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22	drm/amd/pm: update deep sleep status on smu v14.0.2/3	Kenneth Feng	1	-1/+6
	Disable deep sleep during compute workloads to avoid a potential performance loss on smu v14.0.2/3. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22	drm/amd/pm: update overdrive function on smu v14.0.2/3	Kenneth Feng	1	-1/+1
	update overdrive function on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22	drm/amd/pm: update the driver-fw interface file for smu v14.0.2/3	Kenneth Feng	3	-89/+102
	update the driver-fw interface file for smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15	drm/amdgpu/swsmu: add automatic parameter to set_soft_freq_range	Alex Deucher	21	-90/+152
	On chips that support it, you can specify 0 and 0xffff for min and max and the PMFW will use that to determine the optimal min and max. This enables optimal performance when the user manually switches between performance levels using sysfs. Previously we'd set soft min/max which could limit performance. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
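The effect of an "automatic" flag can be sketched as a small selection helper. `freq_range` and `pick_soft_range` are hypothetical names and this is not the driver's function signature; only the 0 and 0xffff pair comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of values handed to the firmware. */
struct freq_range {
    uint32_t min;
    uint32_t max;
};

/*
 * When 'automatic' is set, pass 0 and 0xffff so the firmware picks
 * the optimal limits itself; otherwise pass the caller's explicit
 * soft limits through unchanged.
 */
static struct freq_range pick_soft_range(uint32_t min, uint32_t max, bool automatic)
{
    struct freq_range range;

    if (automatic) {
        range.min = 0;
        range.max = 0xffff;
    } else {
        range.min = min;
        range.max = max;
    }
    return range;
}
```

A real caller would additionally need to confirm that the chip supports the automatic mode before requesting it, as the commit message notes.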
2024-10-15	drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs	Alex Deucher	1	-1/+5
	This uses more aggressive heuristics than the bootup default profile. On Windows the OS has a special fullscreen 3D mode where this is used. Since we don't have the equivalent on Linux, default to this profile for dGPUs. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1500 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15	drm/amdgpu/swsmu: Only force workload setup on init	Alex Deucher	1	-3/+3
	Needed to set the workload type at init time so that we can apply the navi3x margin optimization. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15	drm/amdgpu/smu13: always apply the powersave optimization	Alex Deucher	1	-12/+10
	It can avoid margin issues in some very demanding applications. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15	drm/amd/pm: Fill pcie recov cntr to metrics 1.6	Asad Kamal	1	-0/+16
	Fill pcie other end recovery counter to metrics 1.6 v2: Add separate function to check recovery counter support Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15	drm/amd/pm: Update SMUv13.0.6 PMFW headers	Asad Kamal	1	-1/+4
	Update pmfw headers for smuv13.0.6 to version 0xE Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07	drm/amdgpu: partially revert powerplay `__counted_by` changes	Alex Deucher	1	-13/+13
	Partially revert commit 0ca9f757a0e2 ("drm/amd/pm: powerplay: Add `__counted_by` attribute for flexible arrays"). The count attribute for these arrays does not get set until after the arrays are allocated and populated, leading to false UBSAN warnings. Fixes: 0ca9f757a0e2 ("drm/amd/pm: powerplay: Add `__counted_by` attribute for flexible arrays") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3662 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
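The ordering that a counted flexible array expects can be sketched in userspace C. Note that this commit removes the annotation instead of reordering the code; the sketch only shows why populate-then-set-count trips bounds checking. `COUNTED_BY`, `clock_table` and `clock_table_alloc` are hypothetical names, and the macro expands to nothing on compilers without the `counted_by` attribute.

```c
#include <stdlib.h>

/* Use the counted_by attribute where the compiler provides it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Hypothetical table with a flexible array bounded by 'count'. */
struct clock_table {
    unsigned int count;
    unsigned int entries[] COUNTED_BY(count);
};

static struct clock_table *clock_table_alloc(unsigned int n)
{
    struct clock_table *t = calloc(1, sizeof(*t) + n * sizeof(t->entries[0]));

    if (!t)
        return NULL;

    /* Set the count first: bounds checks compare indices against it. */
    t->count = n;

    /* Only then populate the entries. */
    for (unsigned int i = 0; i < n; i++)
        t->entries[i] = i * 100;

    return t;
}
```

If the loop ran before `t->count = n`, a bounds sanitizer would see indices compared against a count of zero and report every write, even though the allocation itself is large enough.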
2024-10-07	drm/amd/pm: use pm_runtime_get_if_active for debugfs getters	Pierre-Eric Pelloux-Prayer	1	-69/+69
	Don't wake up the GPU for reading pm values. Instead, take a runtime power management ref when trying to read it if and only if the GPU is already awake. This avoids spurious wake ups (e.g. from applets). We use pm_runtime_get_if_active because we care about "is the GPU awake?" not about "is the GPU awake and something else prevents suspend?". Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07	drm/amd/pm: don't update runpm last_usage on debugfs getter	Pierre-Eric Pelloux-Prayer	1	-24/+0
	Reading pm values from the GPU shouldn't prevent it from being suspended by resetting the last active timestamp (e.g. if a background app monitors GPU sensors every second, it would prevent the autosuspend sequence from triggering). Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07	drm/amdgpu: update the handle ptr in hw_fini	Sunil Khatri	4	-10/+9
	Update the *handle to amdgpu_ip_block ptr for all function pointers of hw_fini. Also update the ip_block ptr wherever needed, as there was a cyclic dependency of hw_fini on suspend, and do some follow-up cleanup. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>