summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
3 daysdrm/amd/pm: fix amdgpu_irq enabled counter unbalanced on smu v11.0Yang Wang2-3/+11
commit e12603bf2c3d571476a21debfeab80bb70d8c0cc upstream. v1: - fix amdgpu_irq enabled counter unbalanced issue on smu_v11_0_disable_thermal_alert. v2: - re-enable smu thermal alert to make amdgpu irq counter balance for smu v11.0 if in runpm state [75582.361561] ------------[ cut here ]------------ [75582.361565] WARNING: CPU: 42 PID: 533 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xd8/0xf0 [amdgpu] ... [75582.362211] Tainted: [E]=UNSIGNED_MODULE [75582.362214] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F14a 08/14/2020 [75582.362218] Workqueue: pm pm_runtime_work [75582.362225] RIP: 0010:amdgpu_irq_put+0xd8/0xf0 [amdgpu] [75582.362556] Code: 31 f6 31 ff e9 c9 bf cf c2 44 89 f2 4c 89 e6 4c 89 ef e8 db fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 a8 bf cf c2 <0f> 0b eb c3 b8 fe ff ff ff eb 97 e9 84 e8 8b 00 0f 1f 84 00 00 00 [75582.362560] RSP: 0018:ffffd50d51297b80 EFLAGS: 00010246 [75582.362564] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [75582.362568] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [75582.362570] RBP: ffffd50d51297ba0 R08: 0000000000000000 R09: 0000000000000000 [75582.362573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e72091d2008 [75582.362576] R13: ffff8e720af80000 R14: 0000000000000000 R15: ffff8e720af80000 [75582.362579] FS: 0000000000000000(0000) GS:ffff8e9158262000(0000) knlGS:0000000000000000 [75582.362582] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [75582.362585] CR2: 000074869d040c14 CR3: 0000001e37a3e000 CR4: 00000000003506f0 [75582.362588] Call Trace: [75582.362591] <TASK> [75582.362597] smu_v11_0_disable_thermal_alert+0x17/0x30 [amdgpu] [75582.362983] smu_smc_hw_cleanup+0x79/0x4f0 [amdgpu] [75582.363375] smu_suspend+0x92/0x110 [amdgpu] [75582.363762] ? gfx_v10_0_hw_fini+0xd5/0x150 [amdgpu] [75582.364098] amdgpu_ip_block_suspend+0x27/0x80 [amdgpu] [75582.364377] ? timer_delete_sync+0x10/0x20 [75582.364384] amdgpu_device_ip_suspend_phase2+0x190/0x450 [amdgpu] [75582.364665] amdgpu_device_suspend+0x1ae/0x2f0 [amdgpu] [75582.364948] amdgpu_pmops_runtime_suspend+0xf3/0x1f0 [amdgpu] [75582.365230] pci_pm_runtime_suspend+0x6d/0x1f0 [75582.365237] ? __pfx_pci_pm_runtime_suspend+0x10/0x10 [75582.365242] __rpm_callback+0x4c/0x190 [75582.365246] ? srso_return_thunk+0x5/0x5f [75582.365252] ? srso_return_thunk+0x5/0x5f [75582.365256] ? ktime_get_mono_fast_ns+0x43/0xe0 [75582.365263] rpm_callback+0x6e/0x80 [75582.365267] rpm_suspend+0x124/0x5f0 [75582.365271] ? srso_return_thunk+0x5/0x5f [75582.365275] ? __schedule+0x439/0x15e0 [75582.365281] ? srso_return_thunk+0x5/0x5f [75582.365285] ? __queue_delayed_work+0xb8/0x180 [75582.365293] pm_runtime_work+0xc6/0xe0 [75582.365297] process_one_work+0x1a1/0x3f0 [75582.365303] worker_thread+0x2ba/0x3d0 [75582.365309] kthread+0x107/0x220 [75582.365313] ? __pfx_worker_thread+0x10/0x10 [75582.365318] ? __pfx_kthread+0x10/0x10 [75582.365323] ret_from_fork+0xa2/0x120 [75582.365328] ? __pfx_kthread+0x10/0x10 [75582.365332] ret_from_fork_asm+0x1a/0x30 [75582.365343] </TASK> [75582.365345] ---[ end trace 0000000000000000 ]--- [75582.365350] amdgpu 0000:05:00.0: amdgpu: Fail to disable thermal alert! [75582.365379] amdgpu 0000:05:00.0: amdgpu: suspend of IP block <smu> failed -22 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/amd/pm: Return -EOPNOTSUPP for unsupported OD_MCLK on smu_v13_0_6Asad Kamal1-1/+1
commit 2f0e491faee43181b6a86e90f34016b256042fe1 upstream. When SET_UCLK_MAX capability is absent, return -EOPNOTSUPP from smu_v13_0_6_emit_clk_levels() for OD_MCLK instead of 0. This makes unsupported OD_MCLK reporting consistent with other clock types and allows callers to skip the entry cleanly. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d82e0a72d9189e8acd353988e1a57f85ce479e37) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/i915: Unlink NV12 planes earlierVille Syrjälä1-2/+9
commit bfa71b7a9dc6b5b8af157686e03308291141d00c upstream. unlink_nv12_plane() will clobber parts of the plane state potentially already set up by plane_atomic_check(), so we must make sure not to call the two in the wrong order. The problem happens when a plane previously selected as a Y plane is now configured as a normal plane by user space. plane_atomic_check() will first compute the proper plane state based on the userspace request, and unlink_nv12_plane() later clears some of the state. This used to work on account of unlink_nv12_plane() skipping the state clearing based on the plane visibility. But I removed that check, thinking it was an impossible situation. Now when that situation happens unlink_nv12_plane() will just WARN and proceed to clobber the state. Rather than reverting to the old way of doing things, I think it's more clear if we unlink the NV12 planes before we even compute the new plane state. Cc: stable@vger.kernel.org Reported-by: Khaled Almahallawy <khaled.almahallawy@intel.com> Closes: https://lore.kernel.org/intel-gfx/20260212004852.1920270-1-khaled.almahallawy@intel.com/ Tested-by: Khaled Almahallawy <khaled.almahallawy@intel.com> Fixes: 6a01df2f1b2a ("drm/i915: Remove pointless visible check in unlink_nv12_plane()") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260316163953.12905-2-ville.syrjala@linux.intel.com Reviewed-by: Uma Shankar <uma.shankar@intel.com> (cherry picked from commit 017ecd04985573eeeb0745fa2c23896fb22ee0cc) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/i915: Order OP vs. timeout correctly in __wait_for()Ville Syrjälä1-1/+1
commit 6ad2a661ff0d3d94884947d2a593311ba46d34c2 upstream. Put the barrier() before the OP so that anything we read out in OP and check in COND will actually be read out after the timeout has been evaluated. Currently the only place where we use OP is __intel_wait_for_register(), but the use there is precisely susceptible to this reordering, assuming the ktime_*() stuff itself doesn't act as a sufficient barrier: __intel_wait_for_register(...) { ... ret = __wait_for(reg_value = intel_uncore_read_notrace(...), (reg_value & mask) == value, ...); ... } Cc: stable@vger.kernel.org Fixes: 1c3c1dc66a96 ("drm/i915: Add compiler barrier to wait_for") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260313110740.24620-1-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit a464bace0482aa9a83e9aa7beefbaf44cd58e6cf) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/i915/dp_tunnel: Fix error handling when clearing stream BW in atomic stateImre Deak3-11/+28
commit 77fcf58df15edcf3f5b5421f24814fb72796def9 upstream. Clearing the DP tunnel stream BW in the atomic state involves getting the tunnel group state, which can fail. Handle the error accordingly. This fixes at least one issue where drm_dp_tunnel_atomic_set_stream_bw() failed to get the tunnel group state returning -EDEADLK, which wasn't handled. This lead to the ctx->contended warn later in modeset_lock() while taking a WW mutex for another object in the same atomic state, and thus within the same already contended WW context. Moving intel_crtc_state_alloc() later would avoid freeing saved_state on the error path; this stable patch leaves that simplification for a follow-up. Cc: Uma Shankar <uma.shankar@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.9+ Fixes: a4efae87ecb2 ("drm/i915/dp: Compute DP tunnel BW during encoder state computation") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7617 Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20260320092900.13210-1-imre.deak@intel.com (cherry picked from commit fb69d0076e687421188bc8103ab0e8e5825b1df1) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/amd/display: Fix drm_edid leak in amdgpu_dmAlex Hung1-1/+2
commit 37c2caa167b0b8aca4f74c32404c5288b876a2a3 upstream. [WHAT] When a sink is connected, aconnector->drm_edid was overwritten without freeing the previous allocation, causing a memory leak on resume. [HOW] Free the previous drm_edid before updating it. Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 52024a94e7111366141cfc5d888b2ef011f879e5) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/amdgpu: prevent immediate PASID reuse caseEric Huang3-13/+34
commit 14b81abe7bdc25f8097906fc2f91276ffedb2d26 upstream. PASID resue could cause interrupt issue when process immediately runs into hw state left by previous process exited with the same PASID, it's possible that page faults are still pending in the IH ring buffer when the process exits and frees up its PASID. To prevent the case, it uses idr cyclic allocator same as kernel pid's. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8f1de51f49be692de137c8525106e0fce2d1912d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/xe: always keep track of remap prev/nextMatthew Auld3-10/+28
commit bfe9e314d7574d1c5c851972e7aee342733819d2 upstream. During 3D workload, user is reporting hitting: [ 413.361679] WARNING: drivers/gpu/drm/xe/xe_vm.c:1217 at vm_bind_ioctl_ops_unwind+0x1e2/0x2e0 [xe], CPU#7: vkd3d_queue/9925 [ 413.361944] CPU: 7 UID: 1000 PID: 9925 Comm: vkd3d_queue Kdump: loaded Not tainted 7.0.0-070000rc3-generic #202603090038 PREEMPT(lazy) [ 413.361949] RIP: 0010:vm_bind_ioctl_ops_unwind+0x1e2/0x2e0 [xe] [ 413.362074] RSP: 0018:ffffd4c25c3df930 EFLAGS: 00010282 [ 413.362077] RAX: 0000000000000000 RBX: ffff8f3ee817ed10 RCX: 0000000000000000 [ 413.362078] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 413.362079] RBP: ffffd4c25c3df980 R08: 0000000000000000 R09: 0000000000000000 [ 413.362081] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8f41fbf99380 [ 413.362082] R13: ffff8f3ee817e968 R14: 00000000ffffffef R15: ffff8f43d00bd380 [ 413.362083] FS: 00000001040ff6c0(0000) GS:ffff8f4696d89000(0000) knlGS:00000000330b0000 [ 413.362085] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 413.362086] CR2: 00007ddfc4747000 CR3: 00000002e6262005 CR4: 0000000000f72ef0 [ 413.362088] PKRU: 55555554 [ 413.362089] Call Trace: [ 413.362092] <TASK> [ 413.362096] xe_vm_bind_ioctl+0xa9a/0xc60 [xe] Which seems to hint that the vma we are re-inserting for the ops unwind is either invalid or overlapping with something already inserted in the vm. It shouldn't be invalid since this is a re-insertion, so must have worked before. Leaving the likely culprit as something already placed where we want to insert the vma. Following from that, for the case where we do something like a rebind in the middle of a vma, and one or both mapped ends are already compatible, we skip doing the rebind of those vma and set next/prev to NULL. As well as then adjust the original unmap va range, to avoid unmapping the ends. However, if we trigger the unwind path, we end up with three va, with the two ends never being removed and the original va range in the middle still being the shrunken size. If this occurs, one failure mode is when another unwind op needs to interact with that range, which can happen with a vector of binds. For example, if we need to re-insert something in place of the original va. In this case the va is still the shrunken version, so when removing it and then doing a re-insert it can overlap with the ends, which were never removed, triggering a warning like above, plus leaving the vm in a bad state. With that, we need two things here: 1) Stop nuking the prev/next tracking for the skip cases. Instead relying on checking for skip prev/next, where needed. That way on the unwind path, we now correctly remove both ends. 2) Undo the unmap va shrinkage, on the unwind path. With the two ends now removed the unmap va should expand back to the original size again, before re-insertion. v2: - Update the explanation in the commit message, based on an actual IGT of triggering this issue, rather than conjecture. - Also undo the unmap shrinkage, for the skip case. With the two ends now removed, the original unmap va range should expand back to the original range. v3: - Track the old start/range separately. vma_size/start() uses the va info directly. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7602 Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260318100208.78097-2-matthew.auld@intel.com (cherry picked from commit aec6969f75afbf4e01fd5fb5850ed3e9c27043ac) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/amdgpu: Fix fence put before wait in amdgpu_amdkfd_submit_ibSrinivasan Shanmugam1-2/+2
[ Upstream commit 7150850146ebfa4ca998f653f264b8df6f7f85be ] amdgpu_amdkfd_submit_ib() submits a GPU job and gets a fence from amdgpu_ib_schedule(). This fence is used to wait for job completion. Currently, the code drops the fence reference using dma_fence_put() before calling dma_fence_wait(). If dma_fence_put() releases the last reference, the fence may be freed before dma_fence_wait() is called. This can lead to a use-after-free. Fix this by waiting on the fence first and releasing the reference only after dma_fence_wait() completes. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:697 amdgpu_amdkfd_submit_ib() warn: passing freed memory 'f' (line 696) Fixes: 9ae55f030dc5 ("drm/amdgpu: Follow up change to previous drm scheduler change.") Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8b9e5259adc385b61a6590a13b82ae0ac2bd3482) Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe: Implement recent spec updates to Wa_16025250150Matt Roper2-1/+3
[ Upstream commit 56781a4597706cd25185b1dedc38841ec6c31496 ] The hardware teams noticed that the originally documented workaround steps for Wa_16025250150 may not be sufficient to fully avoid a hardware issue. The workaround documentation has been augmented to suggest programming one additional register; make the corresponding change in the driver. Fixes: 7654d51f1fd8 ("drm/xe/xe2hpg: Add Wa_16025250150") Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20260319-wa_16025250150_part2-v1-1-46b1de1a31b2@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit a31566762d4075646a8a2214586158b681e94305) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Do not skip unrelated mode changes in DSC validationYussuf Khalil3-1/+9
[ Upstream commit aed3d041ab061ec8a64f50a3edda0f4db7280025 ] Starting with commit 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in atomic check"), amdgpu resets the CRTC state mode_changed flag to false when recomputing the DSC configuration results in no timing change for a particular stream. However, this is incorrect in scenarios where a change in MST/DSC configuration happens in the same KMS commit as another (unrelated) mode change. For example, the integrated panel of a laptop may be configured differently (e.g., HDR enabled/disabled) depending on whether external screens are attached. In this case, plugging in external DP-MST screens may result in the mode_changed flag being dropped incorrectly for the integrated panel if its DSC configuration did not change during precomputation in pre_validate_dsc(). At this point, however, dm_update_crtc_state() has already created new streams for CRTCs with DSC-independent mode changes. In turn, amdgpu_dm_commit_streams() will never release the old stream, resulting in a memory leak. amdgpu_dm_atomic_commit_tail() will never acquire a reference to the new stream either, which manifests as a use-after-free when the stream gets disabled later on: BUG: KASAN: use-after-free in dc_stream_release+0x25/0x90 [amdgpu] Write of size 4 at addr ffff88813d836524 by task kworker/9:9/29977 Workqueue: events drm_mode_rmfb_work_fn Call Trace: <TASK> dump_stack_lvl+0x6e/0xa0 print_address_description.constprop.0+0x88/0x320 ? dc_stream_release+0x25/0x90 [amdgpu] print_report+0xfc/0x1ff ? srso_alias_return_thunk+0x5/0xfbef5 ? __virt_addr_valid+0x225/0x4e0 ? dc_stream_release+0x25/0x90 [amdgpu] kasan_report+0xe1/0x180 ? dc_stream_release+0x25/0x90 [amdgpu] kasan_check_range+0x125/0x200 dc_stream_release+0x25/0x90 [amdgpu] dc_state_destruct+0x14d/0x5c0 [amdgpu] dc_state_release.part.0+0x4e/0x130 [amdgpu] dm_atomic_destroy_state+0x3f/0x70 [amdgpu] drm_atomic_state_default_clear+0x8ee/0xf30 ? drm_mode_object_put.part.0+0xb1/0x130 __drm_atomic_state_free+0x15c/0x2d0 atomic_remove_fb+0x67e/0x980 Since there is no reliable way of figuring out whether a CRTC has unrelated mode changes pending at the time of DSC validation, remember the value of the mode_changed flag from before the point where a CRTC was marked as potentially affected by a change in DSC configuration. Reset the mode_changed flag to this earlier value instead in pre_validate_dsc(). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5004 Fixes: 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in atomic check") Signed-off-by: Yussuf Khalil <dev@pp3345.net> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cc7c7121ae082b7b82891baa7280f1ff2608f22b) Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/i915/gmbus: fix spurious timeout on 512-byte burst readsSamasth Norway Ananda1-1/+3
[ Upstream commit 08441f10f4dc09fdeb64529953ac308abc79dd38 ] When reading exactly 512 bytes with burst read enabled, the extra_byte_added path breaks out of the inner do-while without decrementing len. The outer while(len) then re-enters and gmbus_wait() times out since all data has been delivered. Decrement len before the break so the outer loop terminates correctly. Fixes: d5dc0f43f268 ("drm/i915/gmbus: Enable burst read") Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260316231920.135438-2-samasth.norway.ananda@oracle.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 4ab0f09ee73fc853d00466682635f67c531f909c) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/mediatek: dsi: Store driver data before invoking mipi_dsi_host_registerLuca Leonardo Scorcia1-4/+5
[ Upstream commit 4cfdfeb6ac06079f92fccd977fa742d6c5b8dd3a ] The call to mipi_dsi_host_register triggers a callback to mtk_dsi_bind, which uses dev_get_drvdata to retrieve the mtk_dsi struct, so this structure needs to be stored inside the driver data before invoking it. As drvdata is currently uninitialized it leads to a crash when registering the DSI DRM encoder right after acquiring the mode_config.idr_mutex, blocking all subsequent DRM operations. Fixes the following crash during mediatek-drm probe (tested on Xiaomi Smart Clock x04g): Unable to handle kernel NULL pointer dereference at virtual address 0000000000000040 [...] Modules linked in: mediatek_drm(+) drm_display_helper cec drm_client_lib drm_dma_helper drm_kms_helper panel_simple [...] Call trace: drm_mode_object_add+0x58/0x98 (P) __drm_encoder_init+0x48/0x140 drm_encoder_init+0x6c/0xa0 drm_simple_encoder_init+0x20/0x34 [drm_kms_helper] mtk_dsi_bind+0x34/0x13c [mediatek_drm] component_bind_all+0x120/0x280 mtk_drm_bind+0x284/0x67c [mediatek_drm] try_to_bring_up_aggregate_device+0x23c/0x320 __component_add+0xa4/0x198 component_add+0x14/0x20 mtk_dsi_host_attach+0x78/0x100 [mediatek_drm] mipi_dsi_attach+0x2c/0x50 panel_simple_dsi_probe+0x4c/0x9c [panel_simple] mipi_dsi_drv_probe+0x1c/0x28 really_probe+0xc0/0x3dc __driver_probe_device+0x80/0x160 driver_probe_device+0x40/0x120 __device_attach_driver+0xbc/0x17c bus_for_each_drv+0x88/0xf0 __device_attach+0x9c/0x1cc device_initial_probe+0x54/0x60 bus_probe_device+0x34/0xa0 device_add+0x5b0/0x800 mipi_dsi_device_register_full+0xdc/0x16c mipi_dsi_host_register+0xc4/0x17c mtk_dsi_probe+0x10c/0x260 [mediatek_drm] platform_probe+0x5c/0xa4 really_probe+0xc0/0x3dc __driver_probe_device+0x80/0x160 driver_probe_device+0x40/0x120 __driver_attach+0xc8/0x1f8 bus_for_each_dev+0x7c/0xe0 driver_attach+0x24/0x30 bus_add_driver+0x11c/0x240 driver_register+0x68/0x130 __platform_register_drivers+0x64/0x160 mtk_drm_init+0x24/0x1000 [mediatek_drm] do_one_initcall+0x60/0x1d0 do_init_module+0x54/0x240 load_module+0x1838/0x1dc0 init_module_from_file+0xd8/0xf0 __arm64_sys_finit_module+0x1b4/0x428 invoke_syscall.constprop.0+0x48/0xc8 do_el0_svc+0x3c/0xb8 el0_svc+0x34/0xe8 el0t_64_sync_handler+0xa0/0xe4 el0t_64_sync+0x198/0x19c Code: 52800022 941004ab 2a0003f3 37f80040 (29005a80) Fixes: e4732b590a77 ("drm/mediatek: dsi: Register DSI host after acquiring clocks and PHY") Signed-off-by: Luca Leonardo Scorcia <l.scorcia@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20260225094047.76780-1-l.scorcia@gmail.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu: fix gpu idle power consumption issue for gfx v12Yang Wang1-1/+4
[ Upstream commit a6571045cf06c4aa749b4801382ae96650e2f0e1 ] Older versions of the MES firmware may cause abnormal GPU power consumption. When performing inference tasks on the GPU (e.g., with Ollama using ROCm), the GPU may show abnormal power consumption in idle state and incorrect GPU load information. This issue has been fixed in firmware version 0x8b and newer. Closes: https://github.com/ROCm/ROCm/issues/5706 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e22a5fe6ea6e0b057e7f246df4ac3ff8bfbc46a) Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/ttm/tests: Fix build failure on PREEMPT_RTMaarten Lankhorst1-2/+2
[ Upstream commit a58d487fb1a52579d3c37544ea371da78ed70c45 ] Fix a compile error in the kunit tests when CONFIG_PREEMPT_RT is enabled, and the normal mutex is converted into a rtmutex. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602261547.3bM6yVAS-lkp@intel.com/ Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Link: https://patch.msgid.link/20260304085616.1216961-1-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysdrm/i915/gt: Check set_default_submission() before deferencingRahul Bukte1-1/+2
[ Upstream commit 0162ab3220bac870e43e229e6e3024d1a21c3f26 ] When the i915 driver firmware binaries are not present, the set_default_submission pointer is not set. This pointer is dereferenced during suspend anyways. Add a check to make sure it is set before dereferencing. [ 23.289926] PM: suspend entry (deep) [ 23.293558] Filesystems sync: 0.000 seconds [ 23.298010] Freezing user space processes [ 23.302771] Freezing user space processes completed (elapsed 0.000 seconds) [ 23.309766] OOM killer disabled. [ 23.313027] Freezing remaining freezable tasks [ 23.318540] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 23.342038] serial 00:05: disabled [ 23.345719] serial 00:02: disabled [ 23.349342] serial 00:01: disabled [ 23.353782] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 23.358993] sd 1:0:0:0: [sdb] Synchronizing SCSI cache [ 23.361635] ata1.00: Entering standby power mode [ 23.368863] ata2.00: Entering standby power mode [ 23.445187] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 23.452194] #PF: supervisor instruction fetch in kernel mode [ 23.457896] #PF: error_code(0x0010) - not-present page [ 23.463065] PGD 0 P4D 0 [ 23.465640] Oops: Oops: 0010 [#1] SMP NOPTI [ 23.469869] CPU: 8 UID: 0 PID: 211 Comm: kworker/u48:18 Tainted: G S W 6.19.0-rc4-00020-gf0b9d8eb98df #10 PREEMPT(voluntary) [ 23.482512] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN [ 23.496511] Workqueue: async async_run_entry_fn [ 23.501087] RIP: 0010:0x0 [ 23.503755] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 23.510324] RSP: 0018:ffffb4a60065fca8 EFLAGS: 00010246 [ 23.515592] RAX: 0000000000000000 RBX: ffff9f428290e000 RCX: 000000000000000f [ 23.522765] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff9f428290e000 [ 23.529937] RBP: ffff9f4282907070 R08: ffff9f4281130428 R09: 00000000ffffffff [ 23.537111] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f42829070f8 [ 23.544284] R13: ffff9f4282906028 R14: ffff9f4282900000 R15: ffff9f4282906b68 [ 23.551457] FS: 0000000000000000(0000) GS:ffff9f466b2cf000(0000) knlGS:0000000000000000 [ 23.559588] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 23.565365] CR2: ffffffffffffffd6 CR3: 000000031c230001 CR4: 0000000000f70ef0 [ 23.572539] PKRU: 55555554 [ 23.575281] Call Trace: [ 23.577770] <TASK> [ 23.579905] intel_engines_reset_default_submission+0x42/0x60 [ 23.585695] __intel_gt_unset_wedged+0x191/0x200 [ 23.590360] intel_gt_unset_wedged+0x20/0x40 [ 23.594675] gt_sanitize+0x15e/0x170 [ 23.598290] i915_gem_suspend_late+0x6b/0x180 [ 23.602692] i915_drm_suspend_late+0x35/0xf0 [ 23.607008] ? __pfx_pci_pm_suspend_late+0x10/0x10 [ 23.611843] dpm_run_callback+0x78/0x1c0 [ 23.615817] device_suspend_late+0xde/0x2e0 [ 23.620037] async_suspend_late+0x18/0x30 [ 23.624082] async_run_entry_fn+0x25/0xa0 [ 23.628129] process_one_work+0x15b/0x380 [ 23.632182] worker_thread+0x2a5/0x3c0 [ 23.635973] ? __pfx_worker_thread+0x10/0x10 [ 23.640279] kthread+0xf6/0x1f0 [ 23.643464] ? __pfx_kthread+0x10/0x10 [ 23.647263] ? __pfx_kthread+0x10/0x10 [ 23.651045] ret_from_fork+0x131/0x190 [ 23.654837] ? __pfx_kthread+0x10/0x10 [ 23.658634] ret_from_fork_asm+0x1a/0x30 [ 23.662597] </TASK> [ 23.664826] Modules linked in: [ 23.667914] CR2: 0000000000000000 [ 23.671271] ------------[ cut here ]------------ Signed-off-by: Rahul Bukte <rahul.bukte@sony.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260203044839.1555147-1-suraj.kandpal@intel.com (cherry picked from commit daa199abc3d3d1740c9e3a2c3e9216ae5b447cad) Fixes: ff44ad51ebf8 ("drm/i915: Move engine->submit_request selection to a vfunc") Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysdrm/bridge: dw-hdmi-qp: fix multi-channel audio outputJonas Karlman1-1/+1
[ Upstream commit cffcb42c57686e9a801dfcf37a3d0c62e51c1c3e ] Channel Allocation (PB4) and Level Shift Information (PB5) are configured with values from PB1 and PB2 due to the wrong offset being used. This results in missing audio channels or incorrect speaker placement when playing multi-channel audio. Use the correct offset to fix multi-channel audio output. Fixes: fd0141d1a8a2 ("drm/bridge: synopsys: Add audio support for dw-hdmi-qp") Reported-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260228112822.4056354-1-christianshewitt@gmail.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysdrm/amd: fix dcn 2.01 checkAndy Nguyen1-4/+4
[ Upstream commit 39f44f54afa58661ecae9c27e15f5dbce2372892 ] The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions upto NV_UNKNOWN. Fixes: 54b822b3eac3 ("drm/amd/display: Use dce_version instead of chip_id") Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9c7be0efa6f0daa949a5f3e3fdf9ea090b0713cb) Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysdrm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr()Srinivasan Shanmugam1-2/+2
[ Upstream commit 2323b019651ad81c20a0f7f817c63392b3110652 ] parse_edid_displayid_vrr() searches the EDID extension blocks for a DisplayID extension before parsing the dynamic video timing range. The code previously checked whether edid_ext was NULL after the search loop. However, edid_ext is assigned during each iteration of the loop, so it will never be NULL once the loop has executed. If no DisplayID extension is found, edid_ext ends up pointing to the last extension block, and the NULL check does not correctly detect the failure case. Instead, check whether the loop completed without finding a matching DisplayID block by testing "i == edid->extensions". This ensures the function exits early when no DisplayID extension is present and avoids parsing an unrelated EDID extension block. Also simplify the EDID validation check using "!edid || !edid->extensions". Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075) Fixes: a638b837d0e6 ("drm/amd/display: Fix refresh rate range for some panel") Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c7e6342e98c846b259c57273436fdea4c043f2) Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysdrm/vmwgfx: Don't overwrite KMS surface dirty trackerIan Forbes1-1/+2
[ Upstream commit c6cb77c474a32265e21c4871c7992468bf5e7638 ] We were overwriting the surface's dirty tracker here causing a memory leak. Reported-by: Mika Penttilä <mpenttil@redhat.com> Closes: https://lore.kernel.org/dri-devel/8c53f3c6-c6de-46fe-a8ca-d98dd52b3abe@redhat.com/ Fixes: 965544150d1c ("drm/vmwgfx: Refactor cursor handling") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patch.msgid.link/20260302200330.66763-1-ian.forbes@broadcom.com Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysdrm/i915/psr: Compute PSR entry_setup_frames into intel_crtc_stateJouni Högander2-2/+4
commit 7caac659a837af9fd4cad85be851982b88859484 upstream. PSR entry_setup_frames is currently computed directly into struct intel_dp:intel_psr:entry_setup_frames. This causes a problem if mode change gets rejected after PSR compute config: Psr_entry_setup_frames computed for this rejected state is in intel_dp:intel_psr:entry_setup_frame. Fix this by computing it into intel_crtc_state and copy the value into intel_dp:intel_psr:entry_setup_frames on PSR enable. Fixes: 2b981d57e480 ("drm/i915/display: Support PSR entry VSC packet to be transmitted one frame earlier") Cc: Mika Kahola <mika.kahola@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260312083710.1593781-3-jouni.hogander@intel.com (cherry picked from commit 8c229b4aa00262c13787982e998c61c0783285e0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> [ adapted context lines to account for missing `no_psr_reason` field and `alpm_state` struct. ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/xe: Open-code GGTT MMIO access protectionMatthew Brost2-7/+8
commit 01f2557aa684e514005541e71a3d01f4cd45c170 upstream. GGTT MMIO access is currently protected by hotplug (drm_dev_enter), which works correctly when the driver loads successfully and is later unbound or unloaded. However, if driver load fails, this protection is insufficient because drm_dev_unplug() is never called. Additionally, devm release functions cannot guarantee that all BOs with GGTT mappings are destroyed before the GGTT MMIO region is removed, as some BOs may be freed asynchronously by worker threads. To address this, introduce an open-coded flag, protected by the GGTT lock, that guards GGTT MMIO access. The flag is cleared during the dev_fini_ggtt devm release function to ensure MMIO access is disabled once teardown begins. Cc: stable@vger.kernel.org Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-8-zhanjun.dong@intel.com (cherry picked from commit 4f3a998a173b4325c2efd90bdadc6ccd3ad9a431) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/xe/oa: Allow reading after disabling OA streamAshutosh Dixit1-2/+5
commit 9be6fd9fbd2032b683e51374497768af9aaa228a upstream. Some OA data might be present in the OA buffer when OA stream is disabled. Allow UMD's to retrieve this data, so that all data till the point when OA stream is disabled can be retrieved. v2: Update tail pointer after disable (Umesh) Fixes: efb315d0a013 ("drm/xe/oa/uapi: Read file_operation") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa<umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260313053630.3176100-1-ashutosh.dixit@intel.com (cherry picked from commit 4ff57c5e8dbba23b5457be12f9709d5c016da16e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/xe/guc: Ensure CT state transitions via STOP before DISABLEDZhanjun Dong1-0/+1
commit 7838dd8367419e9fc43b79c038321cb3c04de2a2 upstream. The GuC CT state transition requires moving to the STOP state before entering the DISABLED state. Update the driver teardown sequence to make the proper state machine transitions. Fixes: ee4b32220a6b ("drm/xe/guc: Add devm release action to safely tear down CT") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-6-zhanjun.dong@intel.com (cherry picked from commit dace8cb0032f57ea67c87b3b92ad73c89dd2db44) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/i915/dmc: Fix an unlikely NULL pointer deference at probeImre Deak2-3/+2
commit ac57eb3b7d2ad649025b5a0fa207315f755ac4f6 upstream. intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL. That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init(). However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before. It is also strange to have DC6 enabled at boot, since that would require the DMC firmware (loaded by BIOS); the BIOS loading the DMC firmware and the driver stopping / reprogramming the firmware is a poorly specified sequence and as such unlikely an intentional BIOS behaviour. It's more likely that BIOS is leaving an unintentionally enabled DC6 HW state behind (without actually loading the required DMC firmware for this). The tracking of the DC6 allowed counter only works if starting / stopping the counter depends on the _SW_ DC6 state vs. the current _HW_ DC6 state (since stopping the counter requires the DC5 counter captured when the counter was started). Thus, using the HW DC6 state is incorrect and it also leads to the above oops. Fix both issues by using the SW DC6 state for the tracking. This is v2 of the fix originally sent by Jani, updated based on the first Link: discussion below. Link: https://lore.kernel.org/all/3626411dc9e556452c432d0919821b76d9991217@intel.com Link: https://lore.kernel.org/all/20260228130946.50919-2-ltao@redhat.com Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem <mohammed.thasleem@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Tao Liu <ltao@redhat.com> Cc: <stable@vger.kernel.org> # v6.16+ Tested-by: Tao Liu <ltao@redhat.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20260309164803.1918158-1-imre.deak@intel.com (cherry picked from commit 2344b93af8eb5da5d496b4e0529d35f0f559eaf0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdgpu: Limit BO list entry count to prevent resource exhaustionJesse.Zhang1-0/+4
commit 6270b1a5dab94665d7adce3dc78bc9066ed28bdd upstream. Userspace can pass an arbitrary number of BO list entries via the bo_number field. Although the previous multiplication overflow check prevents out-of-bounds allocation, a large number of entries could still cause excessive memory allocation (up to potentially gigabytes) and unnecessarily long list processing times. Introduce a hard limit of 128k entries per BO list, which is more than sufficient for any realistic use case (e.g., a single list containing all buffers in a large scene). This prevents memory exhaustion attacks and ensures predictable performance. Return -EINVAL if the requested entry count exceeds the limit Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 688b87d39e0aa8135105b40dc167d74b5ada5332) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdgpu: apply state adjust rules to some additional HAINAN vairantsAlex Deucher1-1/+3
commit 9787f7da186ee8143b7b6d914cfa0b6e7fee2648 upstream. They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0de31d92a173d3d94f28051b0b80a6c98913aed4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/radeon: apply state adjust rules to some additional HAINAN vairantsAlex Deucher1-1/+3
commit 86650ee2241ff84207eaa298ab318533f3c21a38 upstream. They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 87327658c848f56eac166cb382b57b83bf06c5ac) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/imagination: Synchronize interrupts before suspending the GPUAlessio Belle2-20/+8
commit 2d7f05cddf4c268cc36256a2476946041dbdd36d upstream. The runtime PM suspend callback doesn't know whether the IRQ handler is in progress on a different CPU core and doesn't wait for it to finish. Depending on timing, the IRQ handler could be running while the GPU is suspended, leading to kernel crashes when trying to access GPU registers. See example signature below. In a power off sequence initiated by the runtime PM suspend callback, wait for any IRQ handlers in progress on other CPU cores to finish, by calling synchronize_irq(). At the same time, remove the runtime PM resume/put calls in the threaded IRQ handler. On top of not being the right approach to begin with, and being at the wrong place as they should have wrapped all GPU register accesses, the driver would hit a deadlock between synchronize_irq() being called from a runtime PM suspend callback, holding the device power lock, and the resume callback requiring the same. Example crash signature on a TI AM68 SK platform: [ 337.241218] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError [ 337.241239] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT [ 337.241246] Tainted: [M]=MACHINE_CHECK [ 337.241249] Hardware name: Texas Instruments AM68 SK (DT) [ 337.241252] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 337.241256] pc : pvr_riscv_irq_pending+0xc/0x24 [ 337.241277] lr : pvr_device_irq_thread_handler+0x64/0x310 [ 337.241282] sp : ffff800085b0bd30 [ 337.241284] x29: ffff800085b0bd50 x28: ffff0008070d9eab x27: ffff800083a5ce10 [ 337.241291] x26: ffff000806e48f80 x25: ffff0008070d9eac x24: 0000000000000000 [ 337.241296] x23: ffff0008068e9bf0 x22: ffff0008068e9bd0 x21: ffff800085b0bd30 [ 337.241301] x20: ffff0008070d9e00 x19: ffff0008068e9000 x18: 0000000000000001 [ 337.241305] x17: 637365645f656c70 x16: 0000000000000000 x15: ffff000b7df9ff40 [ 337.241310] x14: 0000a585fe3c0d0e x13: 000000999704f060 x12: 000000000002771a [ 337.241314] x11: 00000000000000c0 x10: 0000000000000af0 x9 : ffff800085b0bd00 [ 337.241318] x8 : ffff0008071175d0 x7 : 000000000000b955 x6 : 0000000000000003 [ 337.241323] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 0000000000000000 [ 337.241327] x2 : ffff800080e39d20 x1 : ffff800080e3fc48 x0 : 0000000000000000 [ 337.241333] Kernel panic - not syncing: Asynchronous SError Interrupt [ 337.241337] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT [ 337.241342] Tainted: [M]=MACHINE_CHECK [ 337.241343] Hardware name: Texas Instruments AM68 SK (DT) [ 337.241345] Call trace: [ 337.241348] show_stack+0x18/0x24 (C) [ 337.241357] dump_stack_lvl+0x60/0x80 [ 337.241364] dump_stack+0x18/0x24 [ 337.241368] vpanic+0x124/0x2ec [ 337.241373] abort+0x0/0x4 [ 337.241377] add_taint+0x0/0xbc [ 337.241384] arm64_serror_panic+0x70/0x80 [ 337.241389] do_serror+0x3c/0x74 [ 337.241392] el1h_64_error_handler+0x30/0x48 [ 337.241400] el1h_64_error+0x6c/0x70 [ 337.241404] pvr_riscv_irq_pending+0xc/0x24 (P) [ 337.241410] irq_thread_fn+0x2c/0xb0 [ 337.241416] irq_thread+0x170/0x334 [ 337.241421] kthread+0x12c/0x210 [ 337.241428] ret_from_fork+0x10/0x20 [ 337.241434] SMP: stopping secondary CPUs [ 337.241451] Kernel Offset: disabled [ 337.241453] CPU features: 0x040000,02002800,20002001,0400421b [ 337.241456] Memory Limit: none [ 337.457921] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]--- Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and META FW support") Fixes: 96822d38ff57 ("drm/imagination: Handle Rogue safety event IRQs") Cc: stable@vger.kernel.org # see patch description, needs adjustments for < 6.16 Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260310-drain-irqs-before-suspend-v1-1-bf4f9ed68e75@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/imagination: Fix deadlock in soft reset sequenceAlessio Belle1-1/+10
commit a55c2a5c8d680156495b7b1e2a9f5a3e313ba524 upstream. The soft reset sequence is currently executed from the threaded IRQ handler, hence it cannot call disable_irq() which internally waits for IRQ handlers, i.e. itself, to complete. Use disable_irq_nosync() during a soft reset instead. Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and META FW support") Cc: stable@vger.kernel.org Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260309-fix-soft-reset-v1-1-121113be554f@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdgpu/mmhub4.1.0: add bounds checking for cidAlex Deucher1-1/+2
commit 3cdd405831d8cc50a5eae086403402697bb98a4a upstream. The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 04f063d85090f5dd0c671010ce88ee49d9dcc8ed) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdgpu/mmhub3.0: add bounds checking for cidAlex Deucher1-1/+2
commit cdb82ecbeccb55fae75a3c956b