summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
12 daysdrm/amdgpu: fix zero-size GDS range init on RDNA4Arjan van de Ven1-0/+3
commit 095a8b0ad3c3b5cdc3850d961adb8a8f735220bb upstream. RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory resources. The gfx_v12_0 initialisation code correctly leaves adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at zero to reflect this. amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for each of these resources regardless of size. When the size is zero, amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(), which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT. Guard against this by returning 0 early from amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM resource manager registration for hardware resources that are absent, without affecting any other GPU type. DRM_MM_BUG_ON() only asserts if CONFIG_DRM_DEBUG_MM is enabled in the kernel config. This is apparently rarely enabled as these chips have been in the market for over a year and this issue was only reported now. Link: https://lore.kernel.org/all/bug-221376-2300@https.bugzilla.kernel.org%2F/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376 Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5719ce5865279cad4fd5f01011fe037168503f2d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 daysamdgpu/jpeg: fix deepsleep register for jpeg 5_0_0 and 5_0_2David (Ming Qiang) Wu1-6/+46
commit e90dc3b2d73986610476b02c29d0074aa4d92fb0 upstream. PCTL0__MMHUB_DEEPSLEEP_IB is 0x69004 on MMHUB 4,1,0 and and 0x60804 on MMHUB 4,2,0. 0x62a04 is on MMHUB 1,8,0/1. The DS bits are adjusted to cover more JPEG engines and MMHUB version. Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 daysdrm/amd: Fix set but not used warningsTiezhu Yang3-11/+8
commit 46791d147d3ab3262298478106ef2a52fc7192e2 upstream. There are many set but not used warnings under drivers/gpu/drm/amd when compiling with the latest upstream mainline GCC: drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:305:18: warning: variable ‘p’ set but not used [-Wunused-but-set-variable=] drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h:103:26: warning: variable ‘internal_reg_offset’ set but not used [-Wunused-but-set-variable=] ... drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h:164:26: warning: variable ‘internal_reg_offset’ set but not used [-Wunused-but-set-variable=] ... drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:445:13: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=] drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:875:21: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=] Remove the variables actually not used or add __maybe_unused attribute for the variables actually used to fix them, compile tested only. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 daysdrm/arcpgu: fix device node leakLuca Ceresoli1-1/+2
commit ad3ac32a3893a2bbcad545efc005a8e4e7ecf10c upstream. This function gets a device_node reference via of_graph_get_remote_port_parent() and stores it in encoder_node, but never puts that reference. Add it. There used to be a of_node_put(encoder_node) but it has been removed by mistake during a rework in commit 3ea66a794fdc ("drm/arc: Inline arcpgu_drm_hdmi_init"). Fixes: 3ea66a794fdc ("drm/arc: Inline arcpgu_drm_hdmi_init") Cc: stable@vger.kernel.org Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Link: https://patch.msgid.link/20260402-drm-arcgpu-fix-device-node-leak-v2-1-d773cf754ae5@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 daysdrm/nouveau: fix nvkm_device leak on aperture removal failureDavid Carlier1-1/+1
commit 6597ff1d8de3f583be169587efeafd8af134e138 upstream. When aperture_remove_conflicting_pci_devices() fails during probe, the error path returns directly without unwinding the nvkm_device that was just allocated by nvkm_device_pci_new(). This leaks both the device wrapper and the pci_enable_device() reference taken inside it. Jump to the existing fail_nvkm label so nvkm_device_del() runs and balances both. The leak was introduced when the intermediate nvkm_device_del() between detection and aperture removal was dropped in favor of creating the pci device once. Fixes: c0bfe34330b5 ("drm/nouveau: create pci device once") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260411062938.22925-1-devnexen@gmail.com Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 daysdrm/nouveau: fix u32 overflow in pushbuf reloc bounds checkGreg Kroah-Hartman1-1/+1
commit 2fc87d37be1b730a149b035f9375fdb8cc5333a5 upstream. nouveau_gem_pushbuf_reloc_apply() validates each relocation with if (r->reloc_bo_offset + 4 > nvbo->bo.base.size) but reloc_bo_offset is __u32 (uapi/drm/nouveau_drm.h) and the integer literal 4 promotes to unsigned int, so the addition is performed in 32 bits and wraps before the comparison against the size_t bo size. Cast to u64 so the addition happens in 64-bit arithmetic. Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Reported-by: Anthropic Cc: stable <stable@kernel.org> Assisted-by: gkh_clanker_t1000 Fixes: a1606a9596e5 ("drm/nouveau: new gem pushbuf interface, bump to 0.0.16") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Add Fixes: tag. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-27drm/amdgpu: replace PASID IDR with XArrayMikhail Gavrilov1-20/+19
commit 3c863ff920b45fa7a9b7d4cb932f466488a87a58 upstream. Replace the PASID IDR + spinlock with XArray as noted in the TODO left by commit ea56aa262570 ("drm/amdgpu: fix the idr allocation flags"). The IDR conversion still has an IRQ safety issue: amdgpu_pasid_free() can be called from hardirq context via the fence signal path, but amdgpu_pasid_idr_lock is taken with plain spin_lock() in process context, creating a potential deadlock: CPU0 ---- spin_lock(&amdgpu_pasid_idr_lock) // process context, IRQs on <Interrupt> spin_lock(&amdgpu_pasid_idr_lock) // deadlock The hardirq call chain is: sdma_v6_0_process_trap_irq -> amdgpu_fence_process -> dma_fence_signal -> drm_sched_job_done -> dma_fence_signal -> amdgpu_pasid_free_cb -> amdgpu_pasid_free Use XArray with XA_FLAGS_LOCK_IRQ (all xa operations use IRQ-safe locking internally) and XA_FLAGS_ALLOC1 (zero is not a valid PASID). Both xa_alloc_cyclic() and xa_erase() then handle locking consistently, fixing the IRQ safety issue and removing the need for an explicit spinlock. v8: squash in irq safe fix Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Fixes: ea56aa262570 ("drm/amdgpu: fix the idr allocation flags") Fixes: 8f1de51f49be ("drm/amdgpu: prevent immediate PASID reuse case") Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Thomas Sowell <tom@ldtlb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22x86: rename and clean up __copy_from_user_inatomic_nocache()Linus Torvalds2-2/+2
commit 5de7bcaadf160c1716b20a263cf8f5b06f658959 upstream. Similarly to the previous commit, this renames the somewhat confusingly named function. But in this case, it was at least less confusing: the __copy_from_user_inatomic_nocache is indeed copying from user memory, and it is indeed ok to be used in an atomic context, so it will not warn about it. But the previous commit also removed the NTB mis-use of the __copy_from_user_inatomic_nocache() function, and as a result every call-site is now _actually_ doing a real user copy. That means that we can now do the proper user pointer verification too. End result: add proper address checking, remove the double underscores, and change the "nocache" to "nontemporal" to more accurately describe what this x86-only function actually does. It might be worth noting that only the target is non-temporal: the actual user accesses are normal memory accesses. Also worth noting is that non-x86 targets (and on older 32-bit x86 CPU's before XMM2 in the Pentium III) we end up just falling back on a regular user copy, so nothing can actually depend on the non-temporal semantics, but that has always been true. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22drm/vc4: platform_get_irq_byname() returns an intGreg Kroah-Hartman1-4/+10
commit e597a809a2b97e927060ba182f58eb3e6101bc70 upstream. platform_get_irq_byname() will return a negative value if an error happens, so it should be checked and not just passed directly into devm_request_threaded_irq() hoping all will be ok. Cc: Maxime Ripard <mripard@kernel.org> Cc: Dave Stevenson <dave.stevenson@raspberrypi.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Raspberry Pi Kernel Maintenance <kernel-list@raspberrypi.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: stable <stable@kernel.org> Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026022339-cornflake-t-shirt-2471@gregkh Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22drm/xe: Fix bug in idledly unit conversionVinay Belgaumkar1-2/+1
[ Upstream commit 7596459f3c93d8d45a1bf12d4d7526b50c15baa2 ] We only need to convert to picosecond units before writing to RING_IDLEDLY. Fixes: 7c53ff050ba8 ("drm/xe: Apply Wa_16023105232") Cc: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Acked-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260401012710.4165547-1-vinay.belgaumkar@intel.com (cherry picked from commit 13743bd628bc9d9a0e2fe53488b2891aedf7cc74) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22drm/vc4: Protect madv read in vc4_gem_object_mmap() with madv_lockMaíra Canal1-0/+3
[ Upstream commit 338c56050d8e892604da97f67bfa8cc4015a955f ] The mmap callback reads bo->madv without holding madv_lock, racing with concurrent DRM_IOCTL_VC4_GEM_MADVISE calls that modify the field under the same lock. Add the missing locking to prevent the data race. Fixes: b9f19259b84d ("drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl") Reviewed-by: Melissa Wen <mwen@igalia.com> Link: https://patch.msgid.link/20260330-vc4-misc-fixes-v1-4-92defc940a29@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22drm/vc4: Fix a memory leak in hang state error pathMaíra Canal1-8/+10
[ Upstream commit 9525d169e5fd481538cf8c663cc5839e54f2e481 ] When vc4_save_hang_state() encounters an early return condition, it returns without freeing the previously allocated `kernel_state`, leaking memory. Add the missing kfree() calls by consolidating the early return paths into a single place. Fixes: 214613656b51 ("drm/vc4: Add an interface for capturing the GPU state after a hang.") Reviewed-by: Melissa Wen <mwen@igalia.com> Link: https://patch.msgid.link/20260330-vc4-misc-fixes-v1-3-92defc940a29@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22drm/vc4: Fix memory leak of BO array in hang stateMaíra Canal1-0/+1
[ Upstream commit f4dfd6847b3e5d24e336bca6057485116d17aea4 ] The hang state's BO array is allocated separately with kzalloc() in vc4_save_hang_state() but never freed in vc4_free_hang_state(). Add the missing kfree() for the BO array before freeing the hang state struct. Fixes: 214613656b51 ("drm/vc4: Add an interface for capturing the GPU state after a hang.") Reviewed-by: Melissa Wen <mwen@igalia.com> Link: https://patch.msgid.link/20260330-vc4-misc-fixes-v1-2-92defc940a29@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22drm/vc4: Release runtime PM reference after binding V3DMaíra Canal1-0/+1
[ Upstream commit aaefbdde9abdc43699e110679c0e10972a5e1c59 ] The vc4_v3d_bind() function acquires a runtime PM reference via pm_runtime_resume_and_get() to access V3D registers during setup. However, this reference is never released after a successful bind. This prevents the device from ever runtime suspending, since the reference count never reaches zero. Release the runtime PM reference by adding pm_runtime_put_autosuspend() after autosuspend is configured, allowing the device to runtime suspend after the delay. Fixes: 266cff37d7fc ("drm/vc4: v3d: Rework the runtime_pm setup") Reviewed-by: Melissa Wen <mwen@igalia.com> Link: https://patch.msgid.link/20260330-vc4-misc-fixes-v1-1-92defc940a29@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack ↵Donet Tom1-3/+4
size to GPU page size [ Upstream commit 78746a474e92fc7aaed12219bec7c78ae1bd6156 ] The control stack size is calculated based on the number of CUs and waves, and is then aligned to PAGE_SIZE. When the resulting control stack size is aligned to 64 KB, GPU hangs and queue preemption failures are observed while running RCCL unit tests on systems with more than two GPUs. amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008 amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4 amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008 amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues This issue is observed on both 4 KB and 64 KB system page-size configurations. This patch fixes the issue by aligning the control stack size to AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size will not be 64 KB on systems with a 64 KB page size and queue preemption works correctly. Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE, which can waste memory if the system page size is large. In this patch, wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated from wg_data_size and the control stack size, is aligned to PAGE_SIZE. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a3e14436304392fbada359edd0f1d1659850c9b7) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22drm/amdgpu: Handle GPU page faults correctly on non-4K page systemsDonet Tom1-3/+3
[ Upstream commit 4e9597f22a3cb8600c72fc266eaac57981d834c8 ] During a GPU page fault, the driver restores the SVM range and then maps it into the GPU page tables. The current implementation passes a GPU-page-size (4K-based) PFN to svm_range_restore_pages() to restore the range. SVM ranges are tracked using system-page-size PFNs. On systems where the system page size is larger than 4K, using GPU-page-size PFNs to restore the range causes two problems: Range lookup fails: Because the restore function receives PFNs in GPU (4K) units, the SVM range lookup does not find the existing range. This will result in a duplicate SVM range being created. VMA lookup failure: The restore function also tries to locate the VMA for the faulting address. It converts the GPU-page-size PFN into an address using the system page size, which results in an incorrect address on non-4K page-size systems. As a result, the VMA lookup fails with the message: "address 0xxxx VMA is removed". This patch passes the system-page-size PFN to svm_range_restore_pages() so that the SVM range is restored correctly on non-4K page systems. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 074fe395fb13247b057f60004c7ebcca9f38ef46) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-18drm/i915/psr: Do not use pipe_src as borders for SU areaJouni Högander1-11/+19
commit 75519f5df2a9b23f7bf305e12dc9a6e3e65c24b7 upstream. This far using crtc_state->pipe_src as borders for Selective Update area haven't caused visible problems as drm_rect_width(crtc_state->pipe_src) == crtc_state->hw.adjusted_mode.crtc_hdisplay and drm_rect_height(crtc_state->pipe_src) == crtc_state->hw.adjusted_mode.crtc_vdisplay when pipe scaling is not used. On the other hand using pipe scaling is forcing full frame updates and all the Selective Update area calculations are skipped. Now this improper usage of crtc_state->pipe_src is causing following warnings: <4> [7771.978166] xe 0000:00:02.0: [drm] drm_WARN_ON_ONCE(su_lines % vdsc_cfg->slice_height) after WARN_ON_ONCE was added by commit: "drm/i915/dsc: Add helper for writing DSC Selective Update ET parameters" These warnings are seen when DSC and pipe scaling are enabled simultaneously. This is because on full frame update SU area is improperly set as pipe_src which is not aligned with DSC slice height. Fix these by creating local rectangle using crtc_state->hw.adjusted_mode.crtc_hdisplay and crtc_state->hw.adjusted_mode.crtc_vdisplay. Use this local rectangle as borders for SU area. Fixes: d6774b8c3c58 ("drm/i915: Ensure damage clip area is within pipe area") Cc: <stable@vger.kernel.org> # v6.0+ Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Link: https://patch.msgid.link/20260327114553.195285-1-jouni.hogander@intel.com (cherry picked from commit da0cdc1c329dd2ff09c41fbbe9fbd9c92c5d2c6e) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-18drm/i915/gt: fix refcount underflow in intel_engine_park_heartbeatSebastian Brzezinka1-8/+18
commit 4c71fd099513bfa8acab529b626e1f0097b76061 upstream. A use-after-free / refcount underflow is possible when the heartbeat worker and intel_engine_park_heartbeat() race to release the same engine->heartbeat.systole request. The heartbeat worker reads engine->heartbeat.systole and calls i915_request_put() on it when the request is complete, but clears the pointer in a separate, non-atomic step. Concurrently, a request retirement on another CPU can drop the engine wakeref to zero, triggering __engine_park() -> intel_engine_park_heartbeat(). If the heartbeat timer is pending at that point, cancel_delayed_work() returns true and intel_engine_park_heartbeat() reads the stale non-NULL systole pointer and calls i915_request_put() on it again, causing a refcount underflow: ``` <4> [487.221889] Workqueue: i915-unordered engine_retire [i915] <4> [487.222640] RIP: 0010:refcount_warn_saturate+0x68/0xb0 ... <4> [487.222707] Call Trace: <4> [487.222711] <TASK> <4> [487.222716] intel_engine_park_heartbeat.part.0+0x6f/0x80 [i915] <4> [487.223115] intel_engine_park_heartbeat+0x25/0x40 [i915] <4> [487.223566] __engine_park+0xb9/0x650 [i915] <4> [487.223973] ____intel_wakeref_put_last+0x2e/0xb0 [i915] <4> [487.224408] __intel_wakeref_put_last+0x72/0x90 [i915] <4> [487.224797] intel_context_exit_engine+0x7c/0x80 [i915] <4> [487.225238] intel_context_exit+0xf1/0x1b0 [i915] <4> [487.225695] i915_request_retire.part.0+0x1b9/0x530 [i915] <4> [487.226178] i915_request_retire+0x1c/0x40 [i915] <4> [487.226625] engine_retire+0x122/0x180 [i915] <4> [487.227037] process_one_work+0x239/0x760 <4> [487.227060] worker_thread+0x200/0x3f0 <4> [487.227068] ? __pfx_worker_thread+0x10/0x10 <4> [487.227075] kthread+0x10d/0x150 <4> [487.227083] ? __pfx_kthread+0x10/0x10 <4> [487.227092] ret_from_fork+0x3d4/0x480 <4> [487.227099] ? __pfx_kthread+0x10/0x10 <4> [487.227107] ret_from_fork_asm+0x1a/0x30 <4> [487.227141] </TASK> ``` Fix this by replacing the non-atomic pointer read + separate clear with xchg() in both racing paths. xchg() is a single indivisible hardware instruction that atomically reads the old pointer and writes NULL. This guarantees only one of the two concurrent callers obtains the non-NULL pointer and performs the put, the other gets NULL and skips it. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/15880 Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats") Cc: <stable@vger.kernel.org> # v5.5+ Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/d4c1c14255688dd07cc8044973c4f032a8d1559e.1775038106.git.sebastian.brzezinka@intel.com (cherry picked from commit 13238dc0ee4f9ab8dafa2cca7295736191ae2f42) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amd/display: Fix DCE LVDS handlingAlex Deucher6-16/+22
[ Upstream commit 90d239cc53723c1a3f89ce08eac17bf3a9e9f2d4 ] LVDS does not use an HPD pin so it may be invalid. Handle this case correctly in link encoder creation. Fixes: 7c8fb3b8e9ba ("drm/amd/display: Add hpd_source index check for DCE60/80/100/110/112/120 link encoders") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5012 Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Roman Li <roman.li@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3b5620f7ee688177fcf65cf61588c5435bce1872) Cc: stable@vger.kernel.org [ removed unrelated VGA connector block absent from stable and split combined null/bounds check into separate guard and ternary ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amd/pm: disable OD_FAN_CURVE if temp or pwm range invalid for smu v13Yang Wang2-2/+64
[ Upstream commit 3e6dd28a11083e83e11a284d99fcc9eb748c321c ] Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid, otherwise PMFW will reject this configuration on smu v13.0.x example: $ sudo cat /sys/bus/pci/devices/<BDF>/gpu_od/fan_ctrl/fan_curve OD_FAN_CURVE: 0: 0C 0% 1: 0C 0% 2: 0C 0% 3: 0C 0% 4: 0C 0% OD_RANGE: FAN_CURVE(hotspot temp): 0C 0C FAN_CURVE(fan speed): 0% 0% $ echo "0 50 40" | sudo tee fan_curve kernel log: [ 756.442527] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]! [ 777.345800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]! Closes: https://github.com/ROCm/amdgpu/issues/208 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 470891606c5a97b1d0d937e0aa67a3bed9fcb056) Cc: stable@vger.kernel.org [ adapted forward declaration placement to existing FEATURE_MASK macro ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amdgpu/pm: drop SMU driver if version not matched messagesAlex Deucher3-3/+0
commit a3ffaa5b397f4df9d6ac16b10583e9df8e6fa471 upstream. It just leads to user confusion. Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e471627d56272a791972f25e467348b611c31713) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 64KBDonet Tom2-3/+3
commit 4487571ef17a30d274600b3bd6965f497a881299 upstream. Currently, AMDGPU_VA_RESERVED_TRAP_SIZE is hardcoded to 8KB, while KFD_CWSR_TBA_TMA_SIZE is defined as 2 * PAGE_SIZE. On systems with 4K pages, both values match (8KB), so allocation and reserved space are consistent. However, on 64K page-size systems, KFD_CWSR_TBA_TMA_SIZE becomes 128KB, while the reserved trap area remains 8KB. This mismatch causes the kernel to crash when running rocminfo or rccl unit tests. Kernel attempted to read user page (2) - exploit attempt? (uid: 1001) BUG: Kernel NULL pointer dereference on read at 0x00000002 Faulting instruction address: 0xc0000000002c8a64 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries CPU: 34 UID: 1001 PID: 9379 Comm: rocminfo Tainted: G E 6.19.0-rc4-amdgpu-00320-gf23176405700 #56 VOLUNTARY Tainted: [E]=UNSIGNED_MODULE Hardware name: IBM,9105-42A POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.30 (ML1060_896) hv:phyp pSeries NIP: c0000000002c8a64 LR: c00000000125dbc8 CTR: c00000000125e730 REGS: c0000001e0957580 TRAP: 0300 Tainted: G E MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24008268 XER: 00000036 CFAR: c00000000125dbc4 DAR: 0000000000000002 DSISR: 40000000 IRQMASK: 1 GPR00: c00000000125d908 c0000001e0957820 c0000000016e8100 c00000013d814540 GPR04: 0000000000000002 c00000013d814550 0000000000000045 0000000000000000 GPR08: c00000013444d000 c00000013d814538 c00000013d814538 0000000084002268 GPR12: c00000000125e730 c000007e2ffd5f00 ffffffffffffffff 0000000000020000 GPR16: 0000000000000000 0000000000000002 c00000015f653000 0000000000000000 GPR20: c000000138662400 c00000013d814540 0000000000000000 c00000013d814500 GPR24: 0000000000000000 0000000000000002 c0000001e0957888 c0000001e0957878 GPR28: c00000013d814548 0000000000000000 c00000013d814540 c0000001e0957888 NIP [c0000000002c8a64] __mutex_add_waiter+0x24/0xc0 LR [c00000000125dbc8] __mutex_lock.constprop.0+0x318/0xd00 Call Trace: 0xc0000001e0957890 (unreliable) __mutex_lock.constprop.0+0x58/0xd00 amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x6fc/0xb60 [amdgpu] kfd_process_alloc_gpuvm+0x54/0x1f0 [amdgpu] kfd_process_device_init_cwsr_dgpu+0xa4/0x1a0 [amdgpu] kfd_process_device_init_vm+0xd8/0x2e0 [amdgpu] kfd_ioctl_acquire_vm+0xd0/0x130 [amdgpu] kfd_ioctl+0x514/0x670 [amdgpu] sys_ioctl+0x134/0x180 system_call_exception+0x114/0x300 system_call_vectored_common+0x15c/0x2ec This patch changes AMDGPU_VA_RESERVED_TRAP_SIZE to 64 KB and KFD_CWSR_TBA_TMA_SIZE to the AMD GPU page size. This means we reserve 64 KB for the trap in the address space, but only allocate 8 KB within it. With this approach, the allocation size never exceeds the reserved area. Fixes: 34a1de0f7935 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole") Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 31b8de5e55666f26ea7ece5f412b83eab3f56dbb) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amdgpu: validate doorbell_offset in user queue creationJunrui Luo1-0/+7
commit a018d1819f158991b7308e4f74609c6c029b670c upstream. amdgpu_userq_get_doorbell_index() passes the user-provided doorbell_offset to amdgpu_doorbell_index_on_bar() without bounds checking. An arbitrarily large doorbell_offset can cause the calculated doorbell index to fall outside the allocated doorbell BO, potentially corrupting kernel doorbell space. Validate that doorbell_offset falls within the doorbell BO before computing the BAR index, using u64 arithmetic to prevent overflow. Fixes: f09c1e6077ab ("drm/amdgpu: generate doorbell index for userqueue") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit de1ef4ffd70e1d15f0bf584fd22b1f28cbd5e2ec) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amdgpu: Fix wait after reset sequence in S4Lijo Lazar2-3/+8
commit daf470b8882b6f7f53cbfe9ec2b93a1b21528cdc upstream. For a mode-1 reset done at the end of S4 on PSPv11 dGPUs, only check if TOS is unloaded. Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4853 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2fb4883b884a437d760bd7bdf7695a7e5a60bba3) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/i915/dp: Use crtc_state->enhanced_framing properly on ivb/hsw CPU eDPVille Syrjälä1-1/+1
commit 9c9a57e4e337f94e23ddf69263fd0685c91155fb upstream. Looks like I missed the drm_dp_enhanced_frame_cap() in the ivb/hsw CPU eDP code when I introduced crtc_state->enhanced_framing. Fix it up so that the state we program to the hardware is guaranteed to match what we computed earlier. Cc: stable@vger.kernel.org Fixes: 3072a24c778a ("drm/i915: Introduce crtc_state->enhanced_framing") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260325135849.12603-3-ville.syrjala@linux.intel.com Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> (cherry picked from commit 799fe8dc2af52f35c78c4ac97f8e34994dfd8760) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/i915/dsi: Don't do DSC horizontal timing adjustments in command modeVille Syrjälä1-2/+2
commit 4dfce79e098915d8e5fc2b9e1d980bc3251dd32c upstream. Stop adjusting the horizontal timing values based on the compression ratio in command mode. Bspec seems to be telling us to do this only in video mode, and this is also how the Windows driver does things. This should also fix a div-by-zero on some machines because the adjusted htotal ends up being so small that we end up with line_time_us==0 when trying to determine the vtotal value in command mode. Note that this doesn't actually make the display on the Huawei Matebook E work, but at least the kernel no longer explodes when the driver loads. Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12045 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260326111814.9800-2-ville.syrjala@linux.intel.com Fixes: 53693f02d80e ("drm/i915/dsi: account for DSC in horizontal timings") Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 0b475e91ecc2313207196c6d7fd5c53e1a878525) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/ast: dp501: Fix initialization of SCU2CThomas Zimmermann1-1/+1
commit 2f42c1a6161646cbd29b443459fd635d29eda634 upstream. Ast's DP501 initialization reads the register SCU2C at offset 0x1202c and tries to set it to source data from VGA. But writes the update to offset 0x0, with unknown results. Write the result to SCU instead. The bug only happens in ast_init_analog(). There's similar code in ast_init_dvo(), which works correctly. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 83c6620bae3f ("drm/ast: initial DP501 support (v0.2)") Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v3.16+ Link: https://patch.msgid.link/20260327133532.79696-2-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amdgpu: fix the idr allocation flagsPrike Liang1-1/+4
commit 62f553d60a801384336f5867967c26ddf3b17038 upstream. Fix the IDR allocation flags by using atomic GFP flags in non‑sleepable contexts to avoid the __might_sleep() complaint. 268.290239] [drm] Initialized amdgpu 3.64.0 for 0000:03:00.0 on minor 0 [ 268.294900] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:323 [ 268.295355] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1744, name: modprobe [ 268.295705] preempt_count: 1, expected: 0 [ 268.295886] RCU nest depth: 0, expected: 0 [ 268.296072] 2 locks held by modprobe/1744: [ 268.296077] #0: ffff8c3a44abd1b8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0xe4/0x210 [ 268.296100] #1: ffffffffc1a6ea78 (amdgpu_pasid_idr_lock){+.+.}-{3:3}, at: amdgpu_pasid_alloc+0x26/0xe0 [amdgpu] [ 268.296494] CPU: 12 UID: 0 PID: 1744 Comm: modprobe Tainted: G U OE 6.19.0-custom #16 PREEMPT(voluntary) [ 268.296498] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 268.296499] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 268.296501] Call Trace: Fixes: 8f1de51f49be ("drm/amdgpu: prevent immediate PASID reuse case") Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ea56aa2625708eaf96f310032391ff37746310ef) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/amd/display: Fix NULL pointer dereference in dcn401_init_hw()Srinivasan Shanmugam1-6/+11
commit e927b36ae18b66b49219eaa9f46edc7b4fdbb25e upstream. dcn401_init_hw() assumes that update_bw_bounding_box() is valid when entering the update path. However, the existing condition: ((!fams2_enable && update_bw_bounding_box) || freq_changed) does not guarantee this, as the freq_changed branch can evaluate to true independently of the callback pointer. This can result in calling update_bw_bounding_box() when it is NULL. Fix this by separating the update condition from the pointer checks and ensuring the callback, dc->clk_mgr, and bw_params are validated before use. Fixes the below: ../dc/hwss/dcn401/dcn401_hwseq.c:367 dcn401_init_hw() error: we previously assumed 'dc->res_pool->funcs->update_bw_bounding_box' could be null (see line 362) Fixes: ca0fb243c3bb ("drm/amd/display: Underflow Seen on DCN401 eGPU") Cc: Daniel Sa <Daniel.Sa@amd.com> Cc: Alvin Lee <alvin.lee2@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 86117c5ab42f21562fedb0a64bffea3ee5fcd477) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/ioc32: stop speculation on the drm_compat_ioctl pathGreg Kroah-Hartman1-0/+2
commit f8995c2df519f382525ca4bc90553ad2ec611067 upstream. The drm compat ioctl path takes a user controlled pointer, and then dereferences it into a table of function pointers, the signature method of spectre problems. Fix this up by calling array_index_nospec() on the index to the function pointer list. Fixes: 505b5240329b ("drm/ioctl: Fix Spectre v1 vulnerabilities") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: stable <stable@kernel.org> Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Simona Vetter <simona@ffwll.ch> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/2026032451-playing-rummage-8fa2@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11drm/sysfb: Fix efidrm error handling and memory type mismatchChen Ni1-15/+31
[ Upstream commit 5e77923a3eb39cce91bf08ed7670f816bf86d4af ] Fix incorrect error checking and memory type confusion in efidrm_device_create(). devm_memremap() returns error pointers, not NULL, and returns system memory while devm_ioremap() returns I/O memory. The code incorrectly passes system memory to iosys_map_set_vaddr_iomem(). Restructure to handle each memory type separately. Use devm_ioremap*() with ERR_PTR(-ENXIO) for WC/UC, and devm_memremap() with ERR_CAST() for WT/WB. Fixes: 32ae90c66fb6 ("drm/sysfb: Add efidrm for EFI displays") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260311064652.2903449-1-nichen@iscas.ac.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-11drm/xe/pxp: Clear restart flag in pxp_start after jumping backDaniele Ceraolo Spurio1-1/+3
[ Upstream commit 76903b2057c8677c2c006e87fede15f496555dc0 ] If we don't clear the flag we'll keep jumping back at the beginning of the function once we reach the end. Fixes: ccd3c6820a90 ("drm/xe/pxp: Decouple queue addition from PXP start") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patch.msgid.link/20260324153718.3155504-9-daniele.ceraolospurio@intel.com (cherry picked from commit 0850ec7bb2459602351639dccf7a68a03c9d1ee0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-11drm/xe/pxp: Remove incorrect handling of impossible state during suspendDaniele Ceraolo Spurio1-6/+0
[ Upstream commit 4fed244954c2dc9aafa333d08f66b14345225e03 ] The default case of the PXP suspend switch is incorrectly exiting without releasing the lock. However, this case is impossible to hit because we're switching on an enum and all the valid enum values have their own cases. Therefore, we can just get rid of the default case and rely on the compiler to warn us if a new enum value is added and we forget to add it to the switch. Fixes: 51462211f4a9 ("drm/xe/pxp: add PXP PM support") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patch.msgid.link/20260324153718.3155504-8-daniele.ceraolospurio@intel.com (cherry picked from commit f1b5a77fc9b6a90cd9a5e3db9d4c73ae1edfcfac) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-11drm/xe/pxp: Clean up termination status on failureDaniele Ceraolo Spurio1-0/+1
[ Upstream commit e2628e670bb0923fcdc00828bfcd67b26a7df020 ] If the PXP HW termination fails during PXP start, the normal completion code won't be called, so the termination will remain uncomplete. To avoid unnecessary waits, mark the termination as completed from the error path. Note that we already do this if the termination fails when handling a termination irq from the HW. Fixes: f8caa80154c4 ("drm/xe/pxp: Add PXP queue tracking and session start") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patch.msgid.link/20260324153718.3155504-7-daniele.ceraolospurio@intel.com (cherry picked from commit 5d9e708d2a69ab1f64a17aec810cd7c70c5b9fab) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-11Revert "drm: Fix use-after-free on framebuffers and property blobs when ↵Maarten Lankhorst2-10/+4
calling drm_dev_unplug" commit 45ebe43ea00d6b9f5b3e0db9c35b8ca2a96b7e70 upstream. This reverts commit 6bee098b91417654703e17eb5c1822c6dfd0c01d. Den 2026-03-25 kl. 22:11, skrev Simona Vetter: > On Wed, Mar 25, 2026 at 10:26:40AM -0700, Guenter Roeck wrote: >> Hi, >> >> On Fri, Mar 13, 2026 at 04:17:27PM +0100, Maarten Lankhorst wrote: >>> When trying to do a rather aggressive test of igt's "xe_module_load >>> --r reload" with a full desktop environment and game running I noticed >>> a few OOPSes when dereferencing freed pointers, related to >>> framebuffers and property blobs after the compositor exits. >>> >>> Solve this by guarding the freeing in drm_file with drm_dev_enter/exit, >>> and immediately put the references from struct drm_file objects during >>> drm_dev_unplug(). >>> >> >> With this patch in v6.18.20, I get the warning backtraces below. >> The backtraces are gone with the patch reverted. > > Yeah, this needs to be reverted, reasoning below. Maarten, can you please > take care of that and feed the revert through the usual channels? I don't > think it's critical enough that we need to fast-track this into drm.git > directly. > > Quoting the patch here again: > >> drivers/gpu/drm/drm_file.c | 5 ++++- >> drivers/gpu/drm/drm_mode_config.c | 9 ++++++--- >> 2 files changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> index ec820686b3021..f52141f842a1f 100644 >> --- a/drivers/gpu/drm/drm_file.c >> +++ b/drivers/gpu/drm/drm_file.c >> @@ -233,6 +233,7 @@ static void drm_events_release(struct drm_file *file_priv) >> void drm_file_free(struct drm_file *file) >> { >> struct drm_device *dev; >> + int idx; >> >> if (!file) >> return; >> @@ -249,9 +250,11 @@ void drm_file_free(struct drm_file *file) >> >> drm_events_release(file); >> >> - if (drm_core_check_feature(dev, DRIVER_MODESET)) { >> + if (drm_core_check_feature(dev, DRIVER_MODESET) && >> + drm_dev_enter(dev, &idx)) { > > This is misplaced for two reasons: > > - Even if we'd want to guarantee that we hold a drm_dev_enter/exit > reference during framebuffer teardown, we'd need to do this > _consistently over all callsites. Not ad-hoc in just one place that a > testcase hits. This also means kerneldoc updates of the relevant hooks > and at least a bunch of acks from other driver people to document the > consensus. > > - More importantly, this is driver responsibilities in general unless we > have extremely good reasons to the contrary. Which means this must be > placed in xe. > >> drm_fb_release(file); >> drm_property_destroy_user_blobs(dev, file); >> + drm_dev_exit(idx); >> } >> >> if (drm_core_check_feature(dev, DRIVER_SYNCOBJ)) >> diff --git a/drivers/gpu/drm/drm_mode_config.c b/drivers/gpu/drm/drm_mode_config.c >> index 84ae8a23a3678..e349418978f79 100644 >> --- a/drivers/gpu/drm/drm_mode_config.c >> +++ b/drivers/gpu/drm/drm_mode_config.c >> @@ -583,10 +583,13 @@ void drm_mode_config_cleanup(struct drm_device *dev) >> */ >> WARN_ON(!list_empty(&dev->mode_config.fb_list)); >> list_for_each_entry_safe(fb, fbt, &dev->mode_config.fb_list, head) { >> - struct drm_printer p = drm_dbg_printer(dev, DRM_UT_KMS, "[leaked fb]"); >> + if (list_empty(&fb->filp_head) || drm_framebuffer_read_refcount(fb) > 1) { >> + struct drm_printer p = drm_dbg_printer(dev, DRM_UT_KMS, "[leaked fb]"); > > This is also wrong: > > - Firstly, it's a completely independent bug, we do not smash two bugfixes > into one patch. > > - Secondly, it's again a driver bug: drm_mode_cleanup must be called when > the last drm_device reference disappears (hence the existence of > drmm_mode_config_init), not when the driver gets un