summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2025-08-01drm/shmem-helper: Remove obsoleted is_iomem testDmitry Osipenko1-6/+0
commit eab10538073c3ff9e21c857bd462f79f2f6f7e00 upstream. Everything that uses the mapped buffer should be agnostic to is_iomem. The only reason for the is_iomem test is that we're setting shmem->vaddr to the returned map->vaddr. Now that the shmem->vaddr code is gone, remove the obsoleted is_iomem test to clean up the code. Acked-by: Maxime Ripard <mripard@kernel.org> Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.d> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250322212608.40511-7-dmitry.osipenko@collabora.com Stable-dep-of: 6d496e956998 ("Revert "drm/gem-shmem: Use dma_buf from GEM object instance"") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/amdgpu: Fix SDMA engine reset with logical instance IDJesse Zhang1-4/+6
commit 09b585592fa481384597c81388733aed4a04dd05 upstream. This commit makes the following improvements to SDMA engine reset handling: 1. Clarifies in the function documentation that instance_id refers to a logical ID 2. Adds conversion from logical to physical instance ID before performing reset using GET_INST(SDMA0, instance_id) macro 3. Improves error messaging to indicate when a logical instance reset fails 4. Adds better code organization with blank lines for readability The change ensures proper SDMA engine reset by using the correct physical instance ID while maintaining the logical ID interface for callers. V2: Remove harvest_config check and convert directly to physical instance (Lijo) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5efa6217c239ed1ceec0f0414f9b6f6927387dfc) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/amdgpu: Implement SDMA soft reset directly for v5.xJesse.zhang@amd.com1-1/+37
commit 5c3e7c49538e2ddad10296a318c225bbb3d37d20 upstream. This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA soft resets directly, rather than relying on the DPM interface. 1. **New `amdgpu_sdma_soft_reset` Function**: - Implements a soft reset for SDMA engines by directly writing to the hardware registers. - Handles SDMA versions 4.x and 5.x separately: - For SDMA 4.x, the existing `amdgpu_dpm_reset_sdma` function is used for backward compatibility. - For SDMA 5.x, the driver directly manipulates the `GRBM_SOFT_RESET` register to reset the specified SDMA instance. 2. **Integration into `amdgpu_sdma_reset_engine`**: - The `amdgpu_sdma_soft_reset` function is called during the SDMA reset process, replacing the previous call to `amdgpu_dpm_reset_sdma`. v2: r should default to an error (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 09b585592fa4 ("drm/amdgpu: Fix SDMA engine reset with logical instance ID") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.hJesse.zhang@amd.com1-1/+7
commit 29891842154d7ebca97a94b0d5aaae94e560f61c upstream. This patch introduces new function pointers in the amdgpu_sdma structure to handle queue stop, start and soft reset operations. These will replace the older callback mechanism. The new functions are: - stop_kernel_queue: Stops a specific SDMA queue - start_kernel_queue: Starts/Restores a specific SDMA queue - soft_reset_kernel_queue: Performs soft reset on a specific SDMA queue v2: Update stop_queue/start_queue function paramters to use ring pointer instead of device/instance(Chritian) v3: move stop_queue/start_queue to struct amdgpu_sdma_instance and rename them. (Alex) v4: rework the ordering a bit (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 09b585592fa4 ("drm/amdgpu: Fix SDMA engine reset with logical instance ID") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/xe: Make WA BB part of LRC BOMatthew Brost2-21/+19
commit afcad92411772a1f361339f22c49f855c6cc7d0f upstream. No idea why, but without this GuC context switches randomly fail when running IGTs in a loop. Need to follow up why this fixes the aforementioned issue but can live with a stable driver for now. Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com (cherry picked from commit 3a1edef8f4b58b0ba826bc68bf4bce4bdf59ecf3) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> [ adapted xe_bo_create_pin_map() call ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/sched: Remove optimization that causes hang when killing dependent jobsLin.Cao1-19/+2
commit 15f77764e90a713ee3916ca424757688e4f565b9 upstream. When application A submits jobs and application B submits a job with a dependency on A's fence, the normal flow wakes up the scheduler after processing each job. However, the optimization in drm_sched_entity_add_dependency_cb() uses a callback that only clears dependencies without waking up the scheduler. When application A is killed before its jobs can run, the callback gets triggered but only clears the dependency without waking up the scheduler, causing the scheduler to enter sleep state and application B to hang. Remove the optimization by deleting drm_sched_entity_clear_dep() and its usage, ensuring the scheduler is always woken up when dependencies are cleared. Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/amdgpu: Reset the clear flag in buddy during resumeArunpravin Paneer Selvam4-0/+63
commit 95a16160ca1d75c66bf7a1c5e0bcaffb18e7c7fc upstream. - Added a handler in DRM buddy manager to reset the cleared flag for the blocks in the freelist. - This is necessary because, upon resuming, the VRAM becomes cluttered with BIOS data, yet the VRAM backend manager believes that everything has been cleared. v2: - Add lock before accessing drm_buddy_clear_reset_blocks()(Matthew Auld) - Force merge the two dirty blocks.(Matthew Auld) - Add a new unit test case for this issue.(Matthew Auld) - Having this function being able to flip the state either way would be good. (Matthew Brost) v3(Matthew Auld): - Do merge step first to avoid the use of extra reset flag. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250716075125.240637-2-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01Revert "drm/gem-dma: Use dma_buf from GEM object instance"Thomas Zimmermann1-1/+1
commit 1918e79be908b8a2c8757640289bc196c14d928a upstream. This reverts commit e8afa1557f4f963c9a511bd2c6074a941c308685. The dma_buf field in struct drm_gem_object is not stable over the object instance's lifetime. The field becomes NULL when user space releases the final GEM handle on the buffer object. This resulted in a NULL-pointer deref. Workarounds in commit 5307dce878d4 ("drm/gem: Acquire references on GEM handles for framebuffers") and commit f6bfc9afc751 ("drm/framebuffer: Acquire internal references on GEM handles") only solved the problem partially. They especially don't work for buffer objects without a DRM framebuffer associated. Hence, this revert to going back to using .import_attach->dmabuf. v3: - cc stable Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Zack Rusin <zack.rusin@broadcom.com> Cc: <stable@vger.kernel.org> # v6.15+ Link: https://lore.kernel.org/r/20250715155934.150656-8-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01Revert "drm/gem-framebuffer: Use dma_buf from GEM object instance"Thomas Zimmermann1-2/+6
commit 2712ca878b688682ac2ce02aefc413fc76019cd9 upstream. This reverts commit cce16fcd7446dcff7480cd9d2b6417075ed81065. The dma_buf field in struct drm_gem_object is not stable over the object instance's lifetime. The field becomes NULL when user space releases the final GEM handle on the buffer object. This resulted in a NULL-pointer deref. Workarounds in commit 5307dce878d4 ("drm/gem: Acquire references on GEM handles for framebuffers") and commit f6bfc9afc751 ("drm/framebuffer: Acquire internal references on GEM handles") only solved the problem partially. They especially don't work for buffer objects without a DRM framebuffer associated. Hence, this revert to going back to using .import_attach->dmabuf. v3: - cc stable Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Zack Rusin <zack.rusin@broadcom.com> Cc: <stable@vger.kernel.org> # v6.15+ Link: https://lore.kernel.org/r/20250715155934.150656-6-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01Revert "drm/prime: Use dma_buf from GEM object instance"Thomas Zimmermann1-1/+7
commit fb4ef4a52b79a22ad382bfe77332642d02aef773 upstream. This reverts commit f83a9b8c7fd0557b0c50784bfdc1bbe9140c9bf8. The dma_buf field in struct drm_gem_object is not stable over the object instance's lifetime. The field becomes NULL when user space releases the final GEM handle on the buffer object. This resulted in a NULL-pointer deref. Workarounds in commit 5307dce878d4 ("drm/gem: Acquire references on GEM handles for framebuffers") and commit f6bfc9afc751 ("drm/framebuffer: Acquire internal references on GEM handles") only solved the problem partially. They especially don't work for buffer objects without a DRM framebuffer associated. Hence, this revert to going back to using .import_attach->dmabuf. v3: - cc stable Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Zack Rusin <zack.rusin@broadcom.com> Cc: <stable@vger.kernel.org> # v6.15+ Link: https://lore.kernel.org/r/20250715155934.150656-5-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4xVille Syrjälä1-0/+6
commit 9e0c433d0c05fde284025264b89eaa4ad59f0a3e upstream. On g4x we currently use the 96MHz non-SSC refclk, which can't actually generate an exact 2.7 Gbps link rate. In practice we end up with 2.688 Gbps which seems to be close enough to actually work, but link training is currently failing due to miscalculating the DP_LINK_BW value (we calcualte it directly from port_clock which reflects the actual PLL outpout frequency). Ideas how to fix this: - nudge port_clock back up to 270000 during PLL computation/readout - track port_clock and the nominal link rate separately so they might differ a bit - switch to the 100MHz refclk, but that one should be SSC so perhaps not something we want While we ponder about a better solution apply some band aid to the immediate issue of miscalculated DP_LINK_BW value. With this I can again use 2.7 Gbps link rate on g4x. Cc: stable@vger.kernel.org Fixes: 665a7b04092c ("drm/i915: Feed the DPLL output freq back into crtc_state") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250710201718.25310-2-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak <imre.deak@intel.com> (cherry picked from commit a8b874694db5cae7baaf522756f87acd956e6e66) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/bridge: ti-sn65dsi86: Remove extra semicolon in ti_sn_bridge_probe()Douglas Anderson1-1/+1
[ Upstream commit 15a7ca747d9538c2ad8b0c81dd4c1261e0736c82 ] As reported by the kernel test robot, a recent patch introduced an unnecessary semicolon. Remove it. Fixes: 55e8ff842051 ("drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector type") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506301704.0SBj6ply-lkp@intel.com/ Reviewed-by: Devarsh Thakkar <devarsht@ti.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250714130631.1.I1cfae3222e344a3b3c770d079ee6b6f7f3b5d636@changeid Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-01drm/amd/display: Don't allow OLED to go down to fully offMario Limonciello1-5/+7
[ Upstream commit 39d81457ad3417a98ac826161f9ca0e642677661 ] [Why] OLED panels can be fully off, but this behavior is unexpected. [How] Ensure that minimum luminance is at least 1. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4338 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51496c7737d06a74b599d0aa7974c3d5a4b1162e) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24drm/xe: Move page fault init after topology initMatthew Brost1-3/+3
commit 3155ac89251dcb5e35a3ec2f60a74a6ed22c56fd upstream. We need the topology to determine GT page fault queue size, move page fault init after topology init. Cc: stable@vger.kernel.org Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com (cherry picked from commit beb72acb5b38dbe670d8eb752d1ad7a32f9c4119) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/xe/mocs: Initialize MOCS index earlyBalasubramani Vivekanandan1-2/+2
commit 2a58b21adee3df10ca6f4491af965c4890d2d8e3 upstream. MOCS uc_index is used even before it is initialized in the following callstack guc_prepare_xfer() __xe_guc_upload() xe_guc_min_load_for_hwconfig() xe_uc_init_hwconfig() xe_gt_init_hwconfig() Do MOCS index initialization earlier in the device probe. Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250520142445.2792824-1-balasubramani.vivekanandan@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 241cc827c0987d7173714fc5a95a7c8fc9bf15c0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Stable-dep-of: 3155ac89251d ("drm/xe: Move page fault init after topology init") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/mediatek: only announce AFBC if really supportedIcenowy Zheng7-4/+27
[ Upstream commit 8d121a82fa564e0c8bd86ce4ec56b2a43b9b016e ] Currently even the SoC's OVL does not declare the support of AFBC, AFBC is still announced to the userspace within the IN_FORMATS blob, which breaks modern Wayland compositors like KWin Wayland and others. Gate passing modifiers to drm_universal_plane_init() behind querying the driver of the hardware block for AFBC support. Fixes: c410fa9b07c3 ("drm/mediatek: Add AFBC support to Mediatek DRM driver") Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Reviewed-by: CK Hu <ck.hu@medaitek.com> Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250531121140.387661-1-uwu@icenowy.me/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24drm/mediatek: Add wait_event_timeout when disabling planeJason-JH Lin3-0/+39
[ Upstream commit d208261e9f7c66960587b10473081dc1cecbe50b ] Our hardware registers are set through GCE, not by the CPU. DRM might assume the hardware is disabled immediately after calling atomic_disable() of drm_plane, but it is only truly disabled after the GCE IRQ is triggered. Additionally, the cursor plane in DRM uses async_commit, so DRM will not wait for vblank and will free the buffer immediately after calling atomic_disable(). To prevent the framebuffer from being freed before the layer disable settings are configured into the hardware, which can cause an IOMMU fault error, a wait_event_timeout has been added to wait for the ddp_cmdq_cb() callback,indicating that the GCE IRQ has been triggered. Fixes: 2f965be7f900 ("drm/mediatek: apply CMDQ control flow") Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250624113223.443274-1-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24drm/xe/pf: Resend PF provisioning after GT resetMichal Wajdeczko1-0/+27
[ Upstream commit 5c244eeca57ff4e47e1f60310d059346d1b86b9b ] If we reload the GuC due to suspend/resume or GT reset then we have to resend not only any VFs provisioning data, but also PF configuration, like scheduling parameters (EQ, PT), as otherwise GuC will continue to use default values. Fixes: 411220808cee ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com (cherry picked from commit 1c38dd6afa4a8ecce28e94da794fd1d205c30f51) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24drm/xe/pf: Prepare to stop SR-IOV support prior GT resetMichal Wajdeczko3-0/+27
[ Upstream commit 81dccec448d204e448ae83e1fe60e8aaeaadadb8 ] As part of the resume or GT reset, the PF driver schedules work which is then used to complete restarting of the SR-IOV support, including resending to the GuC configurations of provisioned VFs. However, in case of short delay between those two actions, which could be seen by triggering a GT reset on the suspened device: $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset this PF worker might be still busy, which lead to errors due to just stopped or disabled GuC CTB communication: [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe] [ ] xe 0000:00:02.0: [drm] GT0: reset queued [ ] xe 0000:00:02.0: [drm] GT0: reset started [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled! [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED) [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed While this VFs reprovisioning will be successful during next spin of the worker, to avoid those errors, make sure to cancel restart worker if we are about to trigger next reset. Fixes: 411220808cee ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com (cherry picked from commit 9f50b729dd61dfb9f4d7c66900d22a7c7353a8c0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24drm/xe: Dont skip TLB invalidations on VFTejas Upadhyay1-12/+10
[ Upstream commit fd25fa90edcfd4db5bf69c11621021a7cfd11d53 ] Skipping TLB invalidations on VF causing unrecoverable faults. Probable reason for skipping TLB invalidations on SRIOV could be lack of support for instruction MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some additional handling. Helps in resolving, [ 704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] ASID: 0 VFID: 0 PDATA: 0x0d92 Faulted Address: 0x0000000002fa0000 FaultType: 0 AccessType: 1 FaultLevel: 0 EngineClass: 3 bcs EngineInstance: 8 [ 704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22 V2: - Use Xmas tree (MichalW) Suggested-by: Matthew Brost <matthew.brost@intel.com> Fixes: 97515d0b3ed92 ("drm/xe/vf: Don't emit access to Global HWSP if VF") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit b528e896fa570844d654b5a4617a97fa770a1030) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24drm/amd/display: Free memory allocationClayton King1-1/+2
commit b2ee9fa0fe6416e16c532f61b909c79b5d4ed282 upstream. [WHY] Free memory to avoid memory leak Reviewed-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Clayton King <clayton.king@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fa699acb8e9be2341ee318077fa119acc7d5f329) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/amd/display: Disable CRTC degamma LUT for DCN401Melissa Wen1-1/+10
commit 97a0f2b5f4d4afcec34376e4428e157ce95efa71 upstream. In DCN401 pre-blending degamma LUT isn't affecting cursor as in previous DCN version. As this is not the behavior close to what is expected for CRTC degamma LUT, disable CRTC degamma LUT property in this HW. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4176
2025-07-24drm/amdgpu: Increase reset counter only on successLijo Lazar1-2/+7
commit 86790e300d8b7bbadaad5024e308c52f1222128f upstream. Increment the reset counter only if soft recovery succeeded. This is consistent with a ring hard reset behaviour where counter gets incremented only if hard reset succeeded. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25c314aa3ec3d30e4ee282540e2096b5c66a2437) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/panfrost: Fix scheduler workqueue bugPhilipp Stanner1-1/+1
commit cb345f954eacd162601e7d07ca2f0f0a17b54ee3 upstream. When the GPU scheduler was ported to using a struct for its initialization parameters, it was overlooked that panfrost creates a distinct workqueue for timeout handling. The pointer to this new workqueue is not initialized to the struct, resulting in NULL being passed to the scheduler, which then uses the system_wq for timeout handling. Set the correct workqueue to the init args struct. Cc: stable@vger.kernel.org # 6.15+ Fixes: 796a9f55a8d1 ("drm/sched: Use struct for drm_sched_init() params") Reported-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Closes: https://lore.kernel.org/dri-devel/b5d0921c-7cbf-4d55-aa47-c35cd7861c02@igalia.com/ Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250709102957.100849-2-phasta@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen1-0/+1
commit 83261934015c434fabb980a3e613b01d9976e877 upstream. Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2becafc319db3d96205320f31cc0de4ee5a93747) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/xe/pm: Correct comment of xe_pm_set_vram_threshold()Shuicheng Lin1-3/+5
[ Upstream commit 0539c5eaf81f3f844213bf6b3137a53e5b04b083 ] The parameter threshold is with size in MiB, not in bits. Correct it to avoid any confusion. v2: s/mb/MiB, s/vram/VRAM, fix return section. (Michal) Fixes: 30c399529f4c ("drm/xe: Document Xe PM component") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250708021450.3602087-2-shuicheng.lin@intel.com Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 0efec0500117947f924e5ac83be40f96378af85a) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17drm/xe/pm: Restore display pm if there is error after display suspendShuicheng Lin1-2/+1
[ Upstream commit 6d33df611a39a1b4ad9f2b609ded5d6efa04d97e ] xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there is error with xe_bo_evict_all(), display pm should be restored. Fixes: 51462211f4a9 ("drm/xe/pxp: add PXP PM support") Fixes: cb8f81c17531 ("drm/xe/display: Make display suspend/resume work on discrete") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 83dcee17855c4e5af037ae3262809036de127903) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17drm/xe/pf: Clear all LMTT pages on allocMichal Wajdeczko1-0/+11
[ Upstream commit 705a412a367f383430fa34bada387af2e52eb043 ] Our LMEM buffer objects are not cleared by default on alloc and during VF provisioning we only setup LMTT PTEs for the actually provisioned LMEM range. But beyond that valid range we might leave some stale data that could either point to some other VFs allocations or even to the PF pages. Explicitly clear all new LMTT page to avoid the risk that a malicious VF would try to exploit that gap. While around add asserts to catch any undesired PTE overwrites and low-level debug traces to track LMTT PT life-cycle. Fixes: b1d204058218 ("drm/xe/pf: Introduce Local Memory Translation Table") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com (cherry picked from commit 3fae6918a3e27cce20ded2551f863fb05d4bef8d) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17drm/nouveau/gsp: fix potential leak of memory used during acpi initBen Skeggs1-6/+14
[ Upstream commit d133036a0b23d3ef781d067ccdea6bbfb381e0cf ] If any of the ACPI calls fail, memory allocated for the input buffer would be leaked. Fix failure paths to free allocated memory. Also add checks to ensure the allocations succeeded in the first place. Reported-by: Danilo Krummrich <dakr@kernel.org> Fixes: 176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250617040036.2932-1-bskeggs@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17drm/tegra: nvdec: Fix dma_alloc_coherent error checkMikko Perttunen1-4/+2
[ Upstream commit 44306a684cd1699b8562a54945ddc43e2abc9eab ] Check for NULL return value with dma_alloc_coherent, in line with Robin's fix for vic.c in 'drm/tegra: vic: Fix DMA API misuse'. Fixes: 46f226c93d35 ("drm/tegra: Add NVDEC driver") Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20250702-nvdec-dma-error-check-v1-1-c388b402c53a@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17drm/xe: Allocate PF queue size on pow2 boundaryMatthew Brost1-0/+1
commit c9a95dbe06102cf01afee4cd83ecb29f8d587a72 upstream. CIRC_SPACE does not work unless the size argument is a power of 2, allocate PF queue size on power of 2 boundary. Cc: stable@vger.kernel.org Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size") Fixes: 29582e0ea75c ("drm/xe: Add page queue multiplier") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com (cherry picked from commit 491b9783126303755717c0cbde0b08ee59b6abab) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/framebuffer: Acquire internal references on GEM handlesThomas Zimmermann4-26/+61
commit f6bfc9afc7510cb5e6fbe0a17c507917b0120280 upstream. Acquire GEM handles in drm_framebuffer_init() and release them in the corresponding drm_framebuffer_cleanup(). Ties the handle's lifetime to the framebuffer. Not all GEM buffer objects have GEM handles. If not set, no refcounting takes place. This is the case for some fbdev emulation. This is not a problem as these GEM objects do not use dma-bufs and drivers will not release them while fbdev emulation is running. Framebuffer flags keep a bit per color plane of which the framebuffer holds a GEM handle reference. As all drivers use drm_framebuffer_init(), they will now all hold dma-buf references as fixed in commit 5307dce878d4 ("drm/gem: Acquire references on GEM handles for framebuffers"). In the GEM framebuffer helpers, restore the original ref counting on buffer objects. As the helpers for handle refcounting are now no longer called from outside the DRM core, unexport the symbols. v3: - don't mix internal flags with mode flags (Christian) v2: - track framebuffer handle refs by flag - drop gma500 cleanup (Christian) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 5307dce878d4 ("drm/gem: Acquire references on GEM handles for framebuffers") Reported-by: Bert Karwatzki <spasswolf@web.de> Closes: https://lore.kernel.org/dri-devel/20250703115915.3096-1-spasswolf@web.de/ Tested-by: Bert Karwatzki <spasswolf@web.de> Tested-by: Mario Limonciello <superm1@kernel.org> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Anusha Srivatsa <asrivats@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: <stable@vger.kernel.org> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250707131224.249496-1-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/nouveau: Do not fail module init on debugfs errorsAaron Thompson3-11/+4
commit 78f88067d5c56d9aed69f27e238742841461cf67 upstream. If CONFIG_DEBUG_FS is enabled, nouveau_drm_init() returns an error if it fails to create the "nouveau" directory in debugfs. One case where that will happen is when debugfs access is restricted by CONFIG_DEBUG_FS_ALLOW_NONE or by the boot parameter debugfs=off, which cause the debugfs APIs to return -EPERM. So just ignore errors from debugfs. Note that nouveau_debugfs_root may be an error now, but that is a standard pattern for debugfs. From include/linux/debugfs.h: "NOTE: it's expected that most callers should _ignore_ the errors returned by this function. Other debugfs functions handle the fact that the "dentry" passed to them could be an error and they don't crash in that case. Drivers should generally work fine even if debugfs fails to init anyway." Fixes: 97118a1816d2 ("drm/nouveau: create module debugfs root") Cc: stable@vger.kernel.org Signed-off-by: Aaron Thompson <dev@aaront.org> Acked-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250703211949.9916-1-dev@aaront.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"Matthew Brost1-1/+0
commit daa099fed50a39256feb37d3fac146bf0d74152f upstream. This reverts commit fe0154cf8222d9e38c60ccc124adb2f9b5272371. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit 3a1edef8f4b5 ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: fe0154cf8222 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 03d85ab36bcbcbe9dc962fccd3f8e54d7bb93b35) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/xe/bmg: fix compressed VRAM handlingMatthew Auld1-1/+1
commit fee58ca135a7b979c8b75e6d2eac60d695f9209b upstream. There looks to be an issue in our compression handling when the BO pages are very fragmented, where we choose to skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, such that we can hopefully do more work per blit operation. However in such a case we need to ensure the src PTEs are correctly tagged with a compression enabled PAT index on dgpu xe2+, otherwise the copy will simply treat the src memory as uncompressed, leading to corruption if the memory was compressed by the user. To fix this pass along use_comp_pat into emit_pte() on the src side, to indicate that compression should be considered. v2 (Jonathan): tweak the commit message Fixes: 523f191cc0c7 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com (cherry picked from commit f7a2fd776e57bd6468644bdecd91ab3aba57ba58) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/gem: Fix race in drm_gem_handle_create_tail()Simona Vetter1-1/+9
commit bd46cece51a36ef088f22ef0416ac13b0a46d5b0 upstream. Object creation is a careful dance where we must guarantee that the object is fully constructed before it is visible to other threads, and GEM buffer objects are no difference. Final publishing happens by calling drm_gem_handle_create(). After that the only allowed thing to do is call drm_gem_object_put() because a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id (which is trivial since we have a linear allocator) can already tear down the object again. Luckily most drivers get this right, the very few exceptions I've pinged the relevant maintainers for. Unfortunately we also need drm_gem_handle_create() when creating additional handles for an already existing object (e.g. GETFB ioctl or the various bo import ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as the only exported function to stop these issues from happening. Now unfortunately the implementation of drm_gem_handle_create() isn't living up to standards: It does correctly finishe object initialization at the global level, and hence is safe against a concurrent tear down. But it also sets up the file-private aspects of the handle, and that part goes wrong: We fully register the object in the drm_file.object_idr before calling drm_vma_node_allow() or obj->funcs->open, which opens up races against concurrent removal of that handle in drm_gem_handle_delete(). Fix this with the usual two-stage approach of first reserving the handle id, and then only registering the object after we've completed the file-private setup. Jacek reported this with a testcase of concurrently calling GEM_CLOSE on a freshly-created object (which also destroys the object), but it should be possible to hit this with just additional handles created through import or GETFB without completed destroying the underlying object with the concurrent GEM_CLOSE ioctl calls. Note that the close-side of this race was fixed in f6cd7daecff5 ("drm: Release driver references to handle before making it available again"), which means a cool 9 years have passed until someone noticed that we need to make this symmetry or there's still gaps left :-/ Without the 2-stage close approach we'd still have a race, therefore that's an integral part of this bugfix. More importantly, this means we can have NULL pointers behind allocated id in our drm_file.object_idr. We need to check for that now: - drm_gem_handle_delete() checks for ERR_OR_NULL already - drm_gem.c:object_lookup() also chekcs for NULL - drm_gem_release() should never be called if there's another thread still existing that could call into an IOCTL that creates a new handle, so cannot race. For paranoia I added a NULL check to drm_gem_object_release_handle() though. - most drivers (etnaviv, i915, msm) are find because they use idr_find(), which maps both ENOENT and NULL to NULL. - drivers using idr_for_each_entry() should also be fine, because idr_get_next does filter out NULL entries and continues the iteration. - The same holds for drm_show_memory_stats(). v2: Use drm_WARN_ON (Thomas) Reported-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: stable@vger.kernel.org Cc: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Signed-off-by: Simona Vetter <simona.vetter@intel.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20250707151814.603897-1-simona.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/ttm: fix error handling in ttm_buffer_object_transferChristian König1-6/+7
commit 97e000acf2e20a86a50a0ec8c2739f0846f37509 upstream. Unlocking the resv object was missing in the error path, additionally to that we should move over the resource only after the fence slot was reserved. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Fixes: c8d4c18bfbc4a ("dma-buf/drivers: make reserving a shared slot mandatory v4") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20250616130726.22863-3-christian.koenig@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/gem: Acquire references on GEM handles for framebuffersThomas Zimmermann3-11/+51
commit 5307dce878d4126e1b375587318955bd019c3741 upstream. A GEM handle can be released while the GEM buffer object is attached to a DRM framebuffer. This leads to the release of the dma-buf backing the buffer object, if any. [1] Trying to use the framebuffer in further mode-setting operations leads to a segmentation fault. Most easily happens with driver that use shadow planes for vmap-ing the dma-buf during a page flip. An example is shown below. [ 156.791968] ------------[ cut here ]------------ [ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430 [...] [ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430 [ 157.043420] Call Trace: [ 157.045898] <TASK> [ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710 [ 157.065567] ? dma_buf_vmap+0x224/0x430 [ 157.069446] ? __warn.cold+0x58/0xe4 [ 157.073061] ? dma_buf_vmap+0x224/0x430 [ 157.077111] ? report_bug+0x1dd/0x390 [ 157.080842] ? handle_bug+0x5e/0xa0 [ 157.084389] ? exc_invalid_op+0x14/0x50 [ 157.088291] ? asm_exc_invalid_op+0x16/0x20 [ 157.092548] ? dma_buf_vmap+0x224/0x430 [ 157.096663] ? dma_resv_get_singleton+0x6d/0x230 [ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10 [ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10 [ 157.110697] drm_gem_shmem_vmap+0x74/0x710 [ 157.114866] drm_gem_vmap+0xa9/0x1b0 [ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0 [ 157.123086] drm_gem_fb_vmap+0xab/0x300 [ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10 [ 157.133032] ? lockdep_init_map_type+0x19d/0x880 [ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0 [ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180 [ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40 [...] [ 157.346424] ---[ end trace 0000000000000000 ]--- Acquiring GEM handles for the framebuffer's GEM buffer objects prevents this from happening. The framebuffer's cleanup later puts the handle references. Commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object instance") triggers the segmentation fault easily by using the dma-buf field more widely. The underlying issue with reference counting has been present before. v2: - acquire the handle instead of the BO (Christian) - fix comment style (Christian) - drop the Fixes tag (Christian) - rename err_ gotos - add missing L