summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
8 daysdrm/amdgpu: fix incorrect vm flags to map boJack Xiao1-2/+2
[ Upstream commit 040bc6d0e0e9c814c9c663f6f1544ebaff6824a8 ] It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f) Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdrm/amdgpu: fix vram reservation issueYiPeng Chai1-2/+1
[ Upstream commit 10ef476aad1c848449934e7bec2ab2374333c7b6 ] The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d38eaf27de1b8584f42d6fb3f717b7ec44b3a7a1) Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdrm/amdgpu: clear pa and mca record counter when resetting eepromganglxie1-0/+2
[ Upstream commit d0cc8d2b7df1848f98f0fea8135ba706814b1d13 ] clear pa and mca record counter when resetting eeprom, so that ras_num_bad_pages can be calculated correctly Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdrm/amdgpu: Suspend IH during mode-2 resetLijo Lazar1-4/+29
[ Upstream commit 3f1e81ecb61923934bd11c3f5c1e10893574e607 ] On multi-aid SOCs, there could be a continuous stream of interrupts from GC after poison consumption. Suspend IH to disable them before doing mode-2 reset. This avoids conflicts in hardware accesses during interrupt handlers while a reset is ongoing. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdrm/amdgpu: Use correct severity for BP threshold exceed eventXiang Liu1-2/+4
[ Upstream commit 4a33ca3f6ee9a013a423a867426704e9c9d785bd ] The severity of CPER for BP threshold exceed event should be set as CPER_SEV_FATAL to match the OOB implementation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysdrm/amdgpu/gfx10: fix kiq locking in KCQ resetAlex Deucher1-4/+2
[ Upstream commit a4b2ba8f631d3e44b30b9b46ee290fbfe608b7d0 ] The ring test needs to be inside the lock. Fixes: 097af47d3cfb ("drm/amdgpu/gfx10: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysdrm/amdgpu/gfx9.4.3: fix kiq locking in KCQ resetAlex Deucher1-2/+1
[ Upstream commit 08f116c59310728ea8b7e9dc3086569006c861cf ] The ring test needs to be inside the lock. Fixes: 4c953e53cc34 ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysdrm/amdgpu/gfx9: fix kiq locking in KCQ resetAlex Deucher1-1/+1
[ Upstream commit 730ea5074dac1b105717316be5d9c18b09829385 ] The ring test needs to be inside the lock. Fixes: fdbd69486b46 ("drm/amdgpu/gfx9: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysdrm/amdgpu: Remove nbiov7.9 replay count reportingLijo Lazar1-20/+0
[ Upstream commit 0f566f0e9c614aa3d95082246f5b8c9e8a09c8b3 ] Direct pcie replay count reporting is not available on nbio v7.9. Reporting is done through firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Fixes: 50709d18f4a6 ("drm/amdgpu: Add pci replay count to nbio v7.9") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-01drm/amdgpu: Fix SDMA engine reset with logical instance IDJesse Zhang1-4/+6
commit 09b585592fa481384597c81388733aed4a04dd05 upstream. This commit makes the following improvements to SDMA engine reset handling: 1. Clarifies in the function documentation that instance_id refers to a logical ID 2. Adds conversion from logical to physical instance ID before performing reset using GET_INST(SDMA0, instance_id) macro 3. Improves error messaging to indicate when a logical instance reset fails 4. Adds better code organization with blank lines for readability The change ensures proper SDMA engine reset by using the correct physical instance ID while maintaining the logical ID interface for callers. V2: Remove harvest_config check and convert directly to physical instance (Lijo) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5efa6217c239ed1ceec0f0414f9b6f6927387dfc) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/amdgpu: Implement SDMA soft reset directly for v5.xJesse.zhang@amd.com1-1/+37
commit 5c3e7c49538e2ddad10296a318c225bbb3d37d20 upstream. This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA soft resets directly, rather than relying on the DPM interface. 1. **New `amdgpu_sdma_soft_reset` Function**: - Implements a soft reset for SDMA engines by directly writing to the hardware registers. - Handles SDMA versions 4.x and 5.x separately: - For SDMA 4.x, the existing `amdgpu_dpm_reset_sdma` function is used for backward compatibility. - For SDMA 5.x, the driver directly manipulates the `GRBM_SOFT_RESET` register to reset the specified SDMA instance. 2. **Integration into `amdgpu_sdma_reset_engine`**: - The `amdgpu_sdma_soft_reset` function is called during the SDMA reset process, replacing the previous call to `amdgpu_dpm_reset_sdma`. v2: r should default to an error (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 09b585592fa4 ("drm/amdgpu: Fix SDMA engine reset with logical instance ID") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.hJesse.zhang@amd.com1-1/+7
commit 29891842154d7ebca97a94b0d5aaae94e560f61c upstream. This patch introduces new function pointers in the amdgpu_sdma structure to handle queue stop, start and soft reset operations. These will replace the older callback mechanism. The new functions are: - stop_kernel_queue: Stops a specific SDMA queue - start_kernel_queue: Starts/Restores a specific SDMA queue - soft_reset_kernel_queue: Performs soft reset on a specific SDMA queue v2: Update stop_queue/start_queue function paramters to use ring pointer instead of device/instance(Chritian) v3: move stop_queue/start_queue to struct amdgpu_sdma_instance and rename them. (Alex) v4: rework the ordering a bit (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 09b585592fa4 ("drm/amdgpu: Fix SDMA engine reset with logical instance ID") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01drm/amdgpu: Reset the clear flag in buddy during resumeArunpravin Paneer Selvam3-0/+20
commit 95a16160ca1d75c66bf7a1c5e0bcaffb18e7c7fc upstream. - Added a handler in DRM buddy manager to reset the cleared flag for the blocks in the freelist. - This is necessary because, upon resuming, the VRAM becomes cluttered with BIOS data, yet the VRAM backend manager believes that everything has been cleared. v2: - Add lock before accessing drm_buddy_clear_reset_blocks()(Matthew Auld) - Force merge the two dirty blocks.(Matthew Auld) - Add a new unit test case for this issue.(Matthew Auld) - Having this function being able to flip the state either way would be good. (Matthew Brost) v3(Matthew Auld): - Do merge step first to avoid the use of extra reset flag. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250716075125.240637-2-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/amdgpu: Increase reset counter only on successLijo Lazar1-2/+7
commit 86790e300d8b7bbadaad5024e308c52f1222128f upstream. Increment the reset counter only if soft recovery succeeded. This is consistent with a ring hard reset behaviour where counter gets incremented only if hard reset succeeded. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25c314aa3ec3d30e4ee282540e2096b5c66a2437) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen1-0/+1
commit 83261934015c434fabb980a3e613b01d9976e877 upstream. Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2becafc319db3d96205320f31cc0de4ee5a93747) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/amdkfd: add hqd_sdma_get_doorbell callbacks for gfx7/8Alex Deucher2-0/+16
commit 34659c1a1f4fd4c148ab13e13b11fd64df01ffcd upstream. These were missed when support was added for other generations. The callbacks are called unconditionally so we need to make sure all generations have them. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4304 Link: https://github.com/ROCm/ROCm/issues/4965 Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Cc: Jonathan Kim <jonathan.kim@amd.com> Reported-by: Johl Brown <johlbrown@gmail.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1e9d17a5dcf1242e9518e461d8e63ad35240e49e) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17drm/amdgpu: Include sdma_4_4_4.binKent Russell1-0/+1
commit e54c5de901ea56fc68f8d56b3cce9940169346f4 upstream. This got missed during SDMA 4.4.4 support. Fixes: 968e3811c3e8 ("drm/amdgpu: add initial support for sdma444") Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51526efe02714339ed6139f7bc348377d363200a) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu/mes: add missing locking in helper functionsAlex Deucher1-0/+16
[ Upstream commit 40f970ba7a4ab77be2ffe6d50a70416c8876496a ] We need to take the MES lock. Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13Frank Min4-6/+26
commit 854171405e7f093532b33d8ed0875b9e34fc55b4 upstream. 1. Add kicker firmwares loading for gfx11/smu13/psp13 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_11_0_0_rlc_kicker.bin - gc_11_0_0_imu_kicker.bin - psp_13_0_0_sos_kicker.bin - psp_13_0_0_ta_kicker.bin - smu_13_0_0_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fb5ec2174d70a8989bc207d257db90ffeca3b163) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: switch job hw_fence to amdgpu_fenceAlex Deucher6-32/+32
commit ebe43542702c3d15d1a1d95e8e13b1b54076f05a upstream. Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bf1cd14f9e2e1fdf981eed273ddd595863f5288c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequencesJesse Zhang1-1/+5
commit 7f3b16f3f229e37cc3e02e9e4e7106c523b119e9 upstream. This commit makes two key fixes to SDMA v4.4.2 handling: 1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines by reading the current value before modifying UTC_L1_ENABLE bit. 2. Ensure UTC_L1_ENABLE is consistently managed by: - Adding the missing register write when enabling UTC_L1 during start - Keeping UTC_L1 enabled by default as per hardware requirements v2: Correct SDMA_CNTL setting (Philip) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 375bf564654e85a7b1b0657b191645b3edca1bda) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: Add kicker device detectionFrank Min2-0/+23
commit 0bbf5fd86c585d437b75003f11365b324360a5d6 upstream. 1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 09aa2b408f4ab689c3541d22b0968de0392ee406) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pauseSonny Jiang1-1/+5
commit 46e15197b513e60786a44107759d6ca293d6288c upstream. Add a protection to ensure programming are all complete prior VCPU starting. This is a WA for an unintended VCPU running. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c29521b529fa5e225feaf709d863a636ca0cbbfa) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: disable workload profile switching when OD is enabledAlex Deucher1-0/+8
commit 5cccf10f652122a17b40df9d672ccf2ed69cd82f upstream. Users have reported that they have to reduce the level of undervolting to acheive stability when dynamic workload profiles are enabled on GC 10.3.x. Disable dynamic workload profiles if the user has enabled OD. Fixes: b9467983b774 ("drm/amdgpu: add dynamic workload profile switching for gfx10") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4262 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vramJohn Olender1-1/+1
commit 4d2f6b4e4c7ed32e7fa39fcea37344a9eab99094 upstream. The drm_mm allocator tolerated being passed end > mm->size, but the drm_buddy allocator does not. Restore the pre-buddy-allocator behavior of allowing such placements. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448 Signed-off-by: John Olender <john.olender@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amd: Adjust output for discovery error handlingMario Limonciello1-15/+13
[ Upstream commit 73eab78721f7b85216f1ca8c7b732f13213b5b32 ] commit 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") added support for reading an amdgpu IP discovery bin file for some specific products. If it's not found then it will fallback to hardcoded values. However if it's not found there is also a lot of noise about missing files and errors. Adjust the error handling to decrease most messages to DEBUG and to show users less about missing files. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4312 Tested-by: Marcus Seyfarth <m.seyfarth@gmail.com> Fixes: 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250617183052.1692059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 49f1f9f6c3c9febf8ba93f94a8d9c8d03e1ea0a1) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu/mes: add compatibility checks for set_hw_resource_1Alex Deucher2-5/+8
commit 99579c55c3d6132a5236926652c0a72a526b809d upstream. Seems some older MES firmware versions do not properly support this packet. Add back some the compatibility checks. v2: switch to fw version check (Shaoyun) Fixes: f81cd793119e ("drm/amd/amdgpu: Fix MES init sequence") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4295 Cc: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: shaoyun.liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0180e0a5dd5c6ff118043ee42dbbbddaf881f283) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: seq64 memory unmap uses uninterruptible lockPhilip Yang1-1/+1
[ Upstream commit a359288ccb4dd8edb086e7de8fdf6e36f544c922 ] To unmap and free seq64 memory when drm node close to free vm, if there is signal accepted, then taking vm lock failed and leaking seq64 va mapping, and then dmesg has error log "still active bo inside vm". Change to use uninterruptible lock fix the mapping leaking and no dmesg error log. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu/vcn2.5: read back register after writtenDavid (Ming Qiang) Wu1-0/+19
[ Upstream commit d9e688b9148bb23629d32017344888dd67ec2ab1 ] The addition of register read-back in VCN v2.5 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu/vcn3: read back register after writtenDavid (Ming Qiang) Wu1-0/+20
[ Upstream commit b7a4842a917e3a251b5a6aa1a21a5daf6d396ef3 ] The addition of register read-back in VCN v3.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu/vcn4: read back register after writtenDavid (Ming Qiang) Wu1-0/+20
[ Upstream commit a3810a5e37c58329aa2c7992f3172a423f4ae194 ] The addition of register read-back in VCN v4.0.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06drm/amdgpu/vcn5.0.1: read back register after writtenDavid (Ming Qiang) Wu1-0/+15
[ Upstream commit bf394d28548c3c0a01e113fdef20ddb6cd2df106 ] The addition of register read-back in VCN v5.0.1 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu: Add indirect L1_TLB_CNTL reg programming for VFsVictor Skvortsov5-21/+93
[ Upstream commit 0c6e39ce6da20104900b11bad64464a12fb47320 ] VFs on some IP versions are unable to access this register directly. This register must be programmed before PSP ring is setup, so use PSP VF mailbox directly. PSP will broadcast the register value to all VF assigned instances. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <Zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu/gfx9: fix CSIB handlingAlex Deucher1-2/+0
[ Upstream commit a4a4c0ae6742ec7d6bf1548d2c6828de440814a0 ] We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu/gfx8: fix CSIB handlingAlex Deucher1-2/+0
[ Upstream commit c8b8d7a4f1c5cdfbd61d75302fb3e3cdefb1a7ab ] We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu: Disallow partition query during resetLijo Lazar2-0/+14
[ Upstream commit 75f138db48c5c493f0ac198c2579d52fc6a4c4a0 ] Reject queries to get current partition modes during reset. Also, don't accept sysfs interface requests to switch compute partition mode while in reset. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu: fix MES GFX maskArvind Yadav4-9/+26
[ Upstream commit 9d3afcb7b9f950b9b7c58ceeeb9e71f3476e69ed ] Current MES GFX mask prevents FW to enable oversubscription. This patch does the following: - Fixes the mask values and adds a description for the same - Removes the central mask setup and makes it IP specific, as it would be different when the number of pipes and queues are different. v2: squash in fix from Shashank Cc: Christian König <Christian.Koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu/gfx7: fix CSIB handlingAlex Deucher1-2/+0
[ Upstream commit be7652c23d833d1ab2c67b16e173b1a4e69d1ae6 ] We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu/gfx10: fix CSIB handlingAlex Deucher1-2/+0
[ Upstream commit 683308af030cd9b8d3f1de5cbc1ee51788878feb ] We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu: Add basic validation for RAS headerLijo Lazar1-3/+19
[ Upstream commit 5df0d6addb7e9b6f71f7162d1253762a5be9138e ] If RAS header read from EEPROM is corrupted, it could result in trying to allocate huge memory for reading the records. Add some validation to header fields. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu/gfx11: fix CSIB handlingAlex Deucher1-2/+0
[ Upstream commit a9a8bccaa3ba64d509cf7df387cf0b5e1cd06499 ] We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdkfd: Drop workaround for GC v9.4.3 revID 0Apurv Mishra2-13/+9
[ Upstream commit daafa303d19f5522e4c24fbf5c1c981a16df2c2f ] Remove workaround code for the early engineering samples GC v9.4.3 SOCs with revID 0 Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu: Fix API status offset for MES queue resetJesse.Zhang2-2/+2
[ Upstream commit ad7c088e31f026d71fe87fd09473fafb7d6ed006 ] The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset for api_status. Since these functions handle queue reset operations, they should use MESAPI__RESET union instead. This fixes the polling of API status during hardware queue reset operations in the MES for both v11 and v12 versions. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-By: Shaoyun.liu <Shaoyun.liu@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27drm/amdgpu/gfx6: fix CSIB handlingAlex Deucher1-2/+0
[ Upstream commit 8307ebc15c1ea98a8a0b7837af1faa6c01514577 ] We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19drm/amdgpu/gfx10: Refine Cleaner Shader for GFX10.1.10Vitaly Prosyak2-10/+9
[ Upstream commit d26625d034fb8d596f0488472969493fa02d03f3 ] This patch updates the cleaner shader, which is responsible for initializing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Changes include adjustments to register clearing and shader configuration. - Updated GPU resource initialization addresses in the cleaner shader from `be803080` to `be803000`. - Simplified the logic in the SGPR clearing section, ensuring all SGPRs are set to zero. Fixes: 25961bad9212 ("drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Manu Rastogi <manu.rastogi@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19drm/amdgpu: Refine Cleaner Shader MEC firmware version for GFX10.1.x GPUsSrinivasan Shanmugam1-1/+1
[ Upstream commit d30f61076268fd7ce01e4ec9e4d84bfaf90365f7 ] Update the minimum firmware version for the Cleaner Shader in the gfx_v10_0_sw_init function. This change adjusts the minimum required firmware version for the MEC firmware from 152 to 151, allowing for broader compatibility with GFX10.1 GPUs. Fixes: 25961bad9212 ("drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-14drm/amdgpu: read back register after written for VCN v4.0.5David (Ming Qiang) Wu1-0/+8
On VCN v4.0.5 there is a race condition where the WPTR is not updated after starting from idle when doorbell is used. Adding register read-back after written at function end is to ensure all register writes are done before they can be used. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 07c9db090b86e5211188e1b351303fbc673378cf) Cc: stable@vger.kernel.org
2025-05-13drm/amdgpu: fix incorrect MALL size for GFX1151Tim Huang1-0/+12
On GFX1151, the reported MALL cache size reflects only half of its actual size; this adjustment corrects the discrepancy. Signed-off-by: Tim Huang <tim.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0a5c060b593ad152318f89e5564bfdfcff8a6ac0) Cc: stable@vger.kernel.org
2025-05-13drm/amdgpu: csa unmap use uninterruptible lockPhilip Yang1-1/+1
After process exit to unmap csa and free GPU vm, if signal is accepted and then waiting to take vm lock is interrupted and return, it causes memory leaking and below warning backtrace. Change to use uninterruptible wait lock fix the issue. WARNING: CPU: 69 PID: 167800 at amd/amdgpu/amdgpu_kms.c:1525 amdgpu_driver_postclose_kms+0x294/0x2a0 [amdgpu] Call Trace: <TASK> drm_file_free.part.0+0x1da/0x230 [drm] drm_close_helper.isra.0+0x65/0x70 [drm] drm_release+0x6a/0x120 [drm] amdgpu_drm_release+0x51/0x60 [amdgpu] __fput+0x9f/0x280 ____fput+0xe/0x20 task_work_run+0x67/0xa0 do_exit+0x217/0x3c0 do_group_exit+0x3b/0xb0 get_signal+0x14a/0x8d0 arch_do_signal_or_restart+0xde/0x100 exit_to_user_mode_loop+0xc1/0x1a0 exit_to_user_mode_prepare+0xf4/0x100 syscall_exit_to_user_mode+0x17/0x40 do_syscall_64+0x69/0xc0 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7dbbfb3c171a6f63b01165958629c9c26abf38ab) Cc: stable@vger.kernel.org
2025-05-08drm/amdgpu/hdp7: use memcfg register to post the write for HDP flushAlex Deucher1-1/+6
Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: 689275140cb8 ("drm/amdgpu/hdp7.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dbc064adfcf9095e7d895bea87b2f75c1ab23236) Cc: stable@vger.kernel.org