path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
Age | Commit message | Author | Files | Lines
35 hours | drm/amdkfd: Don't clear PT after process killed | Philip Yang | 1 | -0/+4
commit 597eb70f7ff7551ff795cd51754b81aabedab67b upstream. If the process is killed, the vm entity is stopped. Submitting a pt update job then triggers the error message "*ERROR* Trying to push to a killed entity", and the job will not execute. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 10c382ec6c6d1e11975a11962bec21cba6360391) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
35 hours | drm/amdgpu: fix nullptr err of vm_handle_moved | Heng Zhou | 1 | -1/+14
[ Upstream commit 859958a7faefe5b7742b7b8cdbc170713d4bf158 ] If an amdgpu_bo_va is fpriv->prt_va, its bo is always NULL. So this kind of amdgpu_bo_va should be updated separately before amdgpu_vm_handle_moved. Signed-off-by: Heng Zhou <Heng.Zhou@amd.com> Reviewed-by: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23 | drm/amdgpu: use atomic functions with memory barriers for vm fault info | Gui-Dong Han | 1 | -3/+2
commit 6df8e84aa6b5b1812cc2cacd6b3f5ccbb18cda2b upstream. The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: b97dfa27ef3a ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
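The acquire/release pairing described in this commit can be illustrated with a small userspace analogue. This is only a sketch using C11 `<stdatomic.h>`, not the kernel code: `struct fault_info`, `publish_fault()` and `consume_fault()` are hypothetical names, and `atomic_store_explicit(..., memory_order_release)` / `atomic_load_explicit(..., memory_order_acquire)` stand in for the kernel's `atomic_set_release()` / `atomic_read_acquire()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for adev->gmc.vm_fault_info. */
struct fault_info {
    uint64_t addr;
    uint32_t status;
};

static struct fault_info info;
static atomic_int info_updated; /* 0 = slot free, 1 = info is valid */

/* Producer side (models the interrupt handler). */
static bool publish_fault(uint64_t addr, uint32_t status)
{
    /* Acquire: the consumer's copy-out is complete before we overwrite. */
    if (atomic_load_explicit(&info_updated, memory_order_acquire))
        return false; /* previous fault not consumed yet */

    info.addr = addr;
    info.status = status;

    /* Release: the writes to info are visible before the flag reads 1. */
    atomic_store_explicit(&info_updated, 1, memory_order_release);
    return true;
}

/* Consumer side (models get_vm_fault_info()). */
static bool consume_fault(struct fault_info *out)
{
    /* Acquire: the read of info cannot be reordered before this load. */
    if (!atomic_load_explicit(&info_updated, memory_order_acquire))
        return false;

    *out = info;

    /* Release: the copy above finishes before the slot is marked free. */
    atomic_store_explicit(&info_updated, 0, memory_order_release);
    return true;
}
```

The flag and the data are ordered by the atomic accesses themselves, so no separate full barrier is needed in either function.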
2025-10-19 | drm/amdkfd: Fix kfd process ref leaking when userptr unmapping | Philip Yang | 1 | -2/+7
[ Upstream commit 58e6fc2fb94f0f409447e5d46cf6a417b6397fbc ] kfd_lookup_process_by_pid holds the kfd process reference to ensure it doesn't get destroyed while sending the segfault event to user space. Calling kfd_lookup_process_by_pid as a function parameter leaks the kfd process refcount and misses the NULL pointer check if the app process is already destroyed. Fixes: 2d274bf7099b ("amd/amdkfd: Trigger segfault for early userptr unmmapping") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
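The leak pattern this commit describes, a reference-taking lookup used directly as a function argument, can be sketched in plain C. The types and helpers below (`struct proc`, `lookup_get()`, `proc_put()`, `send_event()`) are hypothetical stand-ins, not the KFD API; the sketch only shows why the lookup result has to be held in a local variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted process object. */
struct proc {
    int refcount;
    int alive;
    int events;
};

/* Returns the process with an extra reference, or NULL if it is gone. */
static struct proc *lookup_get(struct proc *p)
{
    if (!p || !p->alive)
        return NULL;
    p->refcount++;
    return p;
}

static void proc_put(struct proc *p)
{
    p->refcount--;
}

static void send_event(struct proc *p)
{
    p->events++;
}

/*
 * Leaky form: send_event(lookup_get(candidate));
 * The reference is never dropped and a NULL result is dereferenced.
 */
static int notify(struct proc *candidate)
{
    struct proc *p = lookup_get(candidate);

    if (!p)
        return -1;      /* process already destroyed */

    send_event(p);
    proc_put(p);        /* drop the reference taken by the lookup */
    return 0;
}
```

Keeping the pointer in a local makes both the NULL check and the matching put possible.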
2025-09-09 | amd/amdkfd: correct mem limit calculation for small APUs | Yifan Zhang | 1 | -12/+32
Current mem limit check leaks some GTT memory (reserved_for_pt + reserved_for_ras + adev->vram_pin_size) for small APUs. Since carveout VRAM is tunable on APUs, there are three cases regarding the carveout VRAM size relative to GTT: 1. 0 < carveout < gtt: apu_prefer_gtt = true, is_app_apu = false. 2. carveout > gtt / 2: apu_prefer_gtt = false, is_app_apu = false. 3. 0 = carveout: apu_prefer_gtt = true, is_app_apu = true. It doesn't make sense to check the limitation below in case 1 (default case, small carveout) because the values in the expression mix carveout and gtt quantities: adev->kfd.vram_used[xcp_id] + vram_needed > vram_size - reserved_for_pt - reserved_for_ras - atomic64_read(&adev->vram_pin_size). gtt: kfd.vram_used, vram_needed, vram_size. carveout: reserved_for_pt, reserved_for_ras, adev->vram_pin_size. In case 1, vram allocations go to the gtt domain, so skip the vram check since the ttm_mem_limit check already covers this allocation. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fa7c99f04f6dd299388e9282812b14e95558ac8e)
2025-05-07 | amd/amdkfd: Trigger segfault for early userptr unmmapping | Shane Xiao | 1 | -0/+12
If applications unmap the memory before destroying the userptr, the driver needs to trigger a segfault to notify user space to correct the free sequence in VM debug mode. v2: Send gpu access fault to user space v3: Report gpu address to user space, remove unnecessary params v4: update pr_err into one line, remove userptr log info Signed-off-by: Shane Xiao <shane.xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 | drm/amdgpu: use GFP_NOWAIT for memory allocations | Christian König | 1 | -4/+4
In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish. Finally clean that up and mark most memory allocations in the critical path with GFP_NOWAIT. The only exception left is the dma_fence_array() used when no VMID is available, but that will be cleaned up later on. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 | drm/amdkfd: remove unnecessary cpu domain validation | James Zhu | 1 | -6/+0
before move to GTT domain. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-21 | drm/amdgpu: remove all KFD fences from the BO on release | Christian König | 1 | -30/+22
Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evicted unnecessarily when an exported BO is released. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 | drm/amdkfd: Fix pasid value leak | Xiaogang Chen | 1 | -21/+0
Current kfd does not allocate pasid values; instead it uses the pasid value for each vm from the graphic driver. So it should not prevent the graphic driver from releasing pasid values, since the values are allocated by the graphic driver, not by the kfd driver anymore. With this patch, kfd no longer stops the graphic driver from releasing pasid values. Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 | drm/amdkfd: add a new flag to manage where VRAM allocations go | Alex Deucher | 1 | -8/+8
On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM. No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 | drm/amdkfd: Have kfd driver use same PASID values from graphic driver | Xiaogang Chen | 1 | -21/+0
The current kfd driver has its own PASID value for a kfd process and uses it to locate the vm in the interrupt handler or to map between kfd process and vm. That design does not work when a physical gpu device has multiple spatial partitions, e.g. adev in CPX mode. This patch has the kfd driver use the same pasid values that the graphic driver generated, which is per vm per pasid. These pasid values are passed to fw/hardware. We do not need to change the interrupt handler even though more pasid values are used. Also, pasid values in the log are replaced by the user process pid; pasid values are not exposed to users. Users see their process pids, which have meaning in user space. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18 | drm/amdgpu: Failed to check various return code | Andrew Martin | 1 | -7/+7
Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 | drm/amdkfd: Not restore userptr buffer if kfd process has been removed | Xiaogang Chen | 1 | -4/+7
When the kfd process has been terminated, do not restore the userptr buffer after the mmu notifier invalidates a range. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amdkfd: Fix an eviction fence leak | Lang Yu | 1 | -2/+2
Only create a new reference for each process instead of each VM. Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 | drm/amdgpu: remove amdgpu_pin_restricted() | Christian König | 1 | -1/+1
We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-10 | drm/amdgpu: fix a race in kfd_mem_export_dmabuf() | Al Viro | 1 | -9/+3
Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy; another thread might have modified the descriptor table while we'd been going through that song and dance. Switch kfd_mem_export_dmabuf() to using drm_gem_prime_handle_to_dmabuf() and leave the descriptor table alone... Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 | drm/amdkfd: Ensure user queue buffers residency | Philip Yang | 1 | -2/+12
Add an atomic queue_refcount to struct bo_va, and return -EBUSY to fail unmapping the BO from the GPU if the bo_va queue_refcount is not zero. Creating a queue increases the bo_va queue_refcount and destroying a queue decreases it, to ensure the queue buffers stay mapped on the GPU while the queue is active. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
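The residency rule in this commit amounts to a reference count that blocks unmapping. A minimal userspace sketch, assuming a hypothetical `struct bo_va_sketch` and helper names that are not part of the driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a bo_va with a per-mapping queue refcount. */
struct bo_va_sketch {
    atomic_int queue_refcount;
    bool mapped;
};

/* Queue creation pins the mapping. */
static void queue_create(struct bo_va_sketch *va)
{
    atomic_fetch_add(&va->queue_refcount, 1);
}

/* Queue destruction releases the pin. */
static void queue_destroy(struct bo_va_sketch *va)
{
    atomic_fetch_sub(&va->queue_refcount, 1);
}

/* Unmapping fails with -EBUSY while any queue still uses the buffer. */
static int unmap_from_gpu(struct bo_va_sketch *va)
{
    if (atomic_load(&va->queue_refcount) != 0)
        return -EBUSY;
    va->mapped = false;
    return 0;
}
```

The sketch ignores the locking a real driver needs between the refcount check and the unmap; it only shows the refcount gate.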
2024-07-23 | drm/amdkfd: Refactor queue wptr_bo GART mapping | Philip Yang | 1 | -2/+3
Add helper function kfd_queue_acquire_buffers to get the queue wptr_bo reference from the queue write_ptr if it is mapped to the KFD node with the expected size. Add wptr_bo to structure queue_properties because structure queue is allocated after queue buffers are validated, then we can remove the wptr_bo parameter from pqm_create_queue. Rename structure queue wptr_bo_gart to hold the wptr_bo reference for GART mapping and unmapping. Move MES wptr_bo_gart mapping to init_user_queue, the same location as the queue ctx_bo GART mapping. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 | drm/amdkfd: kfd_bo_mapped_dev support partition | Philip Yang | 1 | -2/+3
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 | drm/amdgpu: Estimate RAS reservation when report capacity v2 | Hawking Zhang | 1 | -2/+7
Add an estimate of how much vram we need to reserve for RAS when calculating the total available vram. v2: apply the change to MP0 v13_0_2 and v13_0_14 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 | drm/amdkfd: simplify APU VRAM handling | Alex Deucher | 1 | -8/+8
With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 | drm/kfd: Correct pinned buffer handling at kfd restore and validate process | Xiaogang Chen | 1 | -4/+5
This reverts commit 8a774fe912ff ("drm/amdgpu: avoid restore process run into dead loop"), since whether a buffer is pinned is not related to whether it needs mapping. Also skip buffer validation in the kfd driver if the buffer has been pinned. Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 | drm/amdkfd: Remove arbitrary timeout for hmm_range_fault | Philip Yang | 1 | -1/+4
On systems with khugepaged enabled and use cases with THP buffers, hmm_range_fault may take > 15 seconds to return -EBUSY. The arbitrary timeout value is not accurate and causes memory allocation failures. Remove the arbitrary timeout value and return EAGAIN to the application if hmm_range_fault returns EBUSY; the userspace libdrm and Thunk will then call the ioctl again. Change the EAGAIN message to a debug message as this is not an error. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
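The retry behaviour this commit relies on in userspace can be sketched as a loop that repeats a call while it fails with EINTR or EAGAIN, which is the shape of libdrm's `drmIoctl()` wrapper. The code below is an illustration only: `call_with_retry()` and the `flaky_op()` stand-in for the ioctl are hypothetical names introduced here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical operation type: returns 0 on success, -1 with errno set. */
typedef int (*op_fn)(void *ctx);

/* Repeat the call while it reports a transient condition. */
static int call_with_retry(op_fn op, void *ctx)
{
    int ret;

    do {
        ret = op(ctx);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Stand-in for the ioctl: fails with EAGAIN a fixed number of times. */
struct flaky_ctx {
    int failures_left;
    int calls;
};

static int flaky_op(void *ctx)
{
    struct flaky_ctx *c = ctx;

    c->calls++;
    if (c->failures_left > 0) {
        c->failures_left--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

Because the caller loops instead of the kernel waiting, no fixed timeout has to be chosen on either side.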
2024-05-08 | drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs | Lang Yu | 1 | -9/+11
Small APUs (i.e., consumer and embedded products) usually have a small carveout of device memory, which can't satisfy the memory allocation requirements of most compute workloads. We can't even run a basic MNIST example with a default 512MB carveout: https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Although we can change BIOS settings to enlarge the carveout size, that is inflexible and may bring complaints. In addition, the memory resource can't be used effectively between host and device. The solution is the MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share the memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 | drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms | Lang Yu | 1 | -1/+2
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 | drm/amdkfd: Evict BO itself for contiguous allocation | Philip Yang | 1 | -1/+18
If the BO pages pinned for RDMA are not contiguous in VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 | drm/amdgpu: Support contiguous VRAM allocation | Philip Yang | 1 | -0/+4
An RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store it as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When this bo is pinned for export for RDMA peerdirect access, this will set the TTM_PL_FLAG_CONTIGUOUS flag and ask the VRAM buddy allocator for contiguous VRAM. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 | drm/amdgpu: Fix VRAM memory accounting | Mukul Joshi | 1 | -1/+1
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 | drm/amdkfd: make sure VM is ready for updating operations | Lang Yu | 1 | -14/+20
When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 | drm/amdgpu: Fix leak when GPU memory allocation fails | Mukul Joshi | 1 | -0/+1
Free the sync object if the memory allocation fails for any reason. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 | drm/amdgpu: Handle duplicate BOs during process restore | Mukul Joshi | 1 | -4/+10
In certain situations, some apps can import a BO multiple times (through IPC for example). To restore such processes successfully, we need to tell drm to ignore duplicate BOs. While at it, also add additional logging to prevent silent failures when process restore fails. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-28 | amd/amdkfd: remove unused parameter | Eric Huang | 1 | -2/+1
The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev), and adev is also not used in the function amdgpu_amdkfd_map_gtt_bo_to_gart(). Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 | drm/amdkfd: reserve the BO before validating it | Lang Yu | 1 | -3/+17
Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 | drm/amdgpu: Auto-validate DMABuf imports in compute VMs | Felix Kuehling | 1 | -17/+22
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This patch automatically validates and fences dynamic DMABuf imports when they are added to a compute VM. Revalidation after evictions is handled in the VM code. v2: * Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate * Eliminated evicted_user state, use evicted state for VM BOs and user BOs * Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs * Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the reservation with its fences is shared with the export, as long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-09 | drm/amdkfd: Fix sparse __rcu annotation warnings | Felix Kuehling | 1 | -2/+2
Properly mark kfd_process->ef as __rcu and consistently use the right accessor functions. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/ Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-20 | Merge tag 'drm-msm-next-2023-12-15' of ↵ | Dave Airlie | 1 | -4/+4
https://gitlab.freedesktop.org/drm/msm into drm-next Updates for v6.8: Core: - Add support for SDM670, SM8650 - Handle the CFG interconnect to fix the obscure hangs / timeouts on register write - Kconfig fix for QMP dependency - DT schema fixes DPU: - Add support for SDM670, SM8650 - Enable SmartDMA on SM8350 and SM8450 - Correct UBWC settings for SC8280XP - Fix catalog settings for SC8180X - Actually make use of the version to switch between QSEED3/3LITE/4 scalers - Use devres-managed and drm-managed allocations where appropriate - misc other fixes - Enabled YUV writeback on SC7280, SM8250 - Enabled writeback on SM8350, SM8450 - CRC fix when encoder is selected as the input source - other misc fixes MDP4: - Use devres-managed and drm-managed allocations where appropriate - flush vblank event on CRTC disable MDP5: - Use devres-managed and drm-managed allocations where appropriate DP: - Add support for SM8650 - Enable PM runtime support - Merge msm-specific debugfs dir with the generic one - Described DisplayPort on SM8150 in DeviceTree bindings - Moved dp_display_get_next_bridge() to probe() DSI: - Add support for SM8650 - Enable PM runtime support GPU/GEM: - demote userspace triggerable warnings to debug - add GEM object metadata UAPI - move GPU devcoredumps to GPU device - fix hangcheck to skip retired submits - expose UBWC config to userspace - fix a680 chip-id - drm_exec conversion - drm/ci: remove rebase-merge directory (to unblock CI) [airlied: fix drm_exec/amd interaction] Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs9auYqmo-7NSd9FsbNBCDf7aBevd=4xkcF3A5G_OGvMQ@mail.gmail.com
2023-12-13 | drm/amdkfd: Import DMABufs for interop through DRM | Felix Kuehling | 1 | -20/+44
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Xiaogang.Chen <Xiaogang.Chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 | drm/amdkfd: Export DMABufs from KFD using GEM handles | Felix Kuehling | 1 | -7/+26
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-10 | drm/exec: Pass in initial # of objects | Rob Clark | 1 | -4/+4
In cases where the # is known ahead of time, it is silly to do the table resize dance. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Christian König <christian.koenig@amd.com> Patchwork: https://patchwork.freedesktop.org/patch/568338/
2023-11-29 | drm/amdkfd: Run restore_workers on freezable WQs | Felix Kuehling | 1 | -19/+49
Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush restore_work also helps avoid lock/fence dependencies in the GPU reset case where we're not allowed to wait for fences. A side effect of this is, that we can now have multiple concurrent threads trying to signal the same eviction fence. Rework eviction fence signaling and replacement to account for that. The GPU reset path can no longer rely on restore_process_worker to resume queues because evict/restore workers can run independently of it. Instead call a new restore_process_helper directly. This is an RFC and request for testing. v2: - Reworked eviction fence signaling - Introduced restore_process_helper v3: - Handle unsignaled eviction fences in restore_process_bos Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 | drm/amdgpu: update mappings not managed by KFD | Felix Kuehling | 1 | -6/+22
When restoring after an eviction, use amdgpu_vm_handle_moved to update BO VA mappings in KFD VMs that are not managed through the KFD API. This should allow using the render node API to create more flexible memory mappings in KFD VMs. v2: rebase on drm_exec changes (Alex) Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-07 | Merge tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm | Linus Torvalds | 1 | -31/+48
Pull more drm updates from Dave Airlie: "Geert pointed out I missed the renesas reworks in my main pull, so this pull contains the renesas next work for atomic conversion and DT support. It also contains a bunch of amdgpu and some small ssd13xx fixes. renesas: - atomic conversion - DT support ssd13xx: - dt binding fix for ssd132x - Initialize ssd130x crtc_state to NULL. amdgpu: - Fix RAS support check - RAS fixes - MES fixes - SMU13 fixes - Contiguous memory allocation fix - BACO fixes - GPU reset fixes - Min power limit fixes - GFX11 fixes - USB4/TB hotplug fixes - ARM regression fix - GFX9.4.3 fixes - KASAN/KCSAN stack size check fixes - SR-IOV fixes - SMU14 fixes - PSP13 fixes - Display blend fixes - Flexible array size fixes amdkfd: - GPUVM fix radeon: - Flexible array size fixes" * tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm: (83 commits) drm/amd/display: Enable fast update on blendTF change drm/amd/display: Fix blend LUT programming drm/amd/display: Program plane color setting correctly drm/amdgpu: Query and report boot status drm/amdgpu: Add psp v13 function to query boot status drm/amd/swsmu: remove fw version check in sw_init. drm/amd/swsmu: update smu v14_0_0 driver if and metrics table drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks drm/amdgpu: Optimize the asic type fix code drm/amdgpu: fix GRBM read timeout when do mes_self_test drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs drm/amdgpu: Attach eviction fence on alloc drm/amdkfd: Improve amdgpu_vm_handle_moved drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2 drm/amd/display: Avoid NULL dereference of timing generator drm/amdkfd: Update cache info for GFX 9.4.3 drm/amdkfd: Populate cache info for GFX 9.4.3 drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64 drm/amdgpu/smu13: drop compute workload workaround ...
2023-11-03 | drm/amdgpu: Attach eviction fence on alloc | Felix Kuehling | 1 | -31/+48
Instead of attaching the eviction fence when a KFD BO is first mapped, attach it when it is allocated or imported. This is in preparation to allow KFD BOs to be mapped using the render node API. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-01 | Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm | Linus Torvalds | 1 | -11/+46
Pull drm updates from Dave Airlie:
 "Highlights:
   - AMD adds some more upcoming HW platforms
   - Intel made Meteorlake stable and started adding Lunarlake
   - nouveau has a bunch of display rework in preparation for the NVIDIA GSP firmware support
   - msm adds a7xx support
   - habanalabs has finished migration to accel subsystem

  Detail summary:

  kernel:
   - add initial vmemdup-user-array

  core:
   - fix platform remove() to return void
   - drm_file owner updated to reflect owner
   - move size calcs to drm buddy allocator
   - let GPUVM build as a module
   - allow variable number of run-queues in scheduler

  edid:
   - handle bad h/v sync_end in EDIDs

  panfrost:
   - add Boris as maintainer

  fbdev:
   - use fb_ops helpers more
   - only allow logo use from fbcon
   - rename fb_pgproto to pgprot_framebuffer
   - add HPD state to drm_connector_oob_hotplug_event
   - convert to fbdev i/o mem helpers

  i915:
   - Enable meteorlake by default
   - Early Xe2 LPD/Lunarlake display enablement
   - Rework subplatforms into IP version checks
   - GuC based TLB invalidation for Meteorlake
   - Display rework for future Xe driver integration
   - LNL FBC features
   - LNL display feature capability reads
   - update recommended fw versions for DG2+
   - drop fastboot module parameter
   - added deviceid for Arrowlake-S
   - drop preproduction workarounds
   - don't disable preemption for resets
   - cleanup inlines in headers
   - PXP firmware loading fix
   - Fix sg list lengths
   - DSC PPS state readout/verification
   - Add more RPL P/U PCI IDs
   - Add new DG2-G12 stepping
   - DP enhanced framing support to state checker
   - Improve shared link bandwidth management
   - stop using GEM macros in display code
   - refactor related code into display code
   - locally enable W=1 warnings
   - remove PSR watchdog timers on LNL

  amdgpu:
   - RAS/FRU EEPROM updates
   - IP discovery updates
   - GC 11.5 support
   - DCN 3.5 support
   - VPE 6.1 support
   - NBIO 7.11 support
   - DML2 support
   - lots of IP updates
   - use flexible arrays for bo list handling
   - W=1 fixes
   - Enable seamless boot in more cases
   - Enable context type property for HDMI
   - Rework GPUVM TLB flushing
   - VCN IB start/size alignment fixes

  amdkfd:
   - GC 10/11 fixes
   - GC 11.5 support
   - use partial migration in GPU faults

  radeon:
   - W=1 Fixes
   - fix some possible buffer overflow/NULL derefs

  nouveau:
   - update uapi for NO_PREFETCH
   - scheduler/fence fixes
   - rework suspend/resume for GSP-RM
   - rework display in preparation for GSP-RM

  habanalabs:
   - uapi: expose tsc clock
   - uapi: block access to eventfd through control device
   - uapi: force dma-buf export to PAGE_SIZE alignments
   - complete move to accel subsystem
   - move firmware interface include files
   - perform hard reset on PCIe AXI drain event
   - optimise user interrupt handling

  msm:
   - DP: use existing helpers for DPCD
   - DPU: interrupts reworked
   - gpu: a7xx (a730/a740) support
   - decouple msm_drv from kms for headless devices

  mediatek:
   - MT8188 dsi/dp/edp support
   - DDP GAMMA - 12 bit LUT support
   - connector dynamic selection capability

  rockchip:
   - rv1126 mipi-dsi/vop support
   - add planar formats

  ast:
   - rename constants

  panels:
   - Mitsubishi AA084XE01
   - JDI LPM102A188A
   - LTK050H3148W-CTA6

  ivpu:
   - power management fixes

  qaic:
   - add detach slice bo api

  komeda:
   - add NV12 writeback

  tegra:
   - support NVSYNC/NHSYNC
   - host1x suspend fixes

  ili9882t:
   - separate into own driver"

* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
  drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
  drm/amdgpu: Remove duplicate fdinfo fields
  drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
  drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
  drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
  drm/amdgpu: Identify data parity error corrected in replay mode
  drm/amdgpu: Fix typo in IP discovery parsing
  drm/amd/display: fix S/G display enablement
  drm/amdxcp: fix amdxcp unloads incompletely
  drm/amd/amdgpu: fix the GPU power print error in pm info
  drm/amdgpu: Use pcie domain of xcc acpi objects
  drm/amd: check num of link levels when update pcie param
  drm/amdgpu: Add a read to GFX v9.4.3 ring test
  drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
  drm/amdgpu: get RAS poison status from DF v4_6_2
  drm/amdgpu: Use discovery table's subrevision
  drm/amd/display: 3.2.256
  drm/amd/display: add interface to query SubVP status
  drm/amd/display: Read before writing Backlight Mode Set Register
  drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
  ...
2023-10-23 | drm/amdkfd: reserve a fence slot while locking the BO | Christian König | 1 | -1/+1
Looks like the KFD still needs this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com
2023-10-13 | drm/amdgpu: Correctly use bo_va->ref_count in compute VMs | Xiaogang Chen | 1 | -3/+11
This is needed to correctly handle BOs imported into a compute VM from gfx. Both KFD and gfx should use the same bo_va and set bo_va->ref_count correctly when mapping the BOs into the same VM; otherwise we may trigger a kernel general protection fault when iterating over the mappings on the bo_va's valids or invalids list. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Tested-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
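The reference-counting idea in the message above can be sketched in user-space C. This is only a toy model under stated assumptions: `toy_bo_va`, `toy_bo_va_get` and `toy_bo_va_put` are hypothetical names invented for illustration and are not the driver's structures or functions. It shows two users of the same (VM, BO) pair sharing one record, with the record released only when the last user drops it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a per-(VM, BO) mapping record.
 * Hypothetical: this is NOT the kernel's struct amdgpu_bo_va. */
struct toy_bo_va {
    int vm_id;
    int bo_id;
    unsigned int ref_count; /* number of users sharing this record */
    int in_use;             /* 1 while the slot is allocated */
};

#define TOY_MAX_BO_VA 8
static struct toy_bo_va toy_table[TOY_MAX_BO_VA];

/* Look up an existing record for this (VM, BO) pair, or return NULL. */
static struct toy_bo_va *toy_bo_va_find(int vm_id, int bo_id)
{
    for (size_t i = 0; i < TOY_MAX_BO_VA; i++) {
        struct toy_bo_va *va = &toy_table[i];

        if (va->in_use && va->vm_id == vm_id && va->bo_id == bo_id)
            return va;
    }
    return NULL;
}

/* Attach: reuse the existing record and bump its count, or create a
 * new record with a count of 1. Returns NULL if the table is full. */
static struct toy_bo_va *toy_bo_va_get(int vm_id, int bo_id)
{
    struct toy_bo_va *va = toy_bo_va_find(vm_id, bo_id);

    if (va) {
        va->ref_count++;
        return va;
    }
    for (size_t i = 0; i < TOY_MAX_BO_VA; i++) {
        va = &toy_table[i];
        if (!va->in_use) {
            va->vm_id = vm_id;
            va->bo_id = bo_id;
            va->ref_count = 1;
            va->in_use = 1;
            return va;
        }
    }
    return NULL;
}

/* Detach: drop one reference and free the slot only when the last
 * user is gone. Returns the remaining reference count. */
static unsigned int toy_bo_va_put(struct toy_bo_va *va)
{
    assert(va != NULL && va->in_use && va->ref_count > 0);
    if (--va->ref_count == 0)
        va->in_use = 0;
    return va->ref_count;
}
```

In this model, a second `toy_bo_va_get(1, 42)` returns the same pointer as the first with `ref_count` equal to 2, so one user detaching cannot free a record that the other user is still walking.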
2023-10-04 | drm/amdgpu: Rework KFD memory max limits | Rajneesh Bhardwaj | 1 | -2/+8
To allow bigger allocations, especially on systems such as GFXIP 9.4.3 that use GTT memory for VRAM allocations, relax the limits to maximize ROCm allocations. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 | drm/amdkfd: Move dma unmapping after TLB flush | Philip Yang | 1 | -4/+22
Otherwise the GPU may access the stale mapping and generate an IOMMU IO_PAGE_FAULT. Move this inside p->mutex to prevent a race between multiple threads mapping and unmapping concurrently. Now that kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm, it is called when mapping to GPUs fails, and before freeing the mem attachment in case unmapping from GPUs failed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
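The ordering argument in the message above can be illustrated with a small user-space C model. Everything here is an assumption made for illustration: the `toy_*` names are hypothetical, the three flags are a simplification of real hardware state, and the "DMA first" sequence is only an inferred comparison case, not a copy of the earlier driver code. The p->mutex serialization is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified state of one GPU mapping (hypothetical model). */
struct toy_mapping {
    bool pte_valid;  /* page-table entry still points at the DMA address */
    bool tlb_cached; /* GPU TLB may still hold the translation */
    bool dma_mapped; /* IOMMU mapping for the DMA address still exists */
};

/* True if the GPU could still reach an address the IOMMU no longer
 * maps, which is the condition that would produce an I/O page fault. */
static bool toy_stale_access_possible(const struct toy_mapping *m)
{
    return (m->pte_valid || m->tlb_cached) && !m->dma_mapped;
}

static void toy_map(struct toy_mapping *m)
{
    m->dma_mapped = true;
    m->pte_valid = true;
    m->tlb_cached = true; /* assume the GPU has already touched the page */
}

static void toy_clear_pte(struct toy_mapping *m) { m->pte_valid = false; }
static void toy_flush_tlb(struct toy_mapping *m) { m->tlb_cached = false; }
static void toy_dma_unmap(struct toy_mapping *m) { m->dma_mapped = false; }

/* Order described in the commit message: DMA unmap only after the
 * TLB flush. Returns true if a hazard window was observed. */
static bool toy_unmap_flush_first(struct toy_mapping *m)
{
    bool hazard = false;

    toy_clear_pte(m);
    hazard |= toy_stale_access_possible(m);
    toy_flush_tlb(m);
    hazard |= toy_stale_access_possible(m);
    toy_dma_unmap(m);
    hazard |= toy_stale_access_possible(m);
    return hazard;
}

/* Comparison case (inferred): DMA unmap before the TLB flush leaves
 * a window in which a cached translation targets an unmapped address. */
static bool toy_unmap_dma_first(struct toy_mapping *m)
{
    bool hazard = false;

    toy_clear_pte(m);
    hazard |= toy_stale_access_possible(m);
    toy_dma_unmap(m);
    hazard |= toy_stale_access_possible(m);
    toy_flush_tlb(m);
    hazard |= toy_stale_access_possible(m);
    return hazard;
}
```

In this model `toy_unmap_flush_first` never observes a hazard, while `toy_unmap_dma_first` does between the DMA unmap and the flush, which matches the reasoning given for moving the DMA unmapping later.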