- Mar 21, 2025
-
-
Christian König authored
In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish. Finally clean that up and mark most memory allocations in the critical path with GFP_NOWAIT. The only exception left is the dma_fence_array() used when no VMID is available, but that will be cleaned up later on. Signed-off-by:
Christian König <christian.koenig@amd.com> Acked-by:
Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 05, 2025
-
-
James Zhu authored
before move to GTT domain. Signed-off-by:
James Zhu <James.Zhu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 21, 2025
-
-
Christian König authored
Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
James Zhu <James.Zhu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by:
James Zhu <James.Zhu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 13, 2025
-
-
Xiaogang Chen authored
Curret kfd does not allocate pasid values, instead uses pasid value for each vm from graphic driver. So should not prevent graphic driver from releasing pasid values since the values are allocated by graphic driver, not kfd driver anymore. This patch does not stop graphic driver release pasid values. Fixes: 8544374c ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Signed-off-by:
Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM. No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future. Reviewed-by:
Mario Limonciello <mario.limonciello@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Xiaogang Chen authored
Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid values that graphic driver generated which is per vm per pasid. These pasid values are passed to fw/hardware. We do not need change interrupt handler though more pasid values are used. Also, pasid values at log are replaced by user process pid; pasid values are not exposed to user. Users see their process pids that have meaning in user space. Signed-off-by:
Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Dec 18, 2024
-
-
Andrew Martin authored
Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by:
Andrew Martin <Andrew.Martin@amd.com> Reviewed-by:
Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Oct 22, 2024
-
-
Xiaogang Chen authored
When kfd process has been terminated not restore userptr buffer after mmu notifier invalidates a range. Signed-off-by:
Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Oct 07, 2024
-
-
Lang Yu authored
Only creating a new reference for each process instead of each VM. Fixes: 9a1c1339 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Lang Yu <lang.yu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5fa43628) Cc: stable@vger.kernel.org
-
Lang Yu authored
Only creating a new reference for each process instead of each VM. Fixes: 9a1c1339 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Lang Yu <lang.yu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Sep 18, 2024
-
-
Christian König authored
We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by:
Christian König <christian.koenig@amd.com> Acked-by:
Lijo Lazar <lijo.lazar@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Sep 10, 2024
-
-
Al Viro authored
Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy; another thread might have modified the descriptor table while we'd been going through that song and dance. Switch kfd_mem_export_dmabuf() to using drm_gem_prime_handle_to_dmabuf() and leave the descriptor table alone... Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jul 23, 2024
-
-
PhilipY authored
Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap BO from the GPU if the bo_va queue_refcount is not zero. Create queue to increase the bo_va queue_refcount, destroy queue to decrease the bo_va queue_refcount, to ensure the queue buffers mapped on the GPU when queue is active. Signed-off-by:
Philip Yang <Philip.Yang@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
PhilipY authored
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo reference from queue write_ptr if it is mapped to the KFD node with expected size. Add wptr_bo to structure queue_properties because structure queue is allocated after queue buffers are validated, then we can remove wptr_bo parameter from pqm_create_queue. Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue, the same location with queue ctx_bo GART mapping. Signed-off-by:
Philip Yang <Philip.Yang@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
PhilipY authored
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change. Signed-off-by:
Philip Yang <Philip.Yang@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jun 05, 2024
-
-
Hawking Zhang authored
Add estimate of how much vram we need to reserve for RAS when caculating the total available vram. v2: apply the change to MP0 v13_0_2 and v13_0_14 Signed-off-by:
Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by:
Tao Zhou <tao.zhou1@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 29, 2024
-
-
Alex Deucher authored
With commit 89773b85 ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by:
Felix Kuehling <felix.kuehling@amd.com> Reviewed-by:
Lang Yu <Lang.Yu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
With commit 89773b85 ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by:
Felix Kuehling <felix.kuehling@amd.com> Reviewed-by:
Lang Yu <Lang.Yu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 20, 2024
-
-
Lang Yu authored
Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist . Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Lang Yu authored
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 17, 2024
-
-
Xiaogang Chen authored
This reverts commit 8a774fe9 ("drm/amdgpu: avoid restore process run into dead loop") since buffer got pinned is not related whether it needs mapping And skip buffer validation at kfd driver if the buffer has been pinned. Signed-off-by:
Xiaogang Chen <Xiaogang.Chen@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 13, 2024
-
-
PhilipY authored
On system with khugepaged enabled and user cases with THP buffer, the hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary timeout value is not accurate, cause memory allocation failure. Remove the arbitrary timeout value, return EAGAIN to application if hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call ioctl again. Change EAGAIN to debug message as this is not error. Signed-off-by:
Philip Yang <Philip.Yang@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 08, 2024
-
-
Lang Yu authored
Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist . Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 02, 2024
-
-
Lang Yu authored
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 01, 2024
-
-
Mukul Joshi authored
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by:
Mukul Joshi <mukul.joshi@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Apr 30, 2024
-
-
PhilipY authored
If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by:
Philip Yang <Philip.Yang@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
PhilipY authored
RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by:
Philip Yang <Philip.Yang@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Apr 26, 2024
-
-
Mukul Joshi authored
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by:
Mukul Joshi <mukul.joshi@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Apr 24, 2024
-
-
Lang Yu authored
When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: 50661eb1 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
Mukul Joshi authored
Free the sync object if the memory allocation fails for any reason. Signed-off-by:
Mukul Joshi <mukul.joshi@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- Apr 19, 2024
-
-
Lang Yu authored
When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: 50661eb1 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Mukul Joshi authored
Free the sync object if the memory allocation fails for any reason. Signed-off-by:
Mukul Joshi <mukul.joshi@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 20, 2024
-
-
Mukul Joshi authored
In certain situations, some apps can import a BO multiple times (through IPC for example). To restore such processes successfully, we need to tell drm to ignore duplicate BOs. While at it, also add additional logging to prevent silent failures when process restore fails. Signed-off-by:
Mukul Joshi <mukul.joshi@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 28, 2024
-
-
Eric Huang authored
The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev), and adev is also not used in the function amdgpu_amdkfd_map_gtt_bo_to_gart(). Signed-off-by:
Eric Huang <jinhuieric.huang@amd.com> Reviewed-by:
Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jan 31, 2024
-
-
Lang Yu authored
Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104 ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Reviewed-by:
Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Lang Yu authored
Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104 ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by:
Lang Yu <Lang.Yu@amd.com> Reviewed-by:
Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jan 15, 2024
-
-
Felix Kuehling authored
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This patch automatically validates and fences dymanic DMABuf imports when they are added to a compute VM. Revalidation after evictions is handled in the VM code. v2: * Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate * Eliminated evicted_user state, use evicted state for VM BOs and user BOs * Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs * Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the reservation with its fences is shared with the export, as long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket. Signed-off-by:
Felix Kuehling <felix.kuehling@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jan 09, 2024
-
-
Felix Kuehling authored
Properly mark kfd_process->ef as __rcu and consistently use the right accessor functions. Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/ Signed-off-by:
Felix Kuehling <felix.kuehling@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Dec 13, 2023
-
-
Felix Kuehling authored
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by:
Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by:
Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by:
Xiaogang.Chen <Xiaogang.Chen@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by:
Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by:
Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-