  1. Mar 21, 2025
  2. Mar 05, 2025
  3. Feb 21, 2025
  4. Feb 13, 2025
  5. Dec 18, 2024
  6. Oct 22, 2024
  7. Oct 07, 2024
  8. Sep 18, 2024
  9. Sep 10, 2024
  10. Jul 23, 2024
  11. Jun 05, 2024
  12. May 29, 2024
  13. May 20, 2024
    • drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs · eb853413
      Lang Yu authored
      Small APUs (i.e., consumer and embedded products) usually have only a
      small carveout of device memory, which can't satisfy the memory
      allocation requirements of most compute workloads.
      
      We can't even run a Basic MNIST Example with a default 512MB carveout.
      https://github.com/pytorch/examples/tree/main/mnist
      
      Error log:
      
      "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
      84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
      is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
      and 22.17 MiB is reserved by PyTorch but unallocated"
      
      We can change BIOS settings to enlarge the carveout, but that is
      inflexible and may draw complaints. In addition, with a fixed carveout
      the memory can't be used effectively between host and device.
      
      The solution is the MI300A approach, i.e., let VRAM allocations go to GTT.
      Then the device and host can share memory flexibly and effectively.
      
      v2: Report local_mem_size_private as 0. (Felix)
      
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
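
The redirection described in this commit message can be pictured as a small policy function. The following is an illustrative Python model, not the kernel patch: `Device`, `is_small_apu`, `private_vram_bytes`, `pick_domain` and `reported_private_local_mem` are hypothetical names. Only two behaviours are taken from the commit message: VRAM requests go to GTT on small APUs, and `local_mem_size_private` is reported as 0 there.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    VRAM = "vram"  # carveout / device-local memory
    GTT = "gtt"    # system memory mapped for the GPU


@dataclass(frozen=True)
class Device:
    # Both fields are hypothetical stand-ins for driver state.
    is_small_apu: bool
    private_vram_bytes: int


def pick_domain(dev: Device, requested: Domain) -> Domain:
    """Redirect VRAM requests to GTT on small APUs; leave other requests alone."""
    if requested is Domain.VRAM and dev.is_small_apu:
        return Domain.GTT
    return requested


def reported_private_local_mem(dev: Device) -> int:
    """Model of the v2 note: small APUs report a private local memory size of 0."""
    return 0 if dev.is_small_apu else dev.private_vram_bytes
```

With this model, a compute allocation that asks for VRAM on a small APU is no longer bounded by the 512 MB carveout, because it is served from system memory that host and device share.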
    • drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms · 2a705f3e
      Lang Yu authored
      
      Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
      When two attachments use the same VM, the root PD would be locked twice.
      
      [   57.910418] Call Trace:
      [   57.793726]  ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
      [   57.793820]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
      [   57.793923]  ? idr_get_next_ul+0xbe/0x100
      [   57.793933]  kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
      [   57.794041]  kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
      [   57.794141]  ? process_scheduled_works+0x29c/0x580
      [   57.794147]  process_scheduled_works+0x303/0x580
      [   57.794157]  ? __pfx_worker_thread+0x10/0x10
      [   57.794160]  worker_thread+0x1a2/0x370
      [   57.794165]  ? __pfx_worker_thread+0x10/0x10
      [   57.794167]  kthread+0x11b/0x150
      [   57.794172]  ? __pfx_kthread+0x10/0x10
      [   57.794177]  ret_from_fork+0x3d/0x60
      [   57.794181]  ? __pfx_kthread+0x10/0x10
      [   57.794184]  ret_from_fork_asm+0x1b/0x30
      
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
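
The double-lock problem in this commit message can be illustrated with a toy reservation loop. This is a sketch under stated assumptions, not the driver code: `BufferObject`, `reserve_all` and `unreserve_all` are hypothetical names, and a non-reentrant `threading.Lock` stands in for a BO reservation. The point is only that a list containing the same root PD twice must lock it once.

```python
import threading


class BufferObject:
    """Toy stand-in for a BO; the lock models its reservation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lock = threading.Lock()  # non-reentrant, like a single reservation


def reserve_all(bos):
    """Reserve each distinct BO once, skipping duplicates in the input list."""
    reserved = []
    seen = set()
    for bo in bos:
        if id(bo) in seen:
            continue  # duplicate entry: this BO is already reserved by us
        if not bo.lock.acquire(blocking=False):
            # Held by someone else: roll back what we took and report failure.
            for held in reversed(reserved):
                held.lock.release()
            raise RuntimeError(f"{bo.name} is already reserved")
        seen.add(id(bo))
        reserved.append(bo)
    return reserved


def unreserve_all(reserved):
    """Release reservations in reverse order of acquisition."""
    for bo in reversed(reserved):
        bo.lock.release()
```

Without the `seen` check, the second attempt to acquire the shared root PD would fail (or block, with a blocking acquire), which corresponds to the situation the commit message describes.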
  14. May 17, 2024
  15. May 13, 2024
  16. May 08, 2024
    • drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs · 89773b85
      Lang Yu authored
      Small APUs (i.e., consumer and embedded products) usually have only a
      small carveout of device memory, which can't satisfy the memory
      allocation requirements of most compute workloads.
      
      We can't even run a Basic MNIST Example with a default 512MB carveout.
      https://github.com/pytorch/examples/tree/main/mnist
      
      Error log:
      
      "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
      84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
      is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
      and 22.17 MiB is reserved by PyTorch but unallocated"
      
      We can change BIOS settings to enlarge the carveout, but that is
      inflexible and may draw complaints. In addition, with a fixed carveout
      the memory can't be used effectively between host and device.
      
      The solution is the MI300A approach, i.e., let VRAM allocations go to GTT.
      Then the device and host can share memory flexibly and effectively.
      
      v2: Report local_mem_size_private as 0. (Felix)
      
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  17. May 02, 2024
    • drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms · 2d6f49ee
      Lang Yu authored
      
      Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
      When two attachments use the same VM, the root PD would be locked twice.
      
      [   57.910418] Call Trace:
      [   57.793726]  ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
      [   57.793820]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
      [   57.793923]  ? idr_get_next_ul+0xbe/0x100
      [   57.793933]  kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
      [   57.794041]  kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
      [   57.794141]  ? process_scheduled_works+0x29c/0x580
      [   57.794147]  process_scheduled_works+0x303/0x580
      [   57.794157]  ? __pfx_worker_thread+0x10/0x10
      [   57.794160]  worker_thread+0x1a2/0x370
      [   57.794165]  ? __pfx_worker_thread+0x10/0x10
      [   57.794167]  kthread+0x11b/0x150
      [   57.794172]  ? __pfx_kthread+0x10/0x10
      [   57.794177]  ret_from_fork+0x3d/0x60
      [   57.794181]  ? __pfx_kthread+0x10/0x10
      [   57.794184]  ret_from_fork_asm+0x1b/0x30
      
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  18. May 01, 2024
  19. Apr 30, 2024
  20. Apr 26, 2024
  21. Apr 24, 2024
  22. Apr 19, 2024
  23. Mar 20, 2024
  24. Feb 28, 2024
  25. Jan 31, 2024
    • drm/amdkfd: reserve the BO before validating it · 9c29282e
      Lang Yu authored
      
      Fix a warning.
      
      v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
      
      v3: Lock the BO before accessing ttm->sg to avoid race conditions. (Felix)
      
      [   41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.708989] Call Trace:
      [   41.708992]  <TASK>
      [   41.708996]  ? show_regs+0x6c/0x80
      [   41.709000]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.709008]  ? __warn+0x93/0x190
      [   41.709014]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.709024]  ? report_bug+0x1f9/0x210
      [   41.709035]  ? handle_bug+0x46/0x80
      [   41.709041]  ? exc_invalid_op+0x1d/0x80
      [   41.709048]  ? asm_exc_invalid_op+0x1f/0x30
      [   41.709057]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
      [   41.709185]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.709197]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
      [   41.709337]  ? srso_alias_return_thunk+0x5/0x7f
      [   41.709346]  kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
      [   41.709467]  amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
      [   41.709586]  kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
      [   41.709710]  kfd_ioctl+0x1ec/0x650 [amdgpu]
      [   41.709822]  ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
      [   41.709945]  ? srso_alias_return_thunk+0x5/0x7f
      [   41.709949]  ? tomoyo_file_ioctl+0x20/0x30
      [   41.709959]  __x64_sys_ioctl+0x9c/0xd0
      [   41.709967]  do_syscall_64+0x3f/0x90
      [   41.709973]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      Fixes: 101b8104 ("drm/amdkfd: Move dma unmapping after TLB flush")
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
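
The v2 and v3 notes in this commit message describe a general pattern: take the lock before inspecting shared state, and make the unmap step safe to repeat when the call is restarted. A minimal Python sketch of that pattern follows. It is an assumption-laden illustration rather than the actual fix: `Attachment`, `unmap_count` and `dmaunmap` are hypothetical names, and the `sg` attribute only loosely models `ttm->sg`.

```python
import threading


class Attachment:
    """Toy attachment: `sg` models a DMA mapping that exists until unmapped."""

    def __init__(self) -> None:
        self.lock = threading.Lock()  # stands in for reserving the BO
        self.sg = object()            # non-None means "currently mapped"
        self.unmap_count = 0          # how many real unmaps were performed


def dmaunmap(att: Attachment) -> bool:
    """Unmap under the lock; return False if there was nothing left to unmap."""
    with att.lock:                    # lock first, then look at att.sg
        if att.sg is None:
            return False              # already unmapped, e.g. a restarted call
        att.sg = None
        att.unmap_count += 1
        return True
```

Calling `dmaunmap` a second time on the same attachment is then a no-op, so a restarted request does not unmap the attachment repeatedly.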
    • drm/amdkfd: reserve the BO before validating it · 0c93bd49
      Lang Yu authored
      
      Fix a warning.
      
      v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
      
      v3: Lock the BO before accessing ttm->sg to avoid race conditions. (Felix)
      
      [   41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.708989] Call Trace:
      [   41.708992]  <TASK>
      [   41.708996]  ? show_regs+0x6c/0x80
      [   41.709000]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.709008]  ? __warn+0x93/0x190
      [   41.709014]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.709024]  ? report_bug+0x1f9/0x210
      [   41.709035]  ? handle_bug+0x46/0x80
      [   41.709041]  ? exc_invalid_op+0x1d/0x80
      [   41.709048]  ? asm_exc_invalid_op+0x1f/0x30
      [   41.709057]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
      [   41.709185]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
      [   41.709197]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
      [   41.709337]  ? srso_alias_return_thunk+0x5/0x7f
      [   41.709346]  kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
      [   41.709467]  amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
      [   41.709586]  kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
      [   41.709710]  kfd_ioctl+0x1ec/0x650 [amdgpu]
      [   41.709822]  ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
      [   41.709945]  ? srso_alias_return_thunk+0x5/0x7f
      [   41.709949]  ? tomoyo_file_ioctl+0x20/0x30
      [   41.709959]  __x64_sys_ioctl+0x9c/0xd0
      [   41.709967]  do_syscall_64+0x3f/0x90
      [   41.709973]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      Fixes: 101b8104 ("drm/amdkfd: Move dma unmapping after TLB flush")
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  26. Jan 15, 2024
    • drm/amdgpu: Auto-validate DMABuf imports in compute VMs · 50661eb1
      Felix Kuehling authored
      
      DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
      process_info->kfd_bo_list. There is no explicit KFD API call to validate
      them or add eviction fences to them.
      
      This patch automatically validates and fences dynamic DMABuf imports when
      they are added to a compute VM. Revalidation after evictions is handled
      in the VM code.
      
      v2:
      * Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate
      * Eliminated evicted_user state, use evicted state for VM BOs and user BOs
      * Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs
      * Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into
        amdgpu_vm_validate, outside the vm->status_lock
      * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds
        without KFD
      
      v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the
      reservation with its fences is shared with the export, as long as all
      imports are from KFD, with the exports already reserved, validated and
      fenced by the KFD restore worker.
      
      v5: Reintroduced separate evicted_user state to simplify the state machine
      and CS error handling when amdgpu_vm_validate is called without a ticket.
      
      Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  27. Jan 09, 2024
  28. Dec 13, 2023