- Jun 13, 2023
-
-
arunpravin24 authored
This reverts commit c1055186. This patch disables the TOPDOWN flag for APU and few dGPU cards which has the VRAM size equal to the BAR size. When we enable the TOPDOWN flag, we get the free blocks at the highest available memory region and we don't split the lower order blocks. This change is required to keep off the fragmentation related issues particularly in ASIC which has VRAM space <= 500MiB Hence, we are reverting this patch. Link: drm/amd#2270 Signed-off-by:
Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- Jun 07, 2023
-
-
Horatio Zhang authored
Use the function of amdgpu_bo_vm_destroy to handle the resource release of shadow bo. During the amdgpu_mes_self_test, shadow bo released, but vmbo->shadow_list was not, which caused a null pointer reference error in amdgpu_device_recover_vram when GPU reset. Fixes: 6c032c37 ("drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)") Signed-off-by:
xinhui pan <xinhui.pan@amd.com> Signed-off-by:
Horatio Zhang <Hongkun.Zhang@amd.com> Acked-by:
Feifei Xu <Feifei.Xu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 13, 2023
-
-
Marek Olšák authored
This will be used for performance investigations. Signed-off-by:
Marek Olšák <marek.olsak@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 10, 2023
-
-
lyndonli authored
On GPUs with RAS enabled, below call trace and hang are observed when shutting down device. v2: use DRM device unplugged flag instead of shutdown flag as the check to prevent memory wipe in shutdown stage. [ +0.000000] RIP: 0010:amdgpu_vram_mgr_fini+0x18d/0x1c0 [amdgpu] [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] amdgpu_ttm_fini+0x140/0x1c0 [amdgpu] [ +0.000183] amdgpu_bo_fini+0x27/0xa0 [amdgpu] [ +0.000184] gmc_v11_0_sw_fini+0x2b/0x40 [amdgpu] [ +0.000163] amdgpu_device_fini_sw+0xb6/0x510 [amdgpu] [ +0.000152] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] [ +0.000090] drm_dev_release+0x28/0x50 [drm] [ +0.000016] devm_drm_dev_init_release+0x38/0x60 [drm] [ +0.000011] devm_action_release+0x15/0x20 [ +0.000003] release_nodes+0x40/0xc0 [ +0.000001] devres_release_all+0x9e/0xe0 [ +0.000001] device_unbind_cleanup+0x12/0x80 [ +0.000003] device_release_driver_internal+0xff/0x160 [ +0.000001] driver_detach+0x4a/0x90 [ +0.000001] bus_remove_driver+0x6c/0xf0 [ +0.000001] driver_unregister+0x31/0x50 [ +0.000001] pci_unregister_driver+0x40/0x90 [ +0.000003] amdgpu_exit+0x15/0x120 [amdgpu] Signed-off-by:
lyndonli <Lyndon.Li@amd.com> Reviewed-by:
Guchun Chen <guchun.chen@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 07, 2023
-
-
lyndonli authored
On GPUs with RAS enabled, below call trace and hang are observed when shutting down device. v2: use DRM device unplugged flag instead of shutdown flag as the check to prevent memory wipe in shutdown stage. [ +0.000000] RIP: 0010:amdgpu_vram_mgr_fini+0x18d/0x1c0 [amdgpu] [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] amdgpu_ttm_fini+0x140/0x1c0 [amdgpu] [ +0.000183] amdgpu_bo_fini+0x27/0xa0 [amdgpu] [ +0.000184] gmc_v11_0_sw_fini+0x2b/0x40 [amdgpu] [ +0.000163] amdgpu_device_fini_sw+0xb6/0x510 [amdgpu] [ +0.000152] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] [ +0.000090] drm_dev_release+0x28/0x50 [drm] [ +0.000016] devm_drm_dev_init_release+0x38/0x60 [drm] [ +0.000011] devm_action_release+0x15/0x20 [ +0.000003] release_nodes+0x40/0xc0 [ +0.000001] devres_release_all+0x9e/0xe0 [ +0.000001] device_unbind_cleanup+0x12/0x80 [ +0.000003] device_release_driver_internal+0xff/0x160 [ +0.000001] driver_detach+0x4a/0x90 [ +0.000001] bus_remove_driver+0x6c/0xf0 [ +0.000001] driver_unregister+0x31/0x50 [ +0.000001] pci_unregister_driver+0x40/0x90 [ +0.000003] amdgpu_exit+0x15/0x120 [amdgpu] Signed-off-by:
lyndonli <Lyndon.Li@amd.com> Reviewed-by:
Guchun Chen <guchun.chen@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 23, 2023
-
-
Shane Xiao authored
Since VRAM manager is changed from drm mm to drm buddy, the TOP_DOWN flag should not be set by default in the large bar system. Removing this flag helps improve drm buddy allocator efficiency and reduce the risk of splitting higher order block into lower order. Signed-off-by:
Shane Xiao <shane.xiao@amd.com> Reviewed-by:
Christian K�nig <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 09, 2023
-
-
Use amdgpu_bo_in_cpu_visible_vram() instead. v2 (chk): fix test inversion Signed-off-by:
Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230208090106.9659-2-Amaranath.Somalapuram@amd.com
-
- Jan 19, 2023
-
-
Pierre-Eric Pelloux-Prayer authored
This allows to correlate the infos printed by /sys/kernel/debug/dri/n/amdgpu_gem_info to the ones found in /proc/.../fdinfo and /sys/kernel/debug/dma_buf/bufinfo. Signed-off-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jan 09, 2023
-
-
Luben Tuikov authored
Fix potential NULL dereference, in the case when "man", the resource manager might be NULL, when/if we print debug information. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Fixes: 7554886d ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)") Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Fix potential NULL dereference, in the case when "man", the resource manager might be NULL, when/if we print debug information. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Fixes: 7554886d ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)") Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Dec 14, 2022
-
-
Christian König authored
This reverts commit f9d00a4a. This causes problem for KFD because when we overcommit we accidentially bind the BO to GTT for moving it into VRAM. We also need to make sure that this is done only as fallback after trying to evict first. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Remove the "domain" argument to amdgpu_bo_create_kernel_at() since this function takes an "offset" argument which is the offset off of VRAM, and as such allocation always takes place in VRAM. Thus, the "domain" argument is unnecessary. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the requested memory exists, else we get a kernel oops when dereferencing "man". v2: Make the patch standalone, i.e. not dependent on local patches. v3: Preserve old behaviour and just check that the manager pointer is not NULL. v4: Complain if GTT domain requested and it is uninitialized--most likely a bug. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
When buffers are freed during suspend there is no guarantee that they can be re-allocated during resume. The PSP subsystem seems to be quite buggy regarding this, so add a WARN_ON() to point out those bugs. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Alex Deucher <alexdeucher@amd.com> Tested-by:
Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Dec 09, 2022
-
-
Alex Deucher authored
Only apply the static threshold for Stoney and Carrizo. This hardware has certain requirements that don't allow mixing of GTT and VRAM. Newer asics do not have these requirements so we should be able to be more flexible with where buffers end up. Bug: drm/amd#2270 Bug: drm/amd#2291 Bug: drm/amd#2255 Acked-by:
Luben Tuikov <luben.tuikov@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- Dec 06, 2022
-
-
Christian König authored
We already fallback to a dummy BO with no backing store when we allocate GDS,GWS and OA resources and to GTT when we allocate VRAM. Drop all those workarounds and generalize this for GTT as well. This fixes ENOMEM issues with runaway applications which try to allocate/free GTT in a loop and are otherwise only limited by the CPU speed. The CS will wait for the cleanup of freed up BOs to satisfy the various domain specific limits and so effectively throttle those buggy applications down to a sane allocation behavior again. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Reviewed-by:
Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Oct 27, 2022
-
-
Change ttm_resource structure from num_pages to size_t size in bytes. v1 -> v2: change PFN_UP(dst_mem->size) to ttm->num_pages v1 -> v2: change bo->resource->size to bo->base.size at some places v1 -> v2: remove the local variable v1 -> v2: cleanup cmp_size_smaller_first() v2 -> v3: adding missing PFN_UP in ttm_bo_vm_fault_reserved Signed-off-by:
Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221027091237.983582-1-Amaranath.Somalapuram@amd.com Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Christian König <christian.koenig@amd.com>
-
- Oct 06, 2022
-
-
PhilipY authored
Under VRAM usage pression, map to GPU may fail to create pt bo and vmbo->shadow_list is not initialized, then ttm_bo_release calling amdgpu_bo_vm_destroy to access vmbo->shadow_list generates below dmesg and NULL pointer access backtrace: Set vmbo destroy callback to amdgpu_bo_vm_destroy only after creating pt bo successfully, otherwise use default callback amdgpu_bo_destroy. amdgpu: amdgpu_vm_bo_update failed amdgpu: update_gpuvm_pte() failed amdgpu: Failed to map bo to gpuvm amdgpu 0000:43:00.0: amdgpu: Failed to map peer:0000:43:00.0 mem_domain:2 BUG: kernel NULL pointer dereference, address: RIP: 0010:amdgpu_bo_vm_destroy+0x4d/0x80 [amdgpu] Call Trace: <TASK> ttm_bo_release+0x207/0x320 [amdttm] amdttm_bo_init_reserved+0x1d6/0x210 [amdttm] amdgpu_bo_create+0x1ba/0x520 [amdgpu] amdgpu_bo_create_vm+0x3a/0x80 [amdgpu] amdgpu_vm_pt_create+0xde/0x270 [amdgpu] amdgpu_vm_ptes_update+0x63b/0x710 [amdgpu] amdgpu_vm_update_range+0x2e7/0x6e0 [amdgpu] amdgpu_vm_bo_update+0x2bd/0x600 [amdgpu] update_gpuvm_pte+0x160/0x420 [amdgpu] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x313/0x1130 [amdgpu] kfd_ioctl_map_memory_to_gpu+0x115/0x390 [amdgpu] kfd_ioctl+0x24a/0x5b0 [amdgpu] Signed-off-by:
Philip Yang <Philip.Yang@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jul 14, 2022
-
-
Leo Li authored
When pinning a buffer, we should check to see if there are any additional restrictions imposed by bo->preferred_domains. This will prevent the BO from being moved to an invalid domain when pinning. For example, this can happen if the user requests to create a BO in GTT domain for display scanout. amdgpu_dm will allow pinning to either VRAM or GTT domains, since DCN can scanout from either or. However, in amdgpu_bo_pin_restricted(), pinning to VRAM is preferred if there is adequate carveout. This can lead to pinning to VRAM despite the user requesting GTT placement for the BO. v2: Allow the kernel to override the domain, which can happen when exporting a BO to a V4L camera (for example). Signed-off-by:
Leo Li <sunpeng.li@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- Jul 11, 2022
-
-
Christian König authored
Make sure we can at least move and release BOs without backing store. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220707102453.3633-3-christian.koenig@amd.com
-
Christian König authored
Rename ttm_bo_init to ttm_bo_init_validate since that better matches what the function is actually doing. Remove the unused size parameter, move the function's kerneldoc to the implementation and cleanup the whole error handling. Signed-off-by:
Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220707102453.3633-2-christian.koenig@amd.com Reviewed-by:
Michael J. Ruhl <michael.j.ruhl@intel.com>
-
- May 26, 2022
-
-
Christian König authored
Add a AMDGPU_GEM_CREATE_DISCARDABLE flag to note that the content of a BO doesn't needs to be preserved during eviction. KFD was already using a similar functionality for SVM BOs so replace the internal flag with the new UAPI. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Some applications want to know whether the memory is LP or not. Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Apr 07, 2022
-
-
Christian König authored
This is now handled by the DMA-buf framework in the dma_resv obj. Also remove the workaround inside VMWGFX to update the moving fence. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-14-christian.koenig@amd.com
-
Christian König authored
Wait only for kernel fences before kmap or UVD direct submission. This also makes sure that we always wait in amdgpu_bo_kmap() even when returning a cached pointer. Signed-off-by:
Christian König <christian.koenig@amd.com> Acked-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-6-christian.koenig@amd.com
-
Christian König authored
Instead of distingting between shared and exclusive fences specify the fence usage while adding fences. Rework all drivers to use this interface instead and deprecate the old one. v2: some kerneldoc comments suggested by Daniel v3: fix a missing case in radeon v4: rebase on nouveau changes, fix lockdep and temporary disable warning v5: more documentation updates v6: separate internal dma_resv changes from this patch, avoids to disable warning temporary, rebase on upstream changes v7: fix missed case in lima driver, minimize changes to i915_gem_busy_ioctl Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-3-christian.koenig@amd.com
-
Christian König authored
This change adds the dma_resv_usage enum and allows us to specify why a dma_resv object is queried for its containing fences. Additional to that a dma_resv_usage_rw() helper function is added to aid retrieving the fences for a read or write userspace submission. This is then deployed to the different query functions of the dma_resv object and all of their users. When the write paratermer was previously true we now use DMA_RESV_USAGE_WRITE and DMA_RESV_USAGE_READ otherwise. v2: add KERNEL/OTHER in separate patch v3: some kerneldoc suggestions by Daniel v4: some more kerneldoc suggestions by Daniel, fix missing cases lost in the rebase pointed out by Bas. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-2-christian.koenig@amd.com
-
- Apr 06, 2022
-
-
Christian König authored
Audit all the users of dma_resv_add_excl_fence() and make sure they reserve a shared slot also when only trying to add an exclusive fence. This is the next step towards handling the exclusive fence like a shared one. v2: fix missed case in amdgpu v3: and two more radeon, rename function v4: add one more case to TTM, fix i915 after rebase Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220406075132.3263-2-christian.koenig@amd.com
-
- Apr 01, 2022
-
-
Christian König authored
That are bytes not pages. Signed-off-by:
Christian König <christian.koenig@amd.com> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 25, 2022
-
-
Guchun Chen authored
On GPUs with RAS enabled, below call trace is observed when suspending or shutting down device. The cause is we have enabled memory wipe flag for BOs on such GPUs by default, and such BOs will go to memory wipe by amdgpu_fill_buffer, however, because ring is off already, it fails to clean up the memory and throw this error message. So add a suspend/shutdown check before wipping memory. [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. v2: fix coding style issue Fixes: fc6ea4be ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled") Signed-off-by:
Guchun Chen <guchun.chen@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Acked-by:
Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 14, 2022
-
-
Christian König authored
This is provided by TTM now. Also switch man->size to bytes instead of pages and fix the double printing of size and usage in debugfs. v2: fix size checking as well Signed-off-by:
Christian König <christian.koenig@amd.com> Tested-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by:
Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220214093439.2989-8-christian.koenig@amd.com
-
Christian König authored
This is provided by TTM now. Also switch man->size to bytes instead of pages and fix the double printing of size and usage in debugfs. v2: fix size checking as well Signed-off-by:
Christian König <christian.koenig@amd.com> Tested-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by:
Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220214093439.2989-6-christian.koenig@amd.com
-
- Feb 07, 2022
-
-
Rajneesh Bhardwaj authored
Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks. [ +0.000003] WARNING: possible recursive locking detected [ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.000004] -------------------------------------------- [ +0.000002] python/4822 is trying to acquire lock: [ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000203] but task is already holding lock: [ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000017] other info that might help us debug this: [ +0.000002] Possible unsafe locking scenario: [ +0.000003] CPU0 [ +0.000002] ---- [ +0.000002] lock(reservation_ww_class_mutex); [ +0.000004] lock(reservation_ww_class_mutex); [ +0.000003] *** DEADLOCK *** [ +0.000002] May be due to missing lock nesting notation [ +0.000003] 7 locks held by python/4822: [ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu] [ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu] [ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu] [ +0.000236] #3: ffffb2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu] [ +0.000235] #4: ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm] [ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu] [ +0.000195] stack backtrace: [ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030 [ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ +0.000003] Call Trace: [ +0.000003] dump_stack+0x6d/0x89 [ +0.000010] __lock_acquire+0xb93/0x1a90 [ +0.000009] lock_acquire+0x25d/0x2d0 [ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060 [ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] ? lock_release+0x13f/0x270 [ +0.000005] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm] [ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu] [ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu] [ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu] [ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu] [ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu] [ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu] [ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu] [ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu] [ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu] [ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu] [ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu] [ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu] [ +0.000216] ? lock_release+0x13f/0x270 [ +0.000006] ? __fget_files+0x107/0x1e0 [ +0.000007] __x64_sys_ioctl+0x8b/0xd0 [ +0.000007] do_syscall_64+0x36/0x70 [ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000007] RIP: 0033:0x7fbff90a7317 [ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 [ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX: 00007fbff90a7317 [ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI: 0000000000000004 [ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09: 00007fbcc402d880 [ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12: 00000000c0184b18 [ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15: 00007fbcc402d820 Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by:
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jan 27, 2022
-
-
Felix Kuehling authored
On GPUs with RAS, poison can propagate between processes if VRAM is not cleared when it is freed or allocated. The reason is, that not all write accesses clear RAS poison. 32-byte writes by the SDMA engine do clear RAS poison. Clearing memory in the background when it is freed should avoid major performance impact. KFD has been doing this already for a long time. Signed-off-by:
Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jan 11, 2022
-
-
Leslie Shi authored
Patch: 3efb17ae ("drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure") makes call to amdgpu_device_unmap_mmio() conditioned on device unplugged. This patch unmaps MMIO mappings even when device is not unplugged. v2: Add condition of drm_dev_enter() to deleted unmaps in patch "drm/amdgpu: Unmap all MMIO mappings" Signed-off-by:
Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by:
Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Nov 17, 2021
-
-
Nirmoy Das authored
We set WC memtype for aper_base but don't check return value of arch_io_reserve_memtype_wc(). Be more defensive and return early on error. Signed-off-by:
Nirmoy Das <nirmoy.das@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Nov 05, 2021
-
-
Felix Kuehling authored
If a kfd_bo was shared (e.g. a dmabuf export), the original kfd_bo may be freed when the amdgpu_bo still lives on. Free the kfd_bo struct in the release_notify callback then the amdgpu_bo is freed. Signed-off-by:
Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-By:
Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Oct 07, 2021
-
-
Nirmoy Das authored
Unify BO evicting functionality for possible memory types in amdgpu_ttm.c. Signed-off-by:
Nirmoy Das <nirmoy.das@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Sep 07, 2021
-
-
Christian König authored
Just drop some dead code. Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Aug 25, 2021
-
-
zhang yifan authored
amdgpu_bo_get_preferred_pin_domain is used for page tables creation, which is not involved with page pinning. And it is used in more cases than display scanout, modify its documentation as well. Signed-off-by:
Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-