Buddy allocator causing issues with Vulkan loads on 6800M (PRIME)
I was testing GravityMark (https://tellusim.com/download/GravityMark_1.53.run) and noticed issues on @airlied's drm-next tree: the benchmark doesn't launch and freezes the GUI until the GravityMark.x64 binary is killed.
This seems to happen with a lot of other Vulkan apps too.
I bisected back to:
c9cad937c0c58618fe5b0310fd539a854dc1ae95 is the first bad commit
commit c9cad937c0c58618fe5b0310fd539a854dc1ae95
Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Date: Fri Apr 8 04:18:43 2022 +0530
drm/amdgpu: add drm buddy support to amdgpu
- Switch to drm buddy allocator
- Add resource cursor support for drm buddy
v2(Matthew Auld):
- replace spinlock with mutex as we call kmem_cache_zalloc
(..., GFP_KERNEL) in drm_buddy_alloc() function
- lock drm_buddy_block_trim() function as it calls
mark_free/mark_split are all globally visible
v3(Matthew Auld):
- remove trim method error handling as we address the failure case
at drm_buddy_block_trim() function
v4:
- fix warnings reported by kernel test robot <lkp@intel.com>
v5:
- fix merge conflict issue
v6:
- fix warnings reported by kernel test robot <lkp@intel.com>
v7:
- remove DRM_BUDDY_RANGE_ALLOCATION flag usage
v8:
- keep DRM_BUDDY_RANGE_ALLOCATION flag usage
- resolve conflicts created by drm/amdgpu: remove VRAM accounting v2
v9(Christian):
- merged the below patch
- drm/amdgpu: move vram inline functions into a header
- rename label name as fallback
- move struct amdgpu_vram_mgr to amdgpu_vram_mgr.h
- remove unnecessary flags from struct amdgpu_vram_reservation
- rewrite block NULL check condition
- change else style as per coding standard
- rewrite the node max size
- add a helper function to fetch the first entry from the list
v10(Christian):
- rename amdgpu_get_node() function name as amdgpu_vram_mgr_first_block
v11:
- if size is not aligned with min_page_size, enable is_contiguous flag,
therefore, the size round up to the power of two and trimmed to the
original size.
v12:
- rename the function names having prefix as amdgpu_vram_mgr_*()
- modify the round_up() logic conforming to contiguous flag enablement
or if size is not aligned to min_block_size
- modify the trim logic
- rename node as block wherever applicable
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407224843.2416-1-Arunpravin.PaneerSelvam@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
drivers/gpu/drm/Kconfig | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 97 +++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 10 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 359 +++++++++++++++----------
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h | 89 ++++++
5 files changed, 380 insertions(+), 176 deletions(-)
create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
Reverting that commit gets things working again.
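For anyone reading along, the v11 note in the commit message says that a size not aligned with min_page_size is rounded up to a power of two and then trimmed back to the original size. Below is a toy model of that arithmetic only, written from the wording of the commit message. It is not the amdgpu_vram_mgr code, the function names `round_up_pow2` and `plan_allocation` are made up for illustration, and I have not confirmed that this path is what triggers the hang.

```python
def round_up_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def plan_allocation(size: int, min_block_size: int) -> tuple[int, int]:
    """Toy model of the v11 note (hypothetical helper, not kernel code).

    Returns (bytes_requested_from_allocator, bytes_trimmed_afterwards).
    Assumption: an unaligned size is rounded up to a power of two and the
    excess is trimmed so the caller ends up with the original size.
    """
    if size % min_block_size != 0:
        allocated = round_up_pow2(size)
        return allocated, allocated - size
    # Aligned sizes are passed through unchanged in this sketch.
    return size, 0


if __name__ == "__main__":
    # An unaligned 5000-byte request with 4 KiB minimum blocks.
    print(plan_allocation(5000, 4096))
    # An aligned 20 KiB request needs no rounding or trimming here.
    print(plan_allocation(5 * 4096, 4096))
```

In this model the trimmed amount is always the difference between the rounded-up size and the requested size, so the final allocation matches what was asked for.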
Both the kernel and Mesa (latest git) are compiled with clang 14.0.1 and linked with lld 14.0.1.
Is there any chance a PRIME setup could be added to AMD's CI? It seems to get broken with each kernel cycle.