    winsys/amdgpu: fix a deadlock when waiting for submission_in_progress · 58af1f6b
    Marek Olšák authored
    First this happens:
    
    1) amdgpu_cs_flush (lock bo_fence_lock)
       -> amdgpu_add_fence_dependency
       -> os_wait_until_zero (wait for submission_in_progress) - WAITING
    
    2) amdgpu_bo_create
       -> pb_cache_reclaim_buffer (lock pb_cache::mutex)
       -> pb_cache_is_buffer_compat
       -> amdgpu_bo_wait (lock bo_fence_lock) - WAITING
    
    So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't
    continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job,
    but the CS ioctl is trying to release a buffer:
    
    3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
       -> amdgpu_cs_context_cleanup
       -> pb_reference
       -> pb_destroy
       -> amdgpu_bo_destroy_or_cache
       -> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK
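
The three traces above form a cycle in the wait-for graph. As an illustration only, the sketch below models each thread as a node and checks for that cycle. The thread labels (`T_FLUSH`, `T_CREATE`, `T_CS`) and all function names are hypothetical and do not exist in Mesa.

```c
#include <stdbool.h>

/* Hypothetical labels for the three threads in the traces above. */
enum { T_FLUSH, T_CREATE, T_CS, NUM_THREADS };
#define NOT_WAITING (-1)

/* waits_on[t] is the thread that must make progress before t can continue. */
bool has_wait_cycle(const int waits_on[NUM_THREADS])
{
    for (int start = 0; start < NUM_THREADS; start++) {
        int t = start;
        /* Follow at most NUM_THREADS edges; returning to start means a cycle. */
        for (int step = 0; step < NUM_THREADS && t != NOT_WAITING; step++) {
            t = waits_on[t];
            if (t == start)
                return true;
        }
    }
    return false;
}

/* The situation described by traces 1) to 3). */
bool deadlock_with_flush_wait(void)
{
    const int waits_on[NUM_THREADS] = {
        [T_FLUSH]  = T_CS,     /* holds bo_fence_lock, waits for submission_in_progress */
        [T_CREATE] = T_FLUSH,  /* holds pb_cache::mutex, waits for bo_fence_lock */
        [T_CS]     = T_CREATE, /* waits for pb_cache::mutex before finishing the job */
    };
    return has_wait_cycle(waits_on);
}

/* Same graph, but the flush thread no longer waits on the CS thread. */
bool deadlock_without_flush_wait(void)
{
    const int waits_on[NUM_THREADS] = {
        [T_FLUSH]  = NOT_WAITING,
        [T_CREATE] = T_FLUSH,
        [T_CS]     = T_CREATE,
    };
    return has_wait_cycle(waits_on);
}
```

Removing any single edge breaks the cycle. The flush thread's wait is the one edge that can be dropped without changing the locking order of the other two paths.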
    
    The simple solution is not to wait for submission_in_progress in
    amdgpu_cs_flush. That wait was only needed to build the list of
    dependencies for the CS ioctl. Instead of building that list as a direct
    input to the CS ioctl, record the dependencies as a list of fences and
    build the final list of dependencies in the CS thread itself.
    
    Therefore, amdgpu_cs_flush doesn't have to wait and can continue.
    Then, amdgpu_bo_create can continue and return. And then amdgpu_cs_submit_ib
    can continue.
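
A minimal sketch of the deferred approach follows. It is not the actual patch: every `sketch_*` name, the `kernel_seq` field, and `MAX_DEPS` are hypothetical. It also assumes a single CS thread that processes jobs in order, so that every earlier submission has finished by the time a later job resolves its dependencies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPS 8

/* Hypothetical stand-in for a fence; not the real amdgpu_fence. */
struct sketch_fence {
    bool submission_in_progress; /* true until the CS thread has submitted the job */
    uint64_t kernel_seq;         /* only valid once submission_in_progress is false */
};

struct sketch_cs {
    struct sketch_fence *fence_deps[MAX_DEPS]; /* recorded at flush time */
    size_t num_fence_deps;
    uint64_t ioctl_deps[MAX_DEPS];             /* built later, in the CS thread */
    size_t num_ioctl_deps;
};

/* Flush-time path: record the fence pointer only and never wait here. */
bool sketch_add_fence_dependency(struct sketch_cs *cs, struct sketch_fence *fence)
{
    if (cs->num_fence_deps == MAX_DEPS)
        return false;
    cs->fence_deps[cs->num_fence_deps++] = fence;
    return true;
}

/* CS-thread path: turn the recorded fences into the final dependency list.
 * Assumption: jobs run in order on one CS thread, so earlier fences are
 * already submitted and no wait is needed here either. */
void sketch_resolve_dependencies(struct sketch_cs *cs)
{
    cs->num_ioctl_deps = 0;
    for (size_t i = 0; i < cs->num_fence_deps; i++) {
        assert(!cs->fence_deps[i]->submission_in_progress);
        cs->ioctl_deps[cs->num_ioctl_deps++] = cs->fence_deps[i]->kernel_seq;
    }
}

/* Walks through the sequence: flush records, earlier job completes, CS thread resolves. */
uint64_t sketch_demo(void)
{
    struct sketch_fence prev = { .submission_in_progress = true, .kernel_seq = 0 };
    struct sketch_cs cs = { .num_fence_deps = 0 };

    /* Flush thread: returns immediately even though prev is still in progress. */
    sketch_add_fence_dependency(&cs, &prev);

    /* CS thread, earlier job: submission finishes and the fence becomes valid. */
    prev.kernel_seq = 42;
    prev.submission_in_progress = false;

    /* CS thread, this job: build the final dependency list. */
    sketch_resolve_dependencies(&cs);
    return cs.ioctl_deps[0];
}
```

The point of the split is that the flush path touches only the fence pointer, so it never needs the value that becomes available after submission.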
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294

    Cc: 17.1 <mesa-stable@lists.freedesktop.org>
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>