1. 16 Sep, 2021 3 commits
    • drm/amdgpu: Fix crash on device remove/driver unload · cdccf1ff
      Andrey Grodzovsky authored
      Crash:
      BUG: unable to handle page fault for address: 00000000000010e1
      RIP: 0010:vega10_power_gate_vce+0x26/0x50 [amdgpu]
      Call Trace:
      pp_set_powergating_by_smu+0x16a/0x2b0 [amdgpu]
      amdgpu_dpm_set_powergating_by_smu+0x92/0xf0 [amdgpu]
      amdgpu_dpm_enable_vce+0x2e/0xc0 [amdgpu]
      vce_v4_0_hw_fini+0x95/0xa0 [amdgpu]
      amdgpu_device_fini_hw+0x232/0x30d [amdgpu]
      amdgpu_driver_unload_kms+0x5c/0x80 [amdgpu]
      amdgpu_pci_remove+0x27/0x40 [amdgpu]
      pci_device_remove+0x3e/0xb0
      device_release_driver_internal+0x103/0x1d0
      device_release_driver+0x12/0x20
      pci_stop_bus_device+0x79/0xa0
      pci_stop_and_remove_bus_device_locked+0x1b/0x30
      remove_store+0x7b/0x90
      dev_attr_store+0x17/0x30
      sysfs_kf_write+0x4b/0x60
      kernfs_fop_write_iter+0x151/0x1e0
      
      Why:
      VCE/UVD have a dependency on the SMC block for their suspend, but the
      SMC block is the first to do HW fini due to some constraints.
      
      How:
      Since the original patch was dealing with suspend issues, move the SMC
      block dependency back into the suspend hooks, as was done in V1 of the
      original patches.
      Keep flushing the idle work in both the suspend and HW fini sequences,
      since it's essential in both cases.
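      A minimal sketch of the resulting split, assuming simplified hook
      shapes (the real vce_v4_0 hooks take a void *handle and do more
      cleanup) with amdgpu_dpm_enable_vce() standing in for the SMC
      power-gating dependency:
      
      static int vce_v4_0_hw_fini_sketch(struct amdgpu_device *adev)
      {
              /* Flushing the idle work is needed in both the suspend
               * and the HW fini paths. */
              cancel_delayed_work_sync(&adev->vce.idle_work);
      
              /* No SMC/DPM call here: on device remove the SMC block has
               * already done its HW fini, so touching it would crash. */
              return 0;
      }
      
      static int vce_v4_0_suspend_sketch(struct amdgpu_device *adev)
      {
              int r = vce_v4_0_hw_fini_sketch(adev);
      
              if (r)
                      return r;
      
              /* The SMC block is still alive during suspend, so the
               * power-gating dependency is safe to keep here. */
              amdgpu_dpm_enable_vce(adev, false);
              return 0;
      }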
      
      Fixes: 2178d3c1 ("drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend")
      Fixes: ee6679aa ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
    • drm/amdgpu: Fix uvd ib test timeout when use pre-allocated BO · 4567162f
      xinhui pan authored
      
      
      Now we use the same BO for the create and destroy messages, so destroy
      waits for the fence returned from create to be signaled. The default
      timeout in the destroy path is 10ms, which is too short.
      
      Let's wait on both fences with the specified timeout.
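      A hedged sketch of the waiting scheme: dma_fence_wait_timeout() is the
      standard kernel API, while the amdgpu_uvd_get_create_msg()/
      amdgpu_uvd_get_destroy_msg() signatures are assumed and may not match
      this exact tree.
      
      static int uvd_ib_test_wait_sketch(struct amdgpu_ring *ring, long timeout)
      {
              struct dma_fence *f_create = NULL, *f_destroy = NULL;
              long r;
      
              /* Assumed helpers: both hand back the fence of their msg. */
              r = amdgpu_uvd_get_create_msg(ring, 1, &f_create);
              if (r)
                      goto out;
              r = amdgpu_uvd_get_destroy_msg(ring, 1, true, &f_destroy);
              if (r)
                      goto out;
      
              /* Wait for the create msg, then the destroy msg, both with
               * the generous ring-test timeout instead of the internal
               * 10ms wait of the destroy path. */
              r = dma_fence_wait_timeout(f_create, false, timeout);
              if (r > 0)
                      r = dma_fence_wait_timeout(f_destroy, false, timeout);
              if (r == 0)
                      r = -ETIMEDOUT;
              else if (r > 0)
                      r = 0;
      out:
              dma_fence_put(f_create);
              dma_fence_put(f_destroy);
              return (int)r;
      }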
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
    • drm/amdgpu: Put drm_dev_enter/exit outside hot codepath · 7ba5400f
      xinhui pan authored
      
      
      We hit a soft hang while running a memory pressure test on a NUMA system.
      After a quick look, this is because kfd invalidates/validates userptr
      memory frequently with the process_info lock held.
      It looks like updating the page table mappings uses too much CPU time.
      
      perf top shows the following:
      75.81%  [kernel]       [k] __srcu_read_unlock
       6.19%  [amdgpu]       [k] amdgpu_gmc_set_pte_pde
       3.56%  [kernel]       [k] __srcu_read_lock
       2.20%  [amdgpu]       [k] amdgpu_vm_cpu_update
       2.20%  [kernel]       [k] __sg_page_iter_dma_next
       2.15%  [drm]          [k] drm_dev_enter
       1.70%  [drm]          [k] drm_prime_sg_to_dma_addr_array
       1.18%  [kernel]       [k] __sg_alloc_table_from_pages
       1.09%  [drm]          [k] drm_dev_exit
      
      So move drm_dev_enter/exit out of the gmc code and let the callers do it
      instead. Those callers are gart_unbind, gart_map, vm_clear_bo,
      vm_update_pdes and gmc_init_pdb0; vm_bo_update_mapping already calls it.
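      A hedged illustration of the pattern in one hypothetical caller (the
      function name and parameters are invented for the sketch;
      drm_dev_enter()/drm_dev_exit() and adev_to_drm() are the real
      interfaces):
      
      #include <drm/drm_drv.h>
      
      static int gart_map_sketch(struct amdgpu_device *adev, uint64_t offset,
                                 int pages, dma_addr_t *dma_addr, uint64_t flags,
                                 void *dst)
      {
              int idx;
      
              /* One drm_dev_enter() for the whole update ... */
              if (!drm_dev_enter(adev_to_drm(adev), &idx))
                      return -ENODEV; /* device already unplugged */
      
              /* ... so the hot loop writing PTEs (e.g. via
               * amdgpu_gmc_set_pte_pde()) no longer takes the SRCU read
               * lock once per entry. */
      
              drm_dev_exit(idx);
              return 0;
      }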
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-and-tested-by: Andrey G...
  2. 15 Sep, 2021 8 commits
  3. 14 Sep, 2021 16 commits
  4. 13 Sep, 2021 8 commits
  5. 10 Sep, 2021 5 commits