Skip to content
Snippets Groups Projects
  1. Jul 26, 2022
    • Thomas Hellström's avatar
      drm/i915: Fix vm use-after-free in vma destruction · 58d7ab3f
      Thomas Hellström authored and Tomeu Vizoso's avatar Tomeu Vizoso committed
      In vma destruction, the following race may occur:
      
      Thread 1:	    		  Thread 2:
      i915_vma_destroy();
      
        ...
        list_del_init(vma->vm_link);
        ...
        mutex_unlock(vma->vm->mutex);
      				  __i915_vm_release();
      release_references();
      
      And in release_reference() we dereference vma->vm to get to the
      vm gt pointer, leading to a use-after free.
      
      However, __i915_vm_release() grabs the vm->mutex so the vm won't be
      destroyed before vma->vm->mutex is released, so extract the gt pointer
      under the vm->mutex to avoid the vma->vm dereference in
      release_references().
      
      v2: Fix a typo in the commit message (Andi Shyti)
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5944
      
      
      Fixes: e1a7ab4f ("drm/i915: Remove the vm open count")
      
      Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Acked-by: default avatarNirmoy Das <nirmoy.das@intel.con>
      Reviewed-by: default avatarAndrzej Hajda <andrzej.hajda@intel.com>
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220620123659.381772-1-thomas.hellstrom@linux.intel.com
      58d7ab3f
    • Matthew Auld's avatar
      drm/i915/ttm: handle blitter failure on DG2 · fa943357
      Matthew Auld authored and Tomeu Vizoso's avatar Tomeu Vizoso committed
      
      If the move or clear operation somehow fails, and the memory underneath
      is not cleared, like when moving to lmem, then we currently fallback to
      memcpy or memset. However with small-BAR systems this fallback might no
      longer be possible. For now we use the set_wedged sledgehammer if we
      ever encounter such a scenario, and mark the object as borked to plug
      any holes where access to the memory underneath can happen. Add some
      basic selftests to exercise this.
      
      v2:
        - In the selftests make sure we grab the runtime pm around the reset.
          Also make sure we grab the reset lock before checking if the device
          is wedged, since the wedge might still be in-progress and hence the
          bit might not be set yet.
        - Don't wedge or put the object into an unknown state, if the request
          construction fails (or similar). Just returning an error and
          skipping the fallback should be safe here.
        - Make sure we wedge each gt. (Thomas)
        - Peek at the unknown_state in io_reserve, that way we don't have to
          export or hand roll the fault_wait_for_idle. (Thomas)
        - Add the missing read-side barriers for the unknown_state. (Thomas)
        - Some kernel-doc fixes. (Thomas)
      v3:
        - Tweak the ordering of the set_wedged, also add FIXME.
      
      Signed-off-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jordan Justen <jordan.l.justen@intel.com>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
      Reviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-11-matthew.auld@intel.com
      fa943357
  2. Jun 22, 2022
  3. Jun 13, 2022
  4. May 27, 2022
  5. May 09, 2022
  6. May 04, 2022
  7. May 03, 2022
  8. Apr 07, 2022
  9. Apr 06, 2022
  10. Mar 07, 2022
  11. Feb 28, 2022
    • Thomas Hellström's avatar
      drm/i915: Clarify vma lifetime · c03d9826
      Thomas Hellström authored
      
      It's unclear what reference the initial vma kref reference refers to.
      A vma can have multiple weak references, the object vma list,
      the vm's bound list and the GT's closed_list, and the initial vma
      reference can be put from lookups of all these lists.
      
      With the current implementation this means
      that any holder of yet another vma refcount (currently only
      i915_gem_object_unbind()) needs to be holding two of either
      *) An object refcount,
      *) A vm open count
      *) A vma open count
      
      in order for us to not risk leaking a reference by having the
      initial vma reference being put twice.
      
      Address this by re-introducing i915_vma_destroy() which removes all
      weak references of the vma and *then* puts the initial vma refcount.
      This makes a strong vma reference hold on to the vma unconditionally.
      
      Perhaps a better name would be i915_vma_revoke() or i915_vma_zombify(),
      since other callers may still hold a refcount, but with the prospect of
      being able to replace the vma refcount with the object lock in the near
      future, let's stick with i915_vma_destroy().
      
      Finally this commit fixes a race in that previously i915_vma_release() and
      now i915_vma_destroy() could destroy a vma without taking the vm->mutex
      after an advisory check that the vma mm_node was not allocated.
      This would race with the ungrab_vma() function creating a trace similar
      to the below one. This was fixed in one of the __i915_vma_put() callsites
      in
      commit bc1922e5 ("drm/i915: Fix a race between vma / object destruction and unbinding")
      but although not seemingly triggered by CI, that
      is not sufficient. This patch is needed to fix that properly.
      
      [823.012188] Console: switching to colour dummy device 80x25
      [823.012422] [IGT] gem_ppgtt: executing
      [823.016667] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0
      [852.436465] stack segment: 0000 [#1] PREEMPT SMP NOPTI
      [852.436480] CPU: 0 PID: 3200 Comm: gem_ppgtt Not tainted 5.16.0-CI-CI_DRM_11115+ #1
      [852.436489] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2422.A00.2110131104 10/13/2021
      [852.436499] RIP: 0010:ungrab_vma+0x9/0x80 [i915]
      [852.436711] Code: ef e8 4b 85 cf e0 e8 36 a3 d6 e0 8b 83 f8 9c 00 00 85 c0 75 e1 5b 5d 41 5c 41 5d c3 e9 d6 fd 14 00 55 53 48 8b af c0 00 00 00 <8b> 45 00 85 c0 75 03 5b 5d c3 48 8b 85 a0 02 00 00 48 89 fb 48 8b
      [852.436727] RSP: 0018:ffffc90006db7880 EFLAGS: 00010246
      [852.436734] RAX: 0000000000000000 RBX: ffffc90006db7598 RCX: 0000000000000000
      [852.436742] RDX: ffff88815349e898 RSI: ffff88815349e858 RDI: ffff88810a284140
      [852.436748] RBP: 6b6b6b6b6b6b6b6b R08: ffff88815349e898 R09: ffff88815349e8e8
      [852.436754] R10: 0000000000000001 R11: 0000000051ef1141 R12: ffff88810a284140
      [852.436762] R13: 0000000000000000 R14: ffff88815349e868 R15: ffff88810a284458
      [852.436770] FS:  00007f5c04b04e40(0000) GS:ffff88849f000000(0000) knlGS:0000000000000000
      [852.436781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [852.436788] CR2: 00007f5c04b38fe0 CR3: 000000010a6e8001 CR4: 0000000000770ef0
      [852.436797] PKRU: 55555554
      [852.436801] Call Trace:
      [852.436806]  <TASK>
      [852.436811]  i915_gem_evict_for_node+0x33c/0x3c0 [i915]
      [852.437014]  i915_gem_gtt_reserve+0x106/0x130 [i915]
      [852.437211]  i915_vma_pin_ww+0x8f4/0xb60 [i915]
      [852.437412]  eb_validate_vmas+0x688/0x860 [i915]
      [852.437596]  i915_gem_do_execbuffer+0xc0e/0x25b0 [i915]
      [852.437770]  ? deactivate_slab+0x5f2/0x7d0
      [852.437778]  ? _raw_spin_unlock_irqrestore+0x50/0x60
      [852.437789]  ? i915_gem_execbuffer2_ioctl+0xc6/0x2c0 [i915]
      [852.437944]  ? init_object+0x49/0x80
      [852.437950]  ? __lock_acquire+0x5e6/0x2580
      [852.437963]  i915_gem_execbuffer2_ioctl+0x116/0x2c0 [i915]
      [852.438129]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
      [852.438300]  drm_ioctl_kernel+0xac/0x140
      [852.438310]  drm_ioctl+0x201/0x3d0
      [852.438316]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
      [852.438490]  __x64_sys_ioctl+0x6a/0xa0
      [852.438498]  do_syscall_64+0x37/0xb0
      [852.438507]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [852.438515] RIP: 0033:0x7f5c0415b317
      [852.438523] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
      [852.438542] RSP: 002b:00007ffd765039a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [852.438553] RAX: ffffffffffffffda RBX: 000055e4d7829dd0 RCX: 00007f5c0415b317
      [852.438562] RDX: 00007ffd76503a00 RSI: 00000000c0406469 RDI: 0000000000000017
      [852.438571] RBP: 00007ffd76503a00 R08: 0000000000000000 R09: 0000000000000081
      [852.438579] R10: 00000000ffffff7f R11: 0000000000000246 R12: 00000000c0406469
      [852.438587] R13: 0000000000000017 R14: 00007ffd76503a00 R15: 0000000000000000
      [852.438598]  </TASK>
      [852.438602] Modules linked in: snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal snd_hda_intel snd_intel_dspcfg drm_buddy coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ttm ghash_clmulni_intel snd_hwdep snd_hda_core e1000e drm_dp_helper ptp snd_pcm mei_me drm_kms_helper pps_core mei syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet mii
      [852.440310] ---[ end trace e52cdd2fe4fd911c ]---
      
      v2: Fix typos in the commit message.
      
      Fixes: 7e00897b ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
      Fixes: bc1922e5 ("drm/i915: Fix a race between vma / object destruction and unbinding")
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220222133209.587978-1-thomas.hellstrom@linux.intel.com
      c03d9826
    • Matthew Auld's avatar
      drm/i915: add I915_BO_ALLOC_GPU_ONLY · 30b9d1b3
      Matthew Auld authored
      
      If the user doesn't require CPU access for the buffer, then
      ALLOC_GPU_ONLY should be used, in order to prioritise allocating in the
      non-mappable portion of LMEM, on devices with small BAR.
      
      v2(Thomas):
        - The BO_ALLOC_TOPDOWN naming here is poor, since this is pure lies on
          systems that don't even have small BAR. A better name is GPU_ONLY,
          which is accurate regardless of the configuration.
      
      Signed-off-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Acked-by: default avatarNirmoy Das <nirmoy.das@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-3-matthew.auld@intel.com
      30b9d1b3
  12. Feb 20, 2022
  13. Jan 28, 2022
  14. Jan 27, 2022
  15. Jan 26, 2022
  16. Jan 25, 2022
    • Tvrtko Ursulin's avatar
      drm/i915: Flush TLBs before releasing backing store · 7938d615
      Tvrtko Ursulin authored
      
      We need to flush TLBs before releasing backing store otherwise userspace
      is able to encounter stale entries if a) it is not declaring access to
      certain buffers and b) it races with the backing store release from a
      such undeclared execution already executing on the GPU in parallel.
      
      The approach taken is to mark any buffer objects which were ever bound
      to the GPU and to trigger a serialized TLB flush when their backing
      store is released.
      
      Alternatively the flushing could be done on VMA unbind, at which point
      we would be able to ascertain whether there is potential a parallel GPU
      execution (which could race), but essentially it boils down to paying
      the cost of TLB flushes potentially needlessly at VMA unbind time (when
      the backing store is not known to be going away so not needed for
      safety), versus potentially needlessly at backing store relase time
      (since we at that point cannot tell whether there is anything executing
      on the GPU which uses that object).
      
      Thereforce simplicity of implementation has been chosen for now with
      scope to benchmark and refine later as required.
      
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reported-by: default avatarSushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: default avatarDave Airlie <airlied@redhat.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7938d615
  17. Jan 18, 2022
  18. Jan 11, 2022
  19. Jan 10, 2022
  20. Dec 22, 2021
  21. Dec 20, 2021
Loading