1. 03 Aug, 2021 1 commit
    • v3d: attach performance monitor to jobs · 99697035
      Juan A. Suárez authored
      
      
      When a performance monitor is enabled in the context, all the jobs
      submitted to the kernel will have this monitor ID attached, so the
      kernel will measure the performance counters selected in the monitor
      when these jobs are executed by the GPU (accumulating the results).
      
      v2 (Iago):
       - Update comment
       - Assert fence is not NULL
       - Assert has_perfmon when using perfmon
       - Rewrite conditional
       - Implement performance counters in CSD
      
      v4 (Juan):
       - Track previous perfmon and sync BCL if required (Juan).
       - Track if a job with perfmon was submitted (Juan)
      
      v7 (Iago)
       - No braces for single-line body conditionals
      Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
      Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
      Part-of: <!10666>
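      The bookkeeping described in this commit (attach the active monitor ID
      to each submission, remember the previous monitor, and note that a job
      used the monitor) can be sketched roughly as below. All `sketch_*`
      names and fields are hypothetical and are not the driver's actual
      structures. Waiting for the previous job whenever the monitor changes
      is an assumption based on the v4 note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's objects. */
struct sketch_perfmon {
    uint32_t kperfmon_id;   /* ID the kernel gave this monitor */
    bool job_submitted;     /* set once a job ran with this monitor */
};

struct sketch_context {
    struct sketch_perfmon *active_perfmon; /* monitor enabled in the context */
    struct sketch_perfmon *last_perfmon;   /* monitor used by the previous job */
};

struct sketch_submit {
    uint32_t perfmon_id;     /* 0 means "no monitor attached" */
    bool wait_for_previous;  /* serialize against the previous job */
};

/* Fill in the perfmon-related fields of one job submission. */
static void
sketch_attach_perfmon(struct sketch_context *ctx, struct sketch_submit *submit)
{
    assert(ctx != NULL && submit != NULL);

    /* Assumption: if the monitor changed since the last job, wait for that
     * job so its counters are not mixed into the new monitor. */
    submit->wait_for_previous = ctx->active_perfmon != ctx->last_perfmon;

    submit->perfmon_id = 0;
    if (ctx->active_perfmon) {
        submit->perfmon_id = ctx->active_perfmon->kperfmon_id;
        ctx->active_perfmon->job_submitted = true;
    }
    ctx->last_perfmon = ctx->active_perfmon;
}
```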
  2. 29 Apr, 2021 1 commit
  3. 09 Dec, 2020 4 commits
  4. 25 Feb, 2020 1 commit
    • v3d: Sync on last CS when non-compute stage uses resource written by CS · 01496e3d
      José María Casanova Crespo authored
      
      
      When a resource is written by a compute shader and then used by a
      non-compute stage, we sync on the last compute job to guarantee that the
      resource has been completely written when the next stage reads it.
      
      In the other cases, the way flushes are done guarantees the serialization
      of the writes and reads.
      
      To reproduce the failure, the following tests should be executed in a
      batch, as the last test doesn't fail when run in isolation:
      
      KHR-GLES31.core.shader_image_load_store.basic-allFormats-load-fs
      KHR-GLES31.core.shader_image_load_store.basic-allFormats-loadStoreComputeStage
      KHR-GLES31.core.shader_image_load_store.basic-allTargets-load-cs
      KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
      
      v2: Use fence dep instead of bo_wait (Eric Anholt)
      v3: Rename struct names (Iago Toral)
          Document why it is not needed in the graphics->compute case. (Iago Toral)
          Follow same code pattern of the other update of in_sync_bcl.
      v4: Fixed comments style. (Iago Toral)
      
      Fixes KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
      Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
      CC: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
      Tested-by: Marge Bot <mesa/mesa!2700>
      Part-of: <mesa/mesa!2700>
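      A minimal sketch of the dependency rule from this commit: a resource
      remembers the fence of the last compute job that wrote it, and a
      non-compute job that reads the resource waits on that fence. The
      `sketch_*` types and the plain integer fence are hypothetical
      simplifications, not the driver's real fence handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_resource {
    uint32_t compute_write_fence; /* 0 if no compute job has written it */
};

struct sketch_job {
    bool is_compute;
    uint32_t in_sync; /* fence to wait on before starting, 0 = none */
};

/* Record that a compute job signalling `fence` writes the resource. */
static void
sketch_add_compute_write(struct sketch_resource *rsc, uint32_t fence)
{
    assert(rsc != NULL && fence != 0);
    rsc->compute_write_fence = fence;
}

/* Record that `job` reads the resource and add a dependency if needed. */
static void
sketch_job_add_read(struct sketch_job *job, const struct sketch_resource *rsc)
{
    assert(job != NULL && rsc != NULL);

    /* Only the compute -> non-compute case needs an explicit sync here;
     * the commit message says the other cases are already serialized by
     * how flushes are done. */
    if (!job->is_compute && rsc->compute_write_fence != 0)
        job->in_sync = rsc->compute_write_fence;
}
```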
  5. 16 Dec, 2019 1 commit
    • v3d: fix primitive queries for geometry shaders · a1b7c084
      Iago Toral authored
      
      
      With geometry shaders the number of emitted primitives is decided
      at run time, so we cannot precompute it in the CPU and we need to
      use the PRIMITIVE_COUNTS_FEEDBACK commands to have the GPU provide
      the number like we do for the number of primitives written to
      transform feedback. This may have a performance impact though, since
      it requires a sync wait for the draw to complete, so we only do
      it when geometry shaders are present.
      
      v2: remove '> 0' comparison for pointer type (Alejandro)
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
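      The choice this commit describes, between precomputing the count on the
      CPU and deferring to GPU counters when a geometry shader is bound, might
      look like the sketch below. It only handles list topologies, and every
      `sketch_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_prim { SKETCH_POINTS, SKETCH_LINES, SKETCH_TRIANGLES };

struct sketch_query {
    uint32_t prims_generated; /* accumulated on the CPU */
    bool needs_gpu_readback;  /* count must come from GPU counters */
};

/* CPU-side primitive count for simple list topologies. */
static uint32_t
sketch_prims_for_vertices(enum sketch_prim prim, uint32_t vertices)
{
    switch (prim) {
    case SKETCH_POINTS:    return vertices;
    case SKETCH_LINES:     return vertices / 2;
    case SKETCH_TRIANGLES: return vertices / 3;
    }
    return 0;
}

/* Account for one draw call in a primitives query. */
static void
sketch_count_draw(struct sketch_query *q, bool has_geometry_shader,
                  enum sketch_prim prim, uint32_t vertices)
{
    assert(q != NULL);

    if (has_geometry_shader) {
        /* The count is only known after the draw runs, so defer to the
         * GPU counters (which costs a wait for the draw to complete). */
        q->needs_gpu_readback = true;
        return;
    }
    q->prims_generated += sketch_prims_for_vertices(prim, vertices);
}
```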
  6. 18 Oct, 2019 2 commits
    • v3d: request the kernel to flush caches when TMU is dirty · db874392
      Iago Toral authored
      
      
      This adapts the v3d driver to the new CL submit ioctl interface that
      allows the driver to request a flush of the caches after the render
      job has completed. This seems to eliminate the kernel write violation
      errors reported during CTS and Piglit executions, fixing some CTS tests
      and GPU resets along the way.
      
      v2:
        - Adapt to changes in the kernel side.
        - Disable shader storage and shader images if the kernel doesn't
          implement cache flushing.
      
      Fixes CTS tests:
      KHR-GLES31.core.shader_image_size.basic-nonMS-fs-float
      KHR-GLES31.core.shader_image_size.basic-nonMS-fs-int
      KHR-GLES31.core.shader_image_size.basic-nonMS-fs-uint
      KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-float
      KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-int
      KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-uint
      KHR-GLES31.core.shader_atomic_counters.advanced-usage-many-draw-calls2
      KHR-GLES31.core.shader_atomic_counters.advanced-usage-draw-update-draw
      KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-int
      KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-matR
      KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-struct
      KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-matC-pad
      KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-vec
      Reviewed-by: Eric Anholt <eric@anholt.net>
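      As a rough illustration of the two behaviours in this commit (request a
      cache flush only when the TMU is dirty, and disable shader storage and
      shader images when the kernel cannot flush), consider the sketch below.
      The flag name, its value, and the structs are placeholders and are not
      taken from the real UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag; the real name and value live in the kernel UAPI. */
#define SKETCH_SUBMIT_FLUSH_CACHE 0x1u

struct sketch_screen {
    bool has_cache_flush; /* kernel implements the cache-flush request */
};

struct sketch_job {
    bool tmu_dirty; /* job wrote memory through the TMU */
};

/* Compute the flags for one CL submission. */
static uint32_t
sketch_submit_flags(const struct sketch_screen *screen,
                    const struct sketch_job *job)
{
    assert(screen != NULL && job != NULL);

    uint32_t flags = 0;
    if (screen->has_cache_flush && job->tmu_dirty)
        flags |= SKETCH_SUBMIT_FLUSH_CACHE;
    return flags;
}

/* Shader storage and shader images are only exposed when the kernel can
 * flush caches after the render job. */
static bool
sketch_supports_shader_storage(const struct sketch_screen *screen)
{
    assert(screen != NULL);
    return screen->has_cache_flush;
}
```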
    • v3d: Add Compute Shader support · 66e2d3b6
      Emma Anholt authored
      Now that the UAPI has landed, add the pipe_context function for
      dispatching compute shaders.  This is the last major feature for GLES 3.1,
      though it's not enabled quite yet.
  7. 10 Oct, 2019 1 commit
  8. 13 Sep, 2019 2 commits
    • v3d: fix TF primitive counts for resume without draw · 2eace10c
      Iago Toral authored
      
      
      The V3D documentation states that primitive counters are reset when
      we emit Tile Binning Mode Configuration items, which we do at the start
      of each draw call. However, on the actual hardware this doesn't seem to
      take effect when transform feedback is not active (this doesn't happen in
      the simulator). This causes a problem in the following scenario:
      
      glBeginTransformFeedback()
         glDrawArrays()
         glPauseTransformFeedback()
         glDrawArrays()
         glResumeTransformFeedback()
      glEndTransformFeedback()
      
      The TF pause will trigger a flush of the primitive counters, which results
      in a correct number of primitives up to that point. In theory, the counter
      should then be reset when we execute the draw after pausing TF, but that
      doesn't happen. Since TF is enabled again by the resume command before
      we end recording, we check the counters again when we end the transform
      feedback recording, but instead of reading 0 we read the same value we
      read at the time we paused, incorrectly accumulating that value again.
      
      In theory, we should be able to avoid this by using the other method to
      reset the primitive counters: using operation 1 instead of 0 when we
      flush the counts to the buffer at the time we pause. But again, this
      doesn't seem to work and we still see obsolete counts by the time we
      end transform feedback.
      
      This patch fixes the problem by not accumulating TF primitive counts
      unless we know we have actually queued draw calls during transform
      feedback, since that seems to effectively reset the counters. This should
      also be more performant, since it saves unnecessary stalls for the
      primitive counters to be updated when we know there haven't been any
      new primitives drawn.
      
      Fixes CTS tests:
      dEQP-GLES3.functional.transform_feedback.*
      Reviewed-by: Eric Anholt <eric@anholt.net>
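      The fix described here amounts to guarding the accumulation with a
      "draws were queued during transform feedback" flag. A hedged sketch
      follows, with hypothetical names and with the hardware counter passed
      in as a plain integer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_context {
    uint32_t tf_prims_written; /* total accumulated on the CPU side */
    bool tf_draws_queued;      /* a draw was queued while TF was active */
};

/* Called for every draw; only draws with TF active arm the flag. */
static void
sketch_draw(struct sketch_context *ctx, bool tf_active)
{
    assert(ctx != NULL);
    if (tf_active)
        ctx->tf_draws_queued = true;
}

/* Called when TF is paused or ended, with the value read from the GPU. */
static void
sketch_accumulate_tf_counts(struct sketch_context *ctx, uint32_t hw_counter)
{
    assert(ctx != NULL);

    /* Without a TF draw since the last read the counter may be stale, so
     * skip it. This also avoids stalling to read the counter. */
    if (!ctx->tf_draws_queued)
        return;

    ctx->tf_prims_written += hw_counter;
    ctx->tf_draws_queued = false;
}
```

      Replaying the scenario from the commit message: the draw before the
      pause contributes its count once, and the stale value seen at the end
      of transform feedback is ignored because no draw was queued while
      transform feedback was active.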
    • Iago Toral's avatar
      b69f51a5
  9. 13 Aug, 2019 1 commit
  10. 08 Aug, 2019 1 commit
    • v3d: use the GPU to record primitives written to transform feedback · 0f2d1dfe
      Iago Toral authored
      
      
      We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive
      counts to a buffer, including the number of primitives written to transform
      feedback buffers, which will handle buffer overflow correctly.
      
      There are a couple of caveats with this:
      
      Primitive counters are reset when we emit a 'Tile Binning Mode Configuration'
      packet, which can happen in the middle of a primitives query, so we need to
      read the buffer when we submit a job and accumulate the counts in the context
      so we don't lose them.
      
      We also need to do the same when we switch primitive type during transform
      feedback so we can compute the correct number of recorded vertices from
      the number of primitives. This is necessary so we can provide an accurate
      vertex count for draw from transform feedback.
      
      v2:
       - When computing the number of vertices for a primitive, pass in the base
         primitive, since that is what the hardware will count.
       - No need to update primitive counts when switching primitive types if
         the base primitives are the same.
       - Log perf warning when mapping the primitive counts BO for readback (Eric).
       - Only emit the primitive counts packet once at job end (Eric).
       - Use u_upload mechanism for the primitive counts buffer (Eric).
       - Use the XML to generate indices into the primitive counters buffer (Eric).
      
      Fixes piglit tests:
      spec/ext_transform_feedback/overflow-edge-cases
      spec/ext_transform_feedback/query-primitives_written-bufferrange
      spec/ext_transform_feedback/query-primitives_written-bufferrange-discard
      spec/ext_transform_feedback/change-size base-shrink
      spec/ext_transform_feedback/change-size base-grow
      spec/ext_transform_feedback/change-size offset-shrink
      spec/ext_transform_feedback/change-size offset-grow
      spec/ext_transform_feedback/change-size range-shrink
      spec/ext_transform_feedback/change-size range-grow
      spec/ext_transform_feedback/intervening-read prims-written
      Reviewed-by: Eric Anholt <eric@anholt.net>
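      To make the vertex accounting concrete, here is a small sketch of
      converting GPU primitive counts into vertex counts using the base
      primitive, and of folding the counts into the context when the base
      primitive changes. Names are hypothetical and only three base
      primitives are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_base_prim {
    SKETCH_BASE_POINTS,
    SKETCH_BASE_LINES,
    SKETCH_BASE_TRIANGLES
};

struct sketch_tf {
    enum sketch_base_prim prim; /* base primitive of the counts in flight */
    uint32_t vertices_written;  /* accumulated on the CPU side */
};

/* Vertices recorded for `prims` primitives of the given base type. */
static uint32_t
sketch_vertices_for_prims(enum sketch_base_prim prim, uint32_t prims)
{
    switch (prim) {
    case SKETCH_BASE_POINTS:    return prims;
    case SKETCH_BASE_LINES:     return prims * 2;
    case SKETCH_BASE_TRIANGLES: return prims * 3;
    }
    return 0;
}

/* Counts only need folding in when the base primitive actually changes. */
static bool
sketch_tf_needs_flush(const struct sketch_tf *tf, enum sketch_base_prim next)
{
    assert(tf != NULL);
    return tf->prim != next;
}

/* Fold the GPU's primitive count into the vertex total, then switch. */
static void
sketch_tf_flush_counts(struct sketch_tf *tf, uint32_t prims_from_gpu,
                       enum sketch_base_prim next)
{
    assert(tf != NULL);
    tf->vertices_written += sketch_vertices_for_prims(tf->prim, prims_from_gpu);
    tf->prim = next;
}
```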
  11. 31 Jul, 2019 1 commit
  12. 30 Jul, 2019 1 commit
    • v3d: take into account separate_stencil when checking if stencil should be cleared · cda4c628
      Alejandro Piñeiro authored
      In most cases this is not needed, because usually when a separate
      stencil is written, the parent resource is also written.
      
      This is needed if we have a separate stencil, no depth buffer, and the
      source and destination are the same, as in that case the stencil can be
      updated but not the parent resource (for example, when blitting only the
      stencil buffer). In that situation, the next access to the stencil
      buffer would clear it (overwriting the result of the previous blit),
      because the parent resource has v3d_resource.writes set to 0.
      
      As far as I see, that situation only happens with the
      GL_DEPTH32F_STENCIL8 format.
      
      Note that one alternative would be to consider that if the
      separate_stencil has been written, the parent should also be considered
      written (and update its "writes" field accordingly). But I found this
      patch more natural.
      
      Fixes the following piglit tests:
         spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-blit
         spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-copypixels
      
      The latter regressed when the glCopyPixels implementation started to
      use blitting internally. So:
      
      Fixes: 131d40cf ("st/mesa: accelerate glCopyPixels(STENCIL)")
      Reviewed-by: Eric Anholt <eric@anholt.net>
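      One way to picture the check: when a separate stencil resource exists,
      its own write count decides whether stencil contents exist yet, not the
      parent's. The struct below is a hypothetical reduction that borrows the
      `writes` and `separate_stencil` field names from the commit message.
      The function is an illustration only, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_resource {
    uint32_t writes;                          /* number of writes so far */
    struct sketch_resource *separate_stencil; /* NULL when stencil is packed */
};

/* True when the stencil data has never been written, so treating it as
 * cleared cannot destroy earlier results. */
static bool
sketch_stencil_is_unwritten(const struct sketch_resource *zsbuf)
{
    assert(zsbuf != NULL);

    /* Look at the separate stencil when there is one: it can be written
     * (for example by a stencil-only blit) while the parent is not. */
    const struct sketch_resource *stencil =
        zsbuf->separate_stencil ? zsbuf->separate_stencil : zsbuf;

    return stencil->writes == 0;
}
```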
  13. 02 Jul, 2019 2 commits
    • v3d: do not flush jobs that are synced with 'Wait for transform feedback' · 042aeffd
      Iago Toral authored
      
      
      Generally, we achieve this by skipping the flush on calls to
      v3d_flush_jobs_writing_resource() when we detect that the resource is written
      in the current job from a transform feedback write.
      
      The exception to this is the case where the caller is about to map the
      resource, in which case we need to flush immediately since we can only emit
      'Wait for transform feedback' commands on rendering jobs. We add a parameter
      to the function so the caller can identify that scenario.
      Reviewed-by: Eric Anholt <eric@anholt.net>
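      The decision described above can be reduced to a small sketch: skip the
      flush when the current job wrote the resource through transform
      feedback, unless the caller says it must flush (the map case). The
      struct, the function, and the `always_flush` parameter name are
      assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_job {
    bool writes_rsc_via_tf; /* resource is a TF target in this job */
    bool flushed;           /* job was submitted */
    bool wait_for_tf;       /* an in-job 'Wait for transform feedback' is used */
};

/* Sync a read of a resource that the current job writes. */
static void
sketch_flush_job_writing_resource(struct sketch_job *job, bool always_flush)
{
    assert(job != NULL);

    if (job->writes_rsc_via_tf && !always_flush) {
        /* The hardware can sync the read inside the same job. */
        job->wait_for_tf = true;
        return;
    }

    /* Either it is a regular write, or the caller is about to map the
     * resource and a wait command inside a job would not help. */
    job->flushed = true;
}
```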
    • v3d: keep track of resources written by transform feedback · c7dff0e6
      Iago Toral authored
      
      
      The hardware provides a feature to sync reads from previous transform feedback
      writes in the same job so if we use this mechanism we no longer have to flush
      the job.
      
      In order to identify this scenario we need a mechanism to identify resources
      that are written by transform feedback.
      
      v2: use _mesa_pointer_set_create (Eric)
      Reviewed-by: Eric Anholt <eric@anholt.net>
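      The commit uses a pointer set for this tracking. As a self-contained
      stand-in, the sketch below uses a small fixed-size array of pointers in
      place of the real set implementation. The capacity and all names are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TF_WRITES 16

struct sketch_job {
    const void *tf_writes[SKETCH_MAX_TF_WRITES]; /* resources written by TF */
    size_t tf_write_count;
};

/* Is `rsc` written by transform feedback in this job? */
static bool
sketch_job_writes_via_tf(const struct sketch_job *job, const void *rsc)
{
    assert(job != NULL && rsc != NULL);
    for (size_t i = 0; i < job->tf_write_count; i++) {
        if (job->tf_writes[i] == rsc)
            return true;
    }
    return false;
}

/* Remember that `rsc` is a TF target; returns false if the set is full. */
static bool
sketch_job_add_tf_write(struct sketch_job *job, const void *rsc)
{
    if (sketch_job_writes_via_tf(job, rsc))
        return true; /* already tracked, sets hold each pointer once */
    if (job->tf_write_count == SKETCH_MAX_TF_WRITES)
        return false;
    job->tf_writes[job->tf_write_count++] = rsc;
    return true;
}
```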
  14. 18 Jun, 2019 1 commit
  15. 26 Apr, 2019 1 commit
  16. 27 Jan, 2019 1 commit
  17. 14 Jan, 2019 1 commit
  18. 15 Dec, 2018 1 commit
  19. 02 Nov, 2018 1 commit
  20. 01 Nov, 2018 1 commit
  21. 25 Oct, 2018 1 commit
  22. 30 Jul, 2018 2 commits
  23. 28 Jul, 2018 2 commits
  24. 26 Jul, 2018 1 commit
    • v3d: Rename cleared/resolve to clear/store. · 47f5d158
      Emma Anholt authored
      These describe what the fields mean in RCL generation.  "resolve" is left
      over from VC4, and sounds like MSAA resolves (which may or may not be
      involved in the store we generate).
  25. 20 Jun, 2018 1 commit
    • v3d: Track write reference to the separate stencil buffer. · 94f7c011
      Emma Anholt authored
      Otherwise, a blit from separate stencil may fail to flush the job that
      initialized it, or new drawing could fail to flush a blit reading from
      stencil.
      
      Fixes:
      dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic
      dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale
      dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only
      dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
      dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
  26. 16 May, 2018 4 commits
  27. 12 Apr, 2018 3 commits
    • broadcom/vc5: Fix a stray '`' in a comment. · 7bc77dbb
      Emma Anholt authored
    • broadcom/vc5: Update the UABI for in/out syncobjs · b225cdce
      Emma Anholt authored
      This is the ABI I'm hoping to stabilize for merging the driver.  seqnos
      are eliminated, which allows for the GPU scheduler to task-switch between
      DRM fds even after submission to the kernel.  In/out sync objects are
      introduced, to allow the Android fencing extension (not yet implemented,
      but should be trivial), and to also allow the driver to tell the kernel to
      not start a bin until a previous render is complete.
    • broadcom/vc5: Drop the throttling code. · aedfd8ed
      Emma Anholt authored
      Since I'll be using the DRM scheduler, we won't run into the problem of a
      runaway client starving other clients of GPU time.