1. 06 Sep, 2021 4 commits
  2. 18 Aug, 2021 1 commit
  3. 06 Aug, 2021 1 commit
  4. 05 Aug, 2021 1 commit
    • tu, freedreno/a6xx, ir3: Rewrite tess PrimID handling · 8115cde3
      Connor Abbott authored
      The previous handling conflated RelPatchID and PrimID, which would
      result in incorrect gl_PrimitiveID when doing draw splitting and didn't
      work with PrimID passthrough which fills the VPC slot with the "correct"
      PrimID value from the tess factor BO which we left 0. Replace PrimID in
      the tess lowering pass with a new RelPatchID sysval, and replace PrimID
      with RelPatchID in the VS input code in turnip/freedreno at the same
      time so that there is no net change in the tess lowering code. However,
      now we have to add new mechanisms for getting the user-level PrimID:
      - In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS.
        This means we have to add another register to our VS->TCS ABI. I
        decided to put PrimID in r0.z, after the TCS header and RelPatchID,
        because it might not be read in the TCS.
      - If any stage after the TCS uses PrimID, the TCS stores it in the first
        dword of the tess factor BO, and it is read by the fixed-function
        tessellator and accessed in the TES via the newly-uncovered DSPRIMID
        field. If we have tess and GS, the TES passes this value through to
        the GS in the same way as the VS does. PrimID passthrough for reading
        it in the FS when there's tess but no GS also "just works" once we
        start storing it in the TCS. In particular this fixes
        dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that.
      Part-of: <mesa/mesa!12166>
  5. 12 Jul, 2021 2 commits
    • ir3: Reformat source with clang-format · 177138d8
      Connor Abbott authored
      Generated using:
      cd src/freedreno/ir3 && clang-format -i {**,.}/*.c {**,.}/*.h -style=file
      Part-of: <mesa/mesa!11801>
    • ir3: Manually reformat some places · 2e76f7b6
      Connor Abbott authored
      clang-format does a bad job with a few tables and macros, and there were
      some places it was doing wonky things because comments were longer than
      80 characters and it tries to fix that without reformatting the comment
      itself. Add magic comments to tell it to turn itself off and retab those
      places manually (well, with a regex!).
      Part-of: <mesa/mesa!11801>
  6. 08 Jul, 2021 1 commit
    • tu, ir3: Plumb through support for CS subgroup size/id · 68b8b9e9
      Connor Abbott authored
      The way that the blob obtains the subgroup id on compute shaders is by
      just and'ing gl_LocalInvocationIndex with 63, since it advertises a
      subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
      expose a subgroupSize of 128, we'll have to do something a little more
      flexible. Sometimes we have to fall back to a subgroup size of 64 due to
      various constraints, and in that case we have to fake a subgroup size of
      128 while actually using 64 under the hood, by just pretending that the
      upper 64 invocations are all disabled. However, when computing the
      subgroup id we need to use the "real" subgroup size. For this purpose we
      plumb through a driver param which exposes the real subgroup size. If
      the user forces a particular subgroup size then we lower
      load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
      and we assume when translating to ir3 that load_subgroup_size means
      "give me the *actual* subgroup size that you decided in RA" and give you
      the driver param.
      Part-of: <mesa/mesa!6752>
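The difference between the advertised and the real subgroup size in the commit above can be illustrated with a small sketch. It assumes subgroup sizes are powers of two and that invocations are packed into subgroups in gl_LocalInvocationIndex order; the function name and the returned pair are hypothetical and are not the ir3 lowering itself.

```python
def split_local_index(local_invocation_index, real_subgroup_size):
    """Split a flat local invocation index by the real subgroup size.

    Hypothetical helper for illustration only; assumes a power-of-two size.
    """
    if real_subgroup_size <= 0 or real_subgroup_size & (real_subgroup_size - 1):
        raise ValueError("subgroup size must be a positive power of two")
    # Masking generalizes the blob's `index & 63` for a fixed size of 64.
    within_subgroup = local_invocation_index & (real_subgroup_size - 1)
    # Which subgroup the invocation falls into.
    subgroup = local_invocation_index // real_subgroup_size
    return subgroup, within_subgroup
```

With an index of 70, a real size of 64 places the invocation in the second subgroup at position 6, while a real size of 128 keeps it in the first subgroup at position 70. This is why the fallback case needs the real size rather than the advertised 128.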
  7. 21 Apr, 2021 1 commit
    • ir3: make possible to specify branchstack up to 64 · 9402d5a6
      Danylo Piliaiev authored
      On a6xx/a5xx, the branchstack bitfield depends on the number of nested
      ifs as follows, as observed with the blob:
      nested ifs    branchstack bitfield
      0             0
      1             1
      2             2
      3             2
      4             3
      5             3
      6             4
      ...           ...
      59            30
      60            31
      61            31
      62            32
      63            32
      64            32
      Remove open-coded branchstack for a5xx compute along the way.
      Fixes tests:
      Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
      Part-of: <mesa/mesa!9859>
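The rows listed in the branchstack commit above can be reproduced by a simple closed form. The sketch below reads the first column as the number of nested ifs and the second as the bitfield value, which is an assumption since the columns are unlabeled; the function name and the cap parameter are hypothetical, and the formula is only checked against the rows shown, not against the driver source.

```python
# Rows as listed in the commit message (first column -> second column).
OBSERVED_ROWS = {
    0: 0, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 4,
    59: 30, 60: 31, 61: 31, 62: 32, 63: 32, 64: 32,
}


def branchstack_field(nested_ifs, max_field=32):
    """One closed form that matches the observed rows (illustrative only)."""
    if nested_ifs < 0:
        raise ValueError("nested_ifs must be non-negative")
    if nested_ifs == 0:
        return 0
    # Half the nesting depth plus one, capped at the largest observed value.
    return min(nested_ifs // 2 + 1, max_field)
```

The cap matters only for the last row: 64 nested ifs would otherwise give 33.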
  8. 07 Apr, 2021 1 commit
  9. 29 Mar, 2021 1 commit
  10. 25 Mar, 2021 1 commit
  11. 22 Mar, 2021 1 commit
    • freedreno: Add local_size to ir3_shader_variant · cbc68c79
      Connor Abbott authored
      We want to use the local_size when available to calculate the threadsize
      in ir3, and we need it to work with e.g. computerator where we don't
      have a nir shader. Add a local_size field and use that in computerator
      instead of a separate structure that's inaccessible to core ir3.
      Also set a dummy local_size in the tests to avoid a divide-by-zero.
      Part-of: <mesa/mesa!9498>
  12. 20 Mar, 2021 1 commit
  13. 03 Mar, 2021 1 commit
  14. 25 Feb, 2021 1 commit
  15. 24 Feb, 2021 2 commits
  16. 19 Feb, 2021 1 commit
    • turnip,freedreno/a6xx: tell hw the size of shared mem used by CS · 0fa7ec14
      Danylo Piliaiev authored
      Before, we only used 2k of shared memory.
      It was found that the lower 5 bits of SP_CS_UNKNOWN_A9B1 control
      the available size of shared memory for compute shaders, up to
      32k. Setting SP_CS_UNKNOWN_A9B1_SHARED_SIZE to zero enables
      all 32k of shared memory.
      Fixes tests:
      Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
      Part-of: <mesa/mesa!9157>
  17. 05 Jan, 2021 1 commit
  18. 21 Dec, 2020 1 commit
  19. 19 Nov, 2020 1 commit
  20. 16 Nov, 2020 1 commit
    • freedreno+turnip: Upload large shader constants as a UBO. · 1f440533
      Emma Anholt authored
      Right now if the shader indirects on some large constant array, we see NIR
      load_consts (usually from the const file) of its contents into general
      registers, then indirection on the GPRs.  This often results in register
      allocation failures, as it's easy to go beyond the ~256 dwords of
      registers per invocation.
      By moving the large constants to a UBO, we can load an arbitrary number of
      them.  They also can be theoretically moved to the constant reg file (~2k
      dwords), though you're unlikely to hit this path without an indirect load
      on your large constant, and we don't yet let UBO indirect loads get moved
      to constant regs.
      This possibly won't work out right if we have 16-bit load_constants, but
      without other MRs in flight we won't see 16-bit temps to be lowered to
      This allows 2 kerbal-space-program shaders to compile that previously
      would fail, and fixes the new dEQP-VK and -GLES2 tests I wrote that
      dynamically index a 40-element temporary array of float/vec2/vec3/vec4
      with constant element initializers.
      Closes: mesa/mesa#2789
      Part-of: <mesa/mesa!5810>
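To put the register pressure described in the commit above in numbers, here is a rough footprint calculation for the 40-element test arrays. It assumes one dword per 32-bit component with tight packing, and it treats the "~256 dwords" figure from the message as an exact budget purely for illustration; the names are hypothetical.

```python
# Approximate per-invocation register figure quoted in the commit message.
REGISTER_BUDGET_DWORDS = 256

# Assumption: one dword per 32-bit component, tightly packed.
COMPONENTS = {"float": 1, "vec2": 2, "vec3": 3, "vec4": 4}


def array_dwords(num_elements, element_type):
    """Dwords needed to hold the whole array in general registers."""
    return num_elements * COMPONENTS[element_type]


def remaining_budget(num_elements, element_type):
    """Dwords left for everything else in the shader under the assumed budget."""
    return REGISTER_BUDGET_DWORDS - array_dwords(num_elements, element_type)
```

For the 40-element vec4 case this gives 160 dwords, leaving roughly 96 of the approximate 256 for the rest of the shader.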
  21. 03 Nov, 2020 1 commit
  22. 23 Oct, 2020 2 commits
    • ir3: Handle clip+cull distances · 47f825ac
      Connor Abbott authored
      Part-of: <mesa/mesa!6959>
    • ir3: Switch tess lowering to use location · 9e063b01
      Connor Abbott authored
      Clip & cull distances, which are compact arrays, exposed a lot of holes
      because they can take up multiple slots and partially overlap.
      I wanted to eliminate our dependence on knowing the layout of the
      variables, as this can get complicated with things like partially
      overlapping arrays, which can happen with ARB_enhanced_layouts or with
      clip/cull distance arrays. This means no longer changing the layout
      based on whether the i/o is part of an array or not, and no longer
      matching producer <-> consumer based on the variables. At the end of the
      day we have to match things based on the user-specified location, so for
      simplicity this switches the entire i/o handling to be based off the
      user location rather than the driver location. This means that the
      primitive map may be a little bigger, but it reduces the complexity
      because we never have to build a table mapping user location to driver
      location, and it reduces the amount of work done at link time in the SSO
      case. It also brings us closer to what the other drivers do.
      While here, I also fixed the handling of component qualifiers, which was
      another thing broken with clip/cull distances.
      Part-of: <mesa/mesa!6959>
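The idea of matching producer and consumer purely by user location, as described in the commit above, can be sketched as follows. This is an illustration and not ir3's actual data structure: the function names, the (location, num_slots) input shape, and the 4 dwords per slot are all assumptions.

```python
def build_primitive_map(producer_outputs, dwords_per_slot=4):
    """Map every user location the producer writes to a dword offset.

    producer_outputs: iterable of (location, num_slots) pairs. Ranges may
    partially overlap; an overlapped location still gets a single entry
    because the key is the location itself.
    """
    used = set()
    for location, num_slots in producer_outputs:
        used.update(range(location, location + num_slots))
    return {loc: i * dwords_per_slot for i, loc in enumerate(sorted(used))}


def lookup(primitive_map, location, component=0):
    """Consumer-side lookup: only the user location and component are needed."""
    return primitive_map[location] + component
```

Two partially overlapping two-slot arrays at locations 10 and 11 produce entries for locations 10, 11 and 12 only, and the consumer finds component 2 of location 11 at offset 6 without any table translating user locations to driver locations.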
  23. 29 Sep, 2020 1 commit
  24. 22 Sep, 2020 1 commit
  25. 15 Sep, 2020 1 commit
  26. 01 Sep, 2020 1 commit
  27. 20 Aug, 2020 1 commit
  28. 05 Aug, 2020 2 commits
  29. 16 Jul, 2020 1 commit
  30. 07 Jul, 2020 1 commit
  31. 01 Jul, 2020 1 commit
  32. 26 Jun, 2020 2 commits