1. 04 Feb, 2014 1 commit
  2. 03 Feb, 2014 3 commits
    • freedreno: enabling binning and opt by default · 1b886078
      Rob Clark authored
      The hw binning pass doesn't seem to have broken anything, and the
      optimizing compiler fixes a lot of shaders without appearing to break
      any.  So slightly reorganize the FD_MESA_DEBUG params and enable both
      hw binning and the optimizer by default.
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
    • freedreno/a3xx/compiler: new compiler · 554f1ac0
      Rob Clark authored
      The new compiler generates a dependency graph of instructions, including
      a few meta-instructions to handle PHI and preserve some extra
      information needed for register assignment, etc.
      
      The depth pass assigns a weight/depth to each node (based on sum of
      instruction cycles of a given node and all its dependent nodes), which
      is used to schedule instructions.  The scheduling takes into account the
      minimum number of cycles/slots between dependent instructions, etc.,
      which was something that could not be handled properly with the original
      compiler (which was more of a naive TGSI translator than an actual
      compiler).
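
A minimal sketch of such a depth pass, not the driver's actual code. It assumes one reading of the description above: a node's depth is its own cycle count plus the deepest chain among the instructions it depends on. The names `struct node`, `MAX_SRCS` and `node_depth` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SRCS 3   /* hypothetical limit on sources per instruction */

struct node {
    unsigned cycles;              /* cost of this instruction, must be >= 1 */
    unsigned depth;               /* 0 means "not computed yet" */
    struct node *srcs[MAX_SRCS];  /* instructions this one depends on, or NULL */
};

/* Depth = own cycles + deepest source chain, memoized in n->depth. */
static unsigned node_depth(struct node *n)
{
    unsigned deepest = 0;

    assert(n->cycles > 0);
    if (n->depth)
        return n->depth;

    for (size_t i = 0; i < MAX_SRCS; i++) {
        if (n->srcs[i]) {
            unsigned d = node_depth(n->srcs[i]);
            if (d > deepest)
                deepest = d;
        }
    }

    n->depth = n->cycles + deepest;
    return n->depth;
}
```

A scheduler could then prefer, among the instructions that are ready, the one with the largest depth, since it heads the longest remaining chain.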
      
      The register assignment is currently split out as a standalone pass.  I
      expect that it will be replaced at some point, once I figure out what to
      do about relative addressing (which is currently the only thing that
      should cause fallback to old compiler).
      
      There are a couple new debug options for FD_MESA_DEBUG env var:
      
        optmsgs - enable debug prints in optimizer
        optdump - dump instruction graph in .dot format, for example:
      
      http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png
      http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
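
As a rough illustration of how such options can map to flag bits, here is a self-contained sketch. It assumes the options are given as a comma-separated list; the flag names and `parse_debug_flags` are hypothetical and this is not the helper Mesa itself uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for the two options named above. */
enum {
    DBG_OPTMSGS = 1 << 0,
    DBG_OPTDUMP = 1 << 1,
};

static const struct {
    const char *name;
    uint32_t flag;
} debug_options[] = {
    { "optmsgs", DBG_OPTMSGS },
    { "optdump", DBG_OPTDUMP },
};

/* Parse a comma-separated option string; unknown names are ignored. */
static uint32_t parse_debug_flags(const char *str)
{
    uint32_t flags = 0;

    if (!str)
        return 0;

    while (*str) {
        size_t len = strcspn(str, ",");   /* length of the current token */

        for (size_t i = 0; i < sizeof(debug_options) / sizeof(debug_options[0]); i++) {
            if (strlen(debug_options[i].name) == len &&
                !strncmp(str, debug_options[i].name, len))
                flags |= debug_options[i].flag;
        }

        str += len;
        if (*str == ',')
            str++;
    }
    return flags;
}
```

A caller would typically pass the result of `getenv("FD_MESA_DEBUG")`, which may be NULL when the variable is unset.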
      
      At this point, thanks to proper handling of instruction scheduling, the
      new compiler fixes a lot of things that were broken before, and does not
      appear to break anything that was working before[1].  So even though it
      is not finished, it seems useful to merge it in its current state.
      
      [1] Not merged in this commit, because I'm not sure if it really belongs
      in mesa tree, but the following commit implements a simple shader
      emulator, which I've used to compare the output of the new compiler to
      the original compiler (ie. run it on all the TGSI shaders dumped out via
      ST_DEBUG=tgsi with various games/apps):
      
      https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12

      Signed-off-by: Rob Clark <robclark@freedesktop.org>
    • freedreno/a3xx/compiler: split out old compiler · f0e2d7ab
      Rob Clark authored
      For the time being, keep old compiler as fallback for things that the
      new compiler does not support yet.  Split out as its own commit to make
      the later new-compiler commits easier to follow.
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
  3. 01 Feb, 2014 1 commit
  4. 29 Jan, 2014 1 commit
  5. 23 Jan, 2014 1 commit
  6. 08 Jan, 2014 2 commits
    • freedreno: add basic query support · 646c16af
      Rob Clark authored
      For now, add some basic query support (i.e. things not actually
      requiring the GPU).  This might change a bit when I actually add
      GPU queries, but for now it enables some useful performance info
      in the GALLIUM_HUD.  For example:
      
        GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
      
      The driver-specific queries are:
      
        + draw-calls
        + batches - number of batches per second, sum of batches-sysmem
          plus batches-gmem
        + batches-gmem - render a set of tiles in GMEM, for each tile
          (optionally) system mem -> gmem (restore), plus N draws,
          plus gmem -> system mem (resolve) per second
        + batches-sysmem - N draws to system memory (GMEM bypass) per
          second
        + restores - number of GMEM batches that required restore per
          second
      
      Ideally for GMEM rendering, you want batches-gmem to equal fps.  If
      the app is doing something that triggers multiple passes (ie. requires
      extra round trip gmem <-> system memory) then the # of batches per
      second will go up relative to fps.
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
    • freedreno/a3xx: support for hw binning pass · c0766528
      Rob Clark authored
      The binning pass sorts vertices into which bins/tiles they apply to.
      The visibility information generated during the binning pass can be
      used to speed up the rendering pass by filtering out vertices which
      do not apply to the current tile.  See:
      
       https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
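
The idea can be illustrated on the CPU with a small sketch. On a3xx the binning is done by the hardware; the code below only shows the kind of visibility information involved, using a conservative bounding-box test per primitive. The function name, the row-major tile layout and the 32-tile limit are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a bitmask with bit (ty * tiles_x + tx) set for every tile that the
 * inclusive pixel bounding box [minx..maxx] x [miny..maxy] overlaps.
 */
static uint32_t bins_for_bbox(int minx, int miny, int maxx, int maxy,
                              int tile_w, int tile_h,
                              int tiles_x, int tiles_y)
{
    uint32_t mask = 0;

    assert(tile_w > 0 && tile_h > 0);
    assert(tiles_x > 0 && tiles_y > 0 && tiles_x * tiles_y <= 32);

    /* Empty box, or entirely left of / above the render target. */
    if (maxx < 0 || maxy < 0 || minx > maxx || miny > maxy)
        return 0;

    int tx0 = (minx < 0 ? 0 : minx) / tile_w;
    int ty0 = (miny < 0 ? 0 : miny) / tile_h;
    int tx1 = maxx / tile_w;
    int ty1 = maxy / tile_h;

    /* Clamp to the last tile; boxes fully off the right/bottom yield 0. */
    if (tx1 >= tiles_x)
        tx1 = tiles_x - 1;
    if (ty1 >= tiles_y)
        ty1 = tiles_y - 1;

    for (int ty = ty0; ty <= ty1; ty++)
        for (int tx = tx0; tx <= tx1; tx++)
            mask |= 1u << (ty * tiles_x + tx);

    return mask;
}
```

During the rendering pass for tile `t`, a primitive whose mask has bit `t` clear can be skipped, which is the filtering the visibility information makes possible.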
      
      This brings a significant fps boost.  A rough assortment of tests
      (supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
      to yield a ~35-45% fps improvement.
      
      For now, to be conservative, the binning pass is not enabled yet by
      default.  To enable it use:
      
        FD_MESA_DEBUG=binning
      
      So far I haven't found anything that breaks with binning enabled,
      but I'd like a bit more testing before I enable it as default.
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
  7. 10 Dec, 2013 1 commit
  8. 07 Dec, 2013 1 commit
  9. 03 Dec, 2013 1 commit
  10. 28 Nov, 2013 1 commit
  11. 02 Nov, 2013 2 commits
    • freedreno/a3xx/compiler: highp frag shader · a53fe222
      Rob Clark authored
      Fixes use of full-precision in fragment shader (ie. don't clobber r0.x
      since that can be used by future bary instructions for varying fetch).
      And makes use of full-precision the default in fragment shader (but can
      be overridden via FD_MESA_DEBUG=fraghalf).
      
      Seems like half precision is often not enough for texture coordinates.
      The blob compiler is clever enough to keep texture coords in full
      precision registers while using half precision for everything else.  But
      we aren't quite that clever yet, so better to default to full precision.
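
To make the precision problem concrete, here is a small sketch that rounds a 32-bit float to the 11 significant bits a half-precision value can hold. It is an illustration only: it ignores half's narrower exponent range, infinities and NaNs, and rounds ties away from zero rather than to even. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Round a normal, finite float to 10 explicit mantissa bits (half precision). */
static float round_to_half_precision(float v)
{
    uint32_t bits;

    memcpy(&bits, &v, sizeof bits);
    bits += 1u << 12;             /* add half of the 13 bits about to be dropped */
    bits &= ~((1u << 13) - 1);    /* clear the 13 low mantissa bits (23 -> 10) */
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

Between 0.5 and 1.0 the representable half values are 2^-11 apart, which is wider than one texel of a 4096-wide texture (2^-12). The centre of the last texel, `4095.5f / 4096.0f`, therefore rounds to exactly `1.0f`, while the equivalent coordinate for a 256-wide texture, `255.5f / 256.0f`, survives unchanged.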
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
    • freedreno: we do actually support sqrt · 4ddd4e83
      Rob Clark authored
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
  12. 25 Oct, 2013 2 commits
    • gallium: add PIPE_CAP_MIXED_FRAMEBUFFER_SIZES · 12d39b4f
      Ilia Mirkin authored
      This CAP will determine whether ARB_framebuffer_object can be enabled.
      The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf
      textures.
      Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
      Signed-off-by: Marek Olšák <marek.olsak@amd.com>
    • freedreno/a3xx: fix const/rel/const-rel encoding · 4317c4e6
      Rob Clark authored
      The encoding of constant, relative, and relative-const src registers is
      a bit more complex than originally thought, which gives an extra bit to
      encode const reg # at expense of taking a bit from relative offset.
      
      In most cases a3xx seems to actually use a scheme whereby it can encode
      an extra bit for const register.  You have three possible encodings in
      thirteen bits:
      
         register:  (11 bits for N.c)
           00........... rN.c
      
         relative:  (10 bits for N)
           010.......... r<a0.x + N>
           011.......... c<a0.x + N>
      
         const:     (12 bits for N.c)
           1............ cN.c
      
      Which means we can deal w/ more consts than previously thought.
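
A decoder for the three forms can be sketched as below. This is an illustration under stated assumptions, not the driver's code: the leftmost bit in the diagrams is taken to be bit 12, the component `.c` is assumed to occupy the low two bits of `N.c`, and the relative offset is returned as the raw 10-bit field. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum src_kind { SRC_REG, SRC_REL_REG, SRC_REL_CONST, SRC_CONST };

struct src_decoded {
    enum src_kind kind;
    uint32_t num;    /* register number N, or raw offset for relative forms */
    uint32_t comp;   /* component 0..3; unused for relative forms */
};

static struct src_decoded decode_src(uint32_t bits)
{
    struct src_decoded d = { SRC_REG, 0, 0 };

    assert(bits < (1u << 13));

    if (bits & (1u << 12)) {
        /* 1............  cN.c, 12 bits for N.c */
        uint32_t v = bits & 0xfff;
        d.kind = SRC_CONST;
        d.num = v >> 2;
        d.comp = v & 3;
    } else if (bits & (1u << 11)) {
        /* 010..........  r<a0.x + N>,  011..........  c<a0.x + N> */
        d.kind = (bits & (1u << 10)) ? SRC_REL_CONST : SRC_REL_REG;
        d.num = bits & 0x3ff;
    } else {
        /* 00...........  rN.c, 11 bits for N.c */
        uint32_t v = bits & 0x7ff;
        d.kind = SRC_REG;
        d.num = v >> 2;
        d.comp = v & 3;
    }
    return d;
}
```

Under these assumptions the const form has 10 bits left for the register number, against 9 for a plain register, which is where the extra const bit comes from.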
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
  13. 14 Sep, 2013 3 commits
  14. 24 Aug, 2013 1 commit
  15. 22 Jul, 2013 1 commit
    • gallium: Add PIPE_CAP_ENDIANNESS · 4e90bc9a
      Tom Stellard authored
      Cc: mesa-stable@lists.freedesktop.org
      [ Francisco Jerez: Fix "PIPE_ENDIAN_SMALL" in the documentation,
        define PIPE_ENDIAN_NATIVE. ]
  16. 02 Jul, 2013 1 commit
  17. 08 Jun, 2013 2 commits
    • freedreno: add a3xx support · 2855f3f7
      Rob Clark authored
      The adreno a3xx GPU is found in newer snapdragon devices, such as the
      nexus4.  The a3xx is GLESv3 and OpenCL capable, although that is not
      enabled yet in gallium.
      
      Compared to a2xx, it introduces an entirely new unified shader ISA, and
      re-shuffles all or nearly all of the registers.  The good news is that
      (for the most part) the registers are more orthogonal, not combining
      unrelated state in a single register.  And that there is a lot more
      flexibility, so we don't need to patch and re-emit the shader like we
      did on a2xx.
      
      The shader compiler is currently quite dumb, so there would be a lot of
      room for improvement with an optimizing pass.  Despite that, with the
      a320 in my nexus4 it seems to be ~2-3x faster compared to the a220 in my
      HP touchpad.
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
    • freedreno: prepare for a3xx · 18c317b2
      Rob Clark authored
      Split the parts that are specific to adreno a2xx series GPUs from the
      parts that will be in common with a3xx, so that a3xx support can be
      added more cleanly.
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
  18. 25 Apr, 2013 2 commits
  19. 18 Apr, 2013 1 commit
    • st/mesa: optionally apply texture swizzle to border color v2 · 729abfd0
      Christoph Bumiller authored
      This is the only sane solution for nv50 and nvc0 (really, trust me),
      but since on other hardware the border colour is tightly coupled with
      texture state they'd have to undo the swizzle, so I've added a cap.
      
      The dependency of update_sampler on the texture updates was
      introduced to avoid doing the apply_depthmode to the swizzle twice.
      
      v2: Moved swizzling helper to u_format.c, extended the CAP to
      provide more accurate information.
  20. 03 Apr, 2013 1 commit
  21. 25 Mar, 2013 2 commits
  22. 20 Mar, 2013 1 commit
    • gallium: add TGSI_SEMANTIC_TEXCOORD,PCOORD v3 · 8acaf862
      Christoph Bumiller authored
      This makes it possible to identify gl_TexCoord and gl_PointCoord
      for drivers where sprite coordinate replacement is restricted.
      
      The new PIPE_CAP_TGSI_TEXCOORD decides whether these varyings
      should be hidden behind the GENERIC semantic or not.
      
      With this patch only nvc0 and nv30 will request that they be used.
      
      v2: introduce a CAP so other drivers don't have to bother with
      the new semantic
      
      v3: adapt to introduction of gl_varying_slot enum
  23. 12 Mar, 2013 1 commit
    • freedreno: gallium driver for adreno · 6173cc19
      Rob Clark authored
      Currently works on a220.  Others in the a2xx family look pretty similar
      and should be pretty straightforward to support with the same driver.
      
      The a3xx has a new shader ISA, and while many registers appear similar,
      the register addresses have been completely shuffled around.  I am not
      sure yet whether it is best to support with the same driver, but
      different compiler, or whether it should be split into a different
      driver.
      
      v1: original
      v2: build file updates from review comments, and remove GPL licensed
          header files from msm kernel
      v3: smarter temp/pred register assignment, fix clear and depth/stencil
          format issues, resource_transfer fixes, scissor fixes
      Signed-off-by: Rob Clark <robdclark@gmail.com>