  1. May 01, 2018
    • CHROMIUM: i965: Implement EGL_KHR_mutable_render_buffer · 3302281f
      Lina Versace authored
      Tested with a low-latency handwriting application on Android Nougat on
      the Chrome OS Pixelbook (codename Eve) with Kabylake.
      
      BUG=b:77899911
      TEST=No android-cts-7.1 regressions on Eve.
      
      Change-Id: Ia816fa6b0a1158f81e5b63477451bf337c2001aa
    • CHROMIUM: egl/android: Implement EGL_KHR_mutable_render_buffer · 54f07a7e
      Lina Versace authored
      Specifically, implement the extension DRI_MutableRenderBufferLoader.
      However, the loader enables EGL_KHR_mutable_render_buffer only if the
      DRI driver implements its half of the extension,
      DRI_MutableRenderBufferDriver.
      
      BUG=b:77899911
      TEST=No android-cts-7.1 regressions on Eve.
      
      Change-Id: I7fe68a5a674d1707b1e7251d900b3affd5dd7660
    • CHROMIUM: egl/main: Add bits for EGL_KHR_mutable_render_buffer · bf85c6b1
      Lina Versace authored
      A follow-up patch enables EGL_KHR_mutable_render_buffer for Android.
      This patch is separate from the Android patch because I think it's
      easier to review the platform-independent bits separately.
      
      BUG=b:77899911
      TEST=No android-cts-7.1 regressions on Eve.
      
      Change-Id: I07470f2862796611b141f69f47f935b97b0e04a1
    • CHROMIUM: dri: Add param driCreateConfigs(mutable_render_buffer) · 272fd36b
      Lina Versace authored
      If set, then the config will have __DRI_ATTRIB_MUTABLE_RENDER_BUFFER,
      which translates to EGL_MUTABLE_RENDER_BUFFER_BIT_KHR.
      
      Not used yet.
      
      BUG=b:77899911
      TEST=No android-cts-7.1 regressions on Eve.
      
      Change-Id: Icdf35794f3e9adf31e1f85740b87ce155efe1491
    • CHROMIUM: dri: Define DRI_MutableRenderBuffer extensions · 2aaeab9f
      Lina Versace authored
      Define extensions DRI_MutableRenderBufferDriver and
      DRI_MutableRenderBufferLoader. These are the two halves for
      EGL_KHR_mutable_render_buffer.
      
      Outside the DRI code there is one additional change.  Add
      gl_config::mutableRenderBuffer to match
      __DRI_ATTRIB_MUTABLE_RENDER_BUFFER. Neither is used yet.
      
      BUG=b:77899911
      TEST=No android-cts-7.1 regressions on Eve.
      
      Change-Id: I4ca03d81e4557380b19c44d8d799a7cc9365d928
    • CHROMIUM: egl/dri2: In dri2_make_current, return early on failure · 0d7eae58
      Lina Versace authored
      This pulls an 'else' block into the function's main body, making the
      code easier to follow.
      
      Without this change, the upcoming EGL_KHR_mutable_render_buffer patch
      transforms dri2_make_current() into spaghetti.
      
      BUG=b:77899911
      TEST=No android-cts-7.1 regressions on Eve.
      
      Change-Id: I26be2b7a8e78a162dcd867a44f62d6f48b9a8e4d
    • CHROMIUM: egl: Drop _EGLContext::WindowRenderBuffer · 3e8d93e1
      Lina Versace authored
      Replace it with two fields in _EGLSurface, RequestedRenderBuffer and
      ActiveRenderBuffer. (_EGLSurface::RequestedRenderBuffer replaces
      _EGLSurface::RenderBuffer).
      
      There exist *two* queryable EGL_RENDER_BUFFER states in EGL:
      eglQuerySurface(EGL_RENDER_BUFFER) and
      eglQueryContext(EGL_RENDER_BUFFER). _EGLContext::WindowRenderBuffer was
      related to eglQueryContext but not eglQuerySurface. Post-patch,
      RequestedRenderBuffer is related to eglQuerySurface and
      ActiveRenderBuffer is related to eglQueryContext.
      
      The implementation of eglQuerySurface(EGL_RENDER_BUFFER) contained
      abstruse logic which required comprehending the specification
      complexities of how the two EGL_RENDER_BUFFER states interact. Sometimes
      it returned _EGLContext::WindowRenderBuffer, sometimes
      _EGLSurface::RenderBuffer. Why? The function tried to encode the actual
      logic in the EGL spec. When did the function return which variable? Go
      study the EGL spec, hope you understand it, then hope Mesa mutated the
      EGL_RENDER_BUFFER state in all the correct places. Have fun.
      
      I got a headache from the mental gymnastics.
      
      To simplify eglQuerySurface(EGL_RENDER_BUFFER), and to improve
      confidence in its correctness, flatten its indirect logic. For pixmap
      and pbuffer surfaces, return a hard-coded literal value, as the spec
      suggests. For window surfaces, simply return ActiveRenderBuffer.
      Nothing difficult here.
      
      These changes eliminate potentially very fragile code in the upcoming
      EGL_KHR_mutable_render_buffer implementation.
      
      BUG=b:77899911
      TEST=No android-cts-7.1 regressions on Eve.
      
      Change-Id: Ic5f2ab1952f26a87081bc4f78bc7fa96734c8f2a
  2. Apr 17, 2018
  3. Apr 12, 2018
  4. Apr 11, 2018
  5. Mar 29, 2018
  6. Mar 14, 2018
  7. Mar 13, 2018
  8. Mar 08, 2018
  9. Mar 07, 2018
  10. Mar 06, 2018
  11. Feb 23, 2018
  12. Feb 17, 2018
  13. Feb 03, 2018
  14. Jan 31, 2018
  15. Jan 18, 2018
    • FROMLIST: intel/fs: Optimize and simplify the copy propagation dataflow logic. · 68a70770
      Francisco Jerez authored
      Previously the dataflow propagation algorithm would calculate the ACP
      live-in and -out sets in a two-pass fixed-point algorithm.  The first
      pass would update the live-out sets of all basic blocks of the program
      based on their live-in sets, while the second pass would update the
      live-in sets based on the live-out sets.  This is incredibly
      inefficient in the typical case where the CFG of the program is
      approximately acyclic, because it can take up to 2*n passes for an ACP
      entry introduced at the top of the program to reach the bottom (where
      n is the number of basic blocks in the program), until which point the
      algorithm won't be able to reach a fixed point.
      
      The same effect can be achieved in a single pass by computing the
      live-in and -out sets in lock-step, because that makes sure that
      processing of any basic block will pick up the updated live-out sets
      of the lexically preceding blocks.  This gives the dataflow
      propagation algorithm effectively O(n) run-time instead of O(n^2) in
      the acyclic case.
      
      The time spent in dataflow propagation is reduced by 30x in the
      GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
      test-case on my CHV system (the improvement is likely to be of the
      same order of magnitude on other platforms).  This more than reverses
      an apparent run-time regression in this test-case from my previous
      copy-propagation undefined-value handling patch, which was ultimately
      caused by the additional work introduced in that commit to account for
      undefined values being multiplied by a huge quadratic factor.
      
      According to Chad this test was failing on CHV due to a 30s time-out
      imposed by the Android CTS (this was the case regardless of my
      undefined-value handling patch, even though my patch substantially
      exacerbated the issue).  On my CHV system this patch reduces the
      overall run-time of the test by approximately 12x, getting us to
      around 13s, well below the time-out.
      
      v2: Initialize live-out set to the universal set to avoid rather
          pessimistic dataflow estimation in shaders with cycles (Addresses
          performance regression reported by Eero in GpuTest Piano).
          Performance numbers given above still apply.  No shader-db changes
          with respect to master.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
      
      
      Reported-by: Chad Versace <chadversary@chromium.org>
      Archived-At: https://lists.freedesktop.org/archives/mesa-dev/2017-December/180489.html
      (am from https://patchwork.freedesktop.org/patch/194420/)
      
      BUG=b:67394445
      TEST=No regressions in Android CTS, GLES tests.
        Fixes timeouts in dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.5
        on Braswell boards.
      
      Change-Id: I0d666c23693246b8d4fe8988f228f8c4ed7425f6
      Reviewed-on: https://chromium-review.googlesource.com/862007
      
      
      Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
      Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
      Tested-by: Ilja H. Friedel <ihf@chromium.org>
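
      A minimal sketch of the two update schemes contrasted in 68a70770.
      It assumes a straight-line chain of blocks in which each block's only
      predecessor is the lexically preceding one, zero-initialized sets (the
      v2 note initializes live-out to the universal set instead), and one
      bit per ACP entry. The struct and function names are hypothetical and
      are not taken from the Mesa sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block dataflow state, one bit per ACP entry. */
struct block {
   uint32_t gen;   /* copies made available by this block */
   uint32_t kill;  /* copies invalidated by this block */
   uint32_t in;    /* copies available on entry (live-in) */
   uint32_t out;   /* copies available on exit (live-out) */
};

/* Transfer function: what leaves a block given what enters it. */
static uint32_t
transfer(const struct block *b)
{
   return b->gen | (b->in & ~b->kill);
}

/* Two-pass scheme: one sweep updates every live-out set from the live-in
 * sets, a second sweep updates every live-in set from the live-out sets.
 * Returns the number of sweeps over the block list.
 */
static unsigned
solve_two_pass(struct block *blocks, size_t n)
{
   unsigned sweeps = 0;
   bool progress;

   assert(n > 0);
   do {
      progress = false;

      for (size_t i = 0; i < n; i++) {
         const uint32_t out = transfer(&blocks[i]);
         progress |= out != blocks[i].out;
         blocks[i].out = out;
      }
      sweeps++;

      for (size_t i = 1; i < n; i++) {
         const uint32_t in = blocks[i - 1].out;
         progress |= in != blocks[i].in;
         blocks[i].in = in;
      }
      sweeps++;
   } while (progress);

   return sweeps;
}

/* Lock-step scheme: update live-in and live-out of each block together, so
 * every block sees the already-updated live-out set of its predecessor.
 * Returns the number of sweeps over the block list.
 */
static unsigned
solve_lock_step(struct block *blocks, size_t n)
{
   unsigned sweeps = 0;
   bool progress;

   assert(n > 0);
   do {
      progress = false;

      for (size_t i = 0; i < n; i++) {
         const uint32_t in = i > 0 ? blocks[i - 1].out : 0;
         progress |= in != blocks[i].in;
         blocks[i].in = in;

         const uint32_t out = transfer(&blocks[i]);
         progress |= out != blocks[i].out;
         blocks[i].out = out;
      }
      sweeps++;
   } while (progress);

   return sweeps;
}
```

      With a single copy introduced in block 0 of such a chain, the
      two-pass solver needs a number of sweeps proportional to the chain
      length before the entry reaches the last block, while the lock-step
      solver propagates it in one sweep and uses a second sweep only to
      confirm the fixed point. Both end with identical sets.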
  16. Jan 17, 2018
    • UPSTREAM: intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution. · 3519cdfc
      Francisco Jerez authored
      
      This addresses a long-standing back-end compiler bug that could lead
      to cross-channel data corruption in loops executed non-uniformly.  In
      some cases live variables extending through a loop divergence point
      (e.g. a non-uniform break) into a convergence point (e.g. the end of
      the loop) wouldn't be considered live along all physical control flow
      paths the SIMD thread could possibly have taken in between due to some
      channels remaining in the loop for additional iterations.
      
      This patch fixes the problem by extending the CFG with physical edges
      that don't exist in the idealized non-vectorized program, but
      represent valid control flow paths the SIMD EU may take due to the
      divergence of logical threads.  This makes sense because the i965 IR
      is explicitly SIMD, and it's not uncommon for instructions to have an
      influence on neighboring channels (e.g. a force_writemask_all header
      setup), so the behavior of the SIMD thread as a whole needs to be
      considered.
      
      No changes in shader-db.
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      (cherry picked from commit 4d1959e6)
      
      This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
      simplify the copy propagation dataflow logic".
      
      BUG=b:67394445
      TEST=No regressions in Android CTS, GLES tests.
      
      Change-Id: I949f6f4e0127fec93d890e7669f870872f097a58
      Reviewed-on: https://chromium-review.googlesource.com/862006
      
      
      Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
      Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
      Tested-by: Ilja H. Friedel <ihf@chromium.org>
      Commit-Queue: Chad Versace <chadversary@chromium.org>
    • UPSTREAM: intel/fs: Don't let undefined values prevent copy propagation. · a058539d
      Francisco Jerez authored
      
      This makes the dataflow propagation logic of the copy propagation pass
      more intelligent in cases where the destination of a copy is known to
      be undefined for some incoming CFG edges, building upon the
      definedness information provided by the last patch.  Helps a few
      programs, and avoids a handful of shader-db regressions from the next
      patch.
      
      shader-db results on ILK:
      
        total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
        instructions in affected programs: 360 -> 336 (-6.67%)
        helped: 8
        HURT: 0
      
        LOST:   0
        GAINED: 10
      
      shader-db results on BDW:
      
        total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
        instructions in affected programs: 7730 -> 7289 (-5.71%)
        helped: 5
        HURT: 2
      
        LOST:   0
        GAINED: 4
      
      shader-db results on SKL:
      
        total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
        instructions in affected programs: 10364 -> 9293 (-10.33%)
        helped: 5
        HURT: 2
      
        LOST:   0
        GAINED: 2
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      (cherry picked from commit 9355116b)
      
      This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
      simplify the copy propagation dataflow logic".
      
      BUG=b:67394445
      TEST=No regressions in Android CTS, GLES tests.
      
      Change-Id: I8719e67ac14d3db8a7d6989d127ca4222cbdbfe4
      Reviewed-on: https://chromium-review.googlesource.com/862005
      
      
      Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
      Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
      Tested-by: Ilja H. Friedel <ihf@chromium.org>
      Commit-Queue: Chad Versace <chadversary@chromium.org>
    • UPSTREAM: intel/fs: Restrict live intervals to the subset possibly reachable from any definition. · 6efb3d85
      Francisco Jerez authored
      
      Currently the liveness analysis pass would extend a live interval up
      to the top of the program when no unconditional and complete
      definition of the variable is found that dominates all of its uses.
      
      This can lead to a serious performance problem in shaders containing
      many partial writes, like scalar arithmetic, FP64 and soon FP16
      operations.  The number of oversize live intervals in such workloads
      can cause the compilation time of the shader to explode because of the
      worse than quadratic behavior of the register allocator and scheduler
      when running out of registers, and it can also cause the running time
      of the shader to explode due to the amount of spilling it leads to,
      which is orders of magnitude slower than GRF memory.
      
      This patch fixes it by computing the intersection of our current live
      intervals with the subset of the program that can possibly be reached
      from any definition of the variable.  Extending the storage allocation
      of the variable beyond that is pretty useless because its value is
      guaranteed to be undefined at a point that cannot be reached from any
      definition.
      
      According to Jason, this improves performance of the subgroup Vulkan
      CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
      improves by nearly 50x).
      
      No significant change in the running time of shader-db (with 5%
      statistical significance).
      
      shader-db results on IVB:
      
        total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
        cycles in affected programs: 16335482 -> 16159558 (-1.08%)
        helped: 5121
        HURT: 4347
      
        total spills in shared programs: 1309 -> 1288 (-1.60%)
        spills in affected programs: 249 -> 228 (-8.43%)
        helped: 3
        HURT: 0
      
        total fills in shared programs: 1652 -> 1597 (-3.33%)
        fills in affected programs: 262 -> 207 (-20.99%)
        helped: 4
        HURT: 0
      
        LOST:   2
        GAINED: 209
      
      shader-db results on BDW:
      
        total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
        cycles in affected programs: 23397142 -> 23141100 (-1.09%)
        helped: 8045
        HURT: 6488
      
        total spills in shared programs: 1456 -> 1252 (-14.01%)
        spills in affected programs: 465 -> 261 (-43.87%)
        helped: 3
        HURT: 0
      
        total fills in shared programs: 1720 -> 1465 (-14.83%)
        fills in affected programs: 471 -> 216 (-54.14%)
        helped: 4
        HURT: 0
      
        LOST:   2
        GAINED: 162
      
      shader-db results on SKL:
      
        total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
        cycles in affected programs: 22560936 -> 22369874 (-0.85%)
        helped: 8457
        HURT: 6247
      
        total spills in shared programs: 437 -> 437 (0.00%)
        spills in affected programs: 0 -> 0
        helped: 0
        HURT: 0
      
        total fills in shared programs: 870 -> 854 (-1.84%)
        fills in affected programs: 16 -> 0
        helped: 1
        HURT: 0
      
        LOST:   0
        GAINED: 107
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      (cherry picked from commit c3c1aa5a)
      
      This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
      simplify the copy propagation dataflow logic".
      
      BUG=b:67394445
      TEST=No regressions in Android CTS, GLES tests.
      
      Change-Id: Icbe71f099618e45098a61502b79f3694bcc49877
      Reviewed-on: https://chromium-review.googlesource.com/862004
      
      
      Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
      Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
      Tested-by: Ilja H. Friedel <ihf@chromium.org>
      Commit-Queue: Chad Versace <chadversary@chromium.org>
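
      A rough sketch of the idea behind 6efb3d85: intersect the existing
      live range of a variable with the part of the program that can be
      reached from any of its definitions. It works at basic-block
      granularity with at most 32 blocks, which is a simplification made
      for illustration only. The CFG representation and the function names
      are hypothetical and are not taken from the Mesa sources.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Hypothetical CFG: succ[i] is a bitmask of the successors of block i. */
struct cfg {
   unsigned num_blocks;
   uint32_t succ[MAX_BLOCKS];
};

/* Blocks reachable by following CFG edges forward from any block in
 * `defs`.  The defining blocks themselves are included.
 */
static uint32_t
reachable_from(const struct cfg *cfg, uint32_t defs)
{
   uint32_t reached = defs;
   uint32_t worklist = defs;

   assert(cfg->num_blocks <= MAX_BLOCKS);

   while (worklist) {
      /* Pop the lowest-numbered block from the worklist. */
      unsigned b = 0;
      while (!(worklist & (UINT32_C(1) << b)))
         b++;
      worklist &= ~(UINT32_C(1) << b);

      /* Queue successors that have not been seen yet. */
      const uint32_t fresh = cfg->succ[b] & ~reached;
      reached |= fresh;
      worklist |= fresh;
   }

   return reached;
}

/* Clip a set of live blocks to those reachable from some definition. */
static uint32_t
restrict_live_blocks(const struct cfg *cfg, uint32_t live, uint32_t defs)
{
   return live & reachable_from(cfg, defs);
}
```

      For a chain 0 -> 1 -> 2 -> 3 with a partial write in block 2, a live
      range that had been extended to the top of the program (blocks 0 to
      3) shrinks to blocks 2 and 3. If block 2 also branches back to block
      1, block 1 stays live because the definition can reach it through the
      back edge.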
  17. Jan 16, 2018
    • UPSTREAM: intel/fs: Teach instruction scheduler about GRF bank conflict cycles. · d78b9b22
      Francisco Jerez authored
      
      This should allow the post-RA scheduler to do a slightly better job at
      hiding latency in presence of instructions incurring bank conflicts.
      The main purpose of this patch is not to improve performance though,
      but to get conflict cycles to show up in shader-db statistics in order
      to make sure that regressions in the bank conflict mitigation pass
      don't go unnoticed.
      
      Acked-by: Matt Turner <mattst88@gmail.com>
      (cherry picked from commit acf98ff9)
      
      This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
      simplify the copy propagation dataflow logic".
      
      BUG=b:67394445
      TEST=No regressions in Android CTS, GLES tests.
      
      Change-Id: Ie10e8bf2116b28a637fd7a3829a44a00b2867f11
      Reviewed-on: https://chromium-review.googlesource.com/862003
      
      
      Commit-Ready: Chad Versace <chadversary@chromium.org>
      Tested-by: Ilja H. Friedel <ihf@chromium.org>
      Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
      Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
    • UPSTREAM: intel/fs: Implement GRF bank conflict mitigation pass. · dafe2a86
      Francisco Jerez authored
      
      Unnecessary GRF bank conflicts increase the issue time of ternary
      instructions (the overwhelmingly most common of which is MAD) by
      roughly 50%, leading to reduced ALU throughput.  This pass attempts to
      minimize the number of bank conflicts by rearranging the layout of the
      GRF space post-register allocation.  It's in general not possible to
      eliminate all of them without introducing extra copies, which are
      typically more expensive than the bank conflict itself.
      
      In a shader-db run on SKL this helps roughly 46k shaders:
      
         total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
         conflicts in affected programs: 816222 -> 407702 (-50.05%)
         helped: 46234
         HURT: 72
      
      The running time of shader-db itself on SKL seems to be increased by
      roughly 2.52%±1.13% with n=20 due to the additional work done by the
      compiler back-end.
      
      On earlier generations the pass is somewhat less effective in relative
      terms because the hardware incurs a bank conflict anytime the last two
      sources of the instruction are duplicate (e.g. while trying to square
      a value using MAD), which is impossible to avoid without introducing
      copies.  E.g. for a shader-db run on SNB:
      
         total conflicts in shared programs: 944636 -> 623185 (-34.03%)
         conflicts in affected programs: 853258 -> 531807 (-37.67%)
         helped: 31052
         HURT: 19
      
      And on BDW:
      
         total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
         conflicts in affected programs: 1179787 -> 748933 (-36.52%)
         helped: 47592
         HURT: 70
      
      On SKL GT4e this improves performance of GpuTest Volplosion by
      3.64%±0.33% with n=16.
      
      NOTE: This patch intentionally disregards some i965 coding conventions
            for the sake of reviewability.  This is addressed by the next
            squash patch which introduces an amount of (for the most part
            boring) boilerplate that might distract reviewers from the
            non-trivial algorithmic details of the pass.
      
      The following patch is squashed in:
      
      SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
      
      Acked-by: Matt Turner <mattst88@gmail.com>
      (cherry picked from commit af2c3201)
      
      This patch is a prerequisite for 4cbe48f5 "intel/fs: Optimize and
      simplify the copy propagation dataflow logic".
      
      BUG=b:67394445
      TEST=No regressions in Android CTS, GLES tests.
      
      Change-Id: I21b0563b3855434a702989fbc947b786c486f7e3
      Reviewed-on: https://chromium-review.googlesource.com/862002
      
      
      Commit-Ready: Chad Versace <chadversary@chromium.org>
      Tested-by: Ilja H. Friedel <ihf@chromium.org>
      Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
      Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
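
      The conflict statistics quoted in dafe2a86 can be cross-checked with
      a few lines of arithmetic. The sketch below assumes the parenthesized
      figures are plain relative changes against the first number on each
      line. The helper names are hypothetical and are not part of
      shader-db.

```c
#include <assert.h>

/* Relative change in percent; negative means a reduction. */
static double
percent_change(long before, long after)
{
   assert(before != 0);
   return 100.0 * (double)(after - before) / (double)before;
}

/* Absolute change; negative means fewer conflicts. */
static long
absolute_change(long before, long after)
{
   return after - before;
}
```

      For the SKL numbers, absolute_change(1008981, 600461) and
      absolute_change(816222, 407702) both give -408520: the "total" and
      "affected" lines describe the same absolute reduction measured
      against different baselines, which is why the percentages differ
      (about -40.49% and -50.05% respectively). The SNB and BDW figures
      show the same pattern, with reductions of 321451 and 430854
      conflicts.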
  18. Dec 15, 2017
  19. Nov 30, 2017