1. 07 May, 2018 1 commit
    • Squash early Midgard driver · 5a5dc4f5
      Alyssa Rosenzweig authored
      History preserved in a branch.
      
      Rebase meson.build
      
      Fix syntax errors in the meson.build
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Import ir3_cmdline.c from freedreno into panfrost
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Begin removing freedreno-specific code in midgard
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Fix panfrost include
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Fully decouple midgard_cmdline from freedreno
      
      This enables the module to compile, providing stubs for the NIR
      compiler.
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Fix panfrost dependency
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      [midgard] Dump NIR and remove unnecessary passes
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Further reduce midgard
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Iterate NIR instructions
      
      Further simplification
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Ditto
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Trace out emit path for load_const
      
      Store output intrinsic
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Also vertex shaders
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Lower var copies
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Load uniform stub
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      String through compiler context
      
      Learn how to use util_dynarray for current_block
      
      Import midgard shader defines by Connor Abbott
      
      These were found in the original Midgard disassembler by cwabbott,
      extracted from the project cwabbots-open-gpu-tools under the license
      stated. They will be used here for instruction emission in the Midgard
      compiler.
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Iterate midgard instruction types
      
      Remove type, next_type from load_store_t
      
      Instruction type tags
      
      Compute instruction lookahead
      
      Refactor get_lookahead_type
      
      Fix lookahead by lowering tag format
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Fill in part of load_uniform, other ALU tags, etc
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Dump load_store op
      
      Macro for load_uniform instructions
      
      Use for store_vary32 as well
      
      Register aliases
      
      reg, offset arguments to load_store
      
      Hack until we have initial output :)
      
      Swizzle macro
      
      Factor out emit_binary_instruction
      
      Refactor file I/O
      
      Begin emitting ALU ops
      
      ALU padding
      
      I misunderstood padding; fix it
      
      Demonstrate some tacked on constants
      
      Set sources
      
      Move ALU register work
      
      String through constants
      
      Correct registers
      
      Use correct register in fmov
      
      Refactor into M_ALU macro
      
      ALU_2
      
      Factor out attach_constants
      
      Remove print
      
      Emit ALU
      
      Fixes to '
      
      Make register resolution at least somewhat plausible
      
      Remove some debugging prints
      
      ALU source modifiers
      
      EMIT_ALU_CASE to macro
      
      fmul
      
      fmin, fmax
      
      load_vary
      
      Fix src
      
      Shader stage to differentiate varying/attrib load
      
      Algebraic pass
      
      Actual optimisation loop
      
      Import full list of known ALU opcodes
      
      Emit for remaining ALU ops (where possible)
      
      Update ALU ops
      
      Disable incorrect fsin/fcos for now
      
      Correctly implement sin and cos, extending NIR
      
      Explain midgard_instruction in relation to scheduler
      
      Any configuration in load_const is okay
      
      Comment half floats
      
      Don't break aliasing rules
      
      Begin eliminate_constant_mov pass
      
      Finish mov elimination
      
      Use raw SSA in the midgard compiler
      
      Register allocate stub
      
      fmov elimination is much easier in SSA space
      
      Switch to /dev/shm
      
      Try hash
      
      Search for constants
      
      Attach maybe
      
      I feel silly -- fix move elimination
      
      Update compiler options
      
      Reflow constant move loop
      
      Pair load/store instructions
      
      Don't introduce a dependency chain
      
      Correct fmov argument ordering
      
      [midgard] Disable vertex shader compilation
      
      The vertex shader epilogue for these GPUs is not yet well understood;
      it's not worth trying to compile for it quite yet.
      
      [midgard] FMA does not exist for GL
      
      [midgard] Lowering vecs to movs will be useful
      
      [midgard] Fix fmov instruction ordering
      
      [midgard] Properly noop load/stores
      
      midgard: Introduce synthwrite to catch gl_FragColor
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Stub framebuffer write
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Introduce variadic EMIT syntax sugar
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Second half of the fbwrite
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Literal out for proper fbwrite
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Use actual compact writeout fields
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Begin ALU op combining
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Continue ALU combining work
      
      midgard: Cleanup printfs
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: ALU combining
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Instruction-combining aware lookahead
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Register allocation position
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Workaround missing preliminary load/store errata
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Synthwrite was a mistake
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Fix warnings
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Basic uniform loading support
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Set unknown field in varying load
      
      Saner load varying
      
      midgard: Use adder for add instructions
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Rework load_input, etc to act like vc4/freedreno
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Alias imov to fmov
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Fix store out regression
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Begin scalar work
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Don't lower fsat
      
      midgard: Fix build
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Lower to source mods pass
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Saturation arithmetic
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      Refactor ALU emit to allow for scalar emit in future
      
      Remove unnecessary alu defs
      
      Allow scalar ops to be emitted
      
      midgard: Implement scalar_alu_modifiers
      
      Correct swizzle placement
      
      midgard: Correct order
      
      midgard: Account for scalar component special case
      
      midgard: Sort out memory safety regression from scalar refactor
      
      midgard: vlut mask
      
      midgard: Begin porting over vec4 pass from freedreno
      
      midgard: Fix vec4
      
      midgard: Remove deadcode
      
      Fix frcp support
      
      midgard: Fix bugs with scalar source modifiers
      
      midgard: Lower subtraction
      
      midgard: Begin debugging transcendental functions
      
      midgard: Proper SSA register aliasing
      
      midgard: General improvements relating to unused arguments
      
      midgard: Reenable vertex and disable double print
      
      midgard: Only emit fragment epilogue for fragment shaders
      
      midgard: Load attribute
      
      midgard: Assign var locations
      
      midgard: Front-half of SSA aliases
      
      midgard: Further progress on aliasing
      
      midgard: Optimise uniforms similarly
      
      midgard: Fix uniform special case
      
      midgard: Cleanup uniform aliasing
      
      midgard: Cleanup warnings
      
      midgard: Fix nondeterministic segfault
      
      midgard: Fix regression packing with unuseds
      
      midgard: Fix regression in regression fix
      
      midgard: Begin store vary emit
      
      midgard: Begin experimenting with nir_builder
      
      midgard: Write to special register from epilogue
      
      midgard: Load gl_Position in vertex epilogue
      
      midgard: Fix bug in aliasing implementation
      
      midgard: Further hack on vertex shader epilogue
      
      midgard: Defer stores to workaround hw errata?
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      
      midgard: Fix early constant inline termination
      
      Cut off duplicated embedded constants
      
      midgard: Move vertex epilogue to after var assignment
      
      midgard: Import ugly internal code to fix vertex shader epilogue
      
      midgard: Get vertex shaders working.... somehow
      
      midgard: Reenable fragment compilation
      
      midgard: Fix load/store noop emission
      
      Save real softpipe
      
      panfrost: Dump clears
      
      midgard: Workaround compact branch errata
      
      panfrost: XXX Hack in the trans library XXX
      
      Hook into panfrost, uglily
      
      Continue hacky panfrost integration
      
      panfrost: Begin ripping out drawing to enable shaders
      
      Begin interfacing with the hacky resource stuff
      
      Link in transfer map
      
      Hook up vertex functions
      
      Disable user buffers for now
      
      Solve some segfaults
      
      transfer_unmap
      
      Don't crash
      
      Work fixing varying writes
      
      Remove vertex epilogue varying magic
      
      Proceed implementing vertex 'epilogue' the Right way
      
      Remove cruft that has built up from previous refactor
      
      Update comments; nir_instr_remove old st_vary
      
      Remove now-unused defer_stores
      
      Remove redundant r0 move
      
      Note about the decaying issue
      
      Fix data hazard determination for ld/st pairing
      
      Finally get eliminate_varying_mov working nicely
      
      Cleanup from previous commit
      
      Dot products
      
      Call do_mat_op_to_vec
      
      Wrap do_mat_op_to_vec
      
      Get uniforms doing something somewhat sane
      
      Fix uniform access patterns
      
      Galliumify set_constant_buffer
      
      Cleanup comments
      
      Inline n2m_alu_outmod
      
      Compiler cleanup
      
      Begin watermark RA
      
      Fixes for watermark RA
      
      Proceed writing real RA?
      
      Get RA to work
      
      Quiet output
      
      Add some profiling stubs
      
      Remove redundant lower_io calls
      
      New information re varying registers
      
      Honour literal_out in ls4
      
      Implement vertex epilogue as per 12.5.1
      
      Perspective division
      
      Uniforms are backward; workaround buggy VLIW
      
      Fix crash on resource destroy (mesa half)
      
      Remove softpipeism
      
      Work towards correct resizable shm windows
      
      Map the surface in the right place
      
      Continue
      
      Remove what we can
      
      Remove more
      
      Cut more
      
      Strip further
      
      Continue
      
      ACCELERATED flag
      
      Remove
      
      Strip shaders
      
      Fix overzealous inline constants
      
      Encode inline vector constants
      
      Mark errata with ERRATA, not XXX
      
      Enable two instruction chains instead of one
      
      Embedded constants with ALU combining (fixes long-time regression)
      
      Bundle duplicate constants
      
      Cull ssa0 moves (missed from inline constant in luts)
      
      Embedded to inline constant for right-constant scalar ops
      
      Scalar op flip
      
      Remove prints
      
      Inline constants in vector ops
      
      Begin work on instruction unit switching
      
      Branch compact can be packed
      
      Continue unit hopping work
      
      Split out helpers to prepare for updating midgard.h
      
      Pull in new midgard.h from SPD
      
      f2i->u
      
      Basic support for integers
      
      Disable inline constants for the moment, since they're broken
      
      inot requires MUL apparently
      
      Import new ops
      
      Emit ball/bany from NIR
      
      Import backend algebraic NIR pass stuff
      
      nir: Implement optional b2f->iand lowering
      
      This pass is required by the Midgard compiler; our instruction set uses
      NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction.
      Normally, this lowering pass would be implemented in a backend-specific
      algebraic pass, but this conflicts with the existing iand->b2f pass in
      nir_opt_algebraic.py, hanging the compiler. This patch thus makes the
      existing pass optional (default on -- all other backends should remain
      unaffected), adding an optional pass for lowering the opposite
      direction.
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
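
The b2f->iand lowering described above can be illustrated with a small sketch. It assumes 32-bit NIR-style booleans (all bits set for true, zero for false) and relies on the fact that the IEEE 754 single-precision encoding of 1.0 is 0x3F800000; the helper names (`float_bits`, `bits_to_float`, `b2f_via_iand`) are hypothetical and are not taken from Mesa.

```python
import struct

TRUE = 0xFFFFFFFF   # NIR-style boolean true (~0) as a 32-bit pattern
FALSE = 0x00000000  # NIR-style boolean false


def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def b2f_via_iand(b: int) -> float:
    """b2f expressed as a bitwise AND with the bit pattern of 1.0.

    ~0 & bits(1.0) == bits(1.0), and 0 & bits(1.0) == 0, so the
    result is 1.0 for true and 0.0 for false.
    """
    return bits_to_float(b & float_bits(1.0))
```

Because the AND either keeps or clears the whole constant, no dedicated conversion instruction is needed on hardware that already represents booleans this way.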
      
      f2b, b2f in midgard
      
      Small cleanup; fix floor/ceil
      
      LUT duplication
      
      Guarantee proper fragment writeout (incurring a temporary performance regression)
      
      Begin working on csel stuff
      
      midgard: Move fsinpi stuff to backend-specific pass
      
      Reenable embedded_to_inline_constant by making it integer aware
      
      Fix constant attaching
      
      ushr opcode
      
      Fix issue with imin/imax blocking
      
      Remove prints
      
      Componentwise test for r0 breakup
      
      Try to debug
      
      When flipping arguments, also flip modifiers
      
      Lower b2i to iand
      
      Fix segfault with inot
      
      Flip vector constants
      
      isub is not commutative
      
      fne _is_ commutative
      
      Remove prints
      
      Get rid of constant moves -- unnecessary complexity
      
      Remove STAGE_PROFILING
      
      Uniform base is no longer needed
      
      Remove unused macro
      
      Enable basic nir_register support in order to chuck out old vec4 pass
      
      Call convert_from_ssa weakly and generalise to registers in LUT duplication
      
      Fix st_vary input bug triggered by vertex epilogue refactor
      
      Mask for clarity
      
      Remove whitespace
      
      Fix annoying compiler segfault
      
      Reenable constant inlining (unaffected by registerisation)
      
      Fix varying move regression and reenable
      
      Stubs to emit textures from NIR
      
      Begin basic texture op emission
      
      Get texture handles correct
      
      Set flags
      
      Set .cont and .last
      
      Hardcode mask/filter for now
      
      Hardcode a swizzle as well
      
      Force texture full for now
      
      Do something with the input swizzle
      
      Fix spelling error in header
      
      midgard: Emit fmov for source/dest texture
      
      midgard: Lower vars as necessary
      
      Rescale for the replay :v
      
      Handle weird 3D texture swizzle
      
      Stub for cubemap
      
      Hook up texture/sampler functions in softpipe shim
      
      Don't advertise compute/geometry shaders
      
      Import softpipe meson.build into panfrost
      
      Move shim into ~/panfrost
      
      Include panfrost_dri.so
      
      Register as fake swr
      
      Use the panfrost name
      
      Restore original softpipe
  2. 08 Mar, 2018 2 commits
    • nir: Don't i2b a value that is already Boolean · 6878c9aa
      Ian Romanick authored
      A bunch of shaders have sequences like:
      
          i2b(u2i(floatBitsToUint(intBitsToFloat(x == y ? -1 : 0))))
      
      Other optimizations (and NIR's typeless nature) reduce this to
      
          i2b(x == y)
      
      which is silly.
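
A minimal sketch of why the conversion is redundant, assuming 32-bit NIR-style booleans where a comparison yields -1 (all bits set) or 0, as in the expression above. The helpers `nir_bool` and `i2b` are hypothetical models, not Mesa functions.

```python
MASK = 0xFFFFFFFF  # 32-bit all-ones pattern, i.e. -1 as an unsigned value


def nir_bool(cond: bool) -> int:
    """Model a NIR comparison result: ~0 for true, 0 for false."""
    return MASK if cond else 0


def i2b(x: int) -> int:
    """Model i2b: any non-zero integer becomes true (~0)."""
    return nir_bool((x & MASK) != 0)


# For a value that is already a Boolean, i2b changes nothing.
for x, y in [(1, 1), (1, 2), (-7, -7), (0, 5)]:
    cmp_result = nir_bool(x == y)
    assert i2b(cmp_result) == cmp_result
```

Since `i2b` maps 0 to 0 and ~0 to ~0, applying it to a comparison result is the identity, so the instruction can be dropped.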
      
      Skylake
      total instructions in shared programs: 14498698 -> 14497948 (<.01%)
      instructions in affected programs: 74480 -> 73730 (-1.01%)
      helped: 277
      HURT: 0
      helped stats (abs) min: 1 max: 32 x̄: 2.71 x̃: 2
      helped stats (rel) min: 0.04% max: 13.79% x̄: 1.45% x̃: 0.68%
      95% mean confidence interval for instructions value: -3.35 -2.06
      95% mean confidence interval for instructions %-change: -1.74% -1.16%
      Instructions are helped.
      
      total cycles in shared programs: 532015500 -> 531999238 (<.01%)
      cycles in affected programs: 5943878 -> 5927616 (-0.27%)
      helped: 251
      HURT: 74
      helped stats (abs) min: 1 max: 13149 x̄: 127.89 x̃: 14
      helped stats (rel) min: 0.01% max: 17.31% x̄: 1.55% x̃: 0.53%
      HURT stats (abs)   min: 1 max: 4550 x̄: 214.04 x̃: 15
      HURT stats (rel)   min: <.01% max: 44.43% x̄: 2.81% x̃: 0.33%
      95% mean confidence interval for cycles value: -158.51 58.43
      95% mean confidence interval for cycles %-change: -1.07% -0.04%
      Inconclusive result (value mean confidence interval includes 0).
      
      total loops in shared programs: 4753 -> 4735 (-0.38%)
      loops in affected programs: 18 -> 0
      helped: 18
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
      95% mean confidence interval for loops value: -1.00 -1.00
      95% mean confidence interval for loops %-change: -100.00% -100.00%
      Loops are helped.
      
      Haswell and Broadwell had similar results. (Broadwell shown)
      total instructions in shared programs: 14791877 -> 14791127 (<.01%)
      instructions in affected programs: 77326 -> 76576 (-0.97%)
      helped: 278
      HURT: 1
      helped stats (abs) min: 1 max: 32 x̄: 2.70 x̃: 2
      helped stats (rel) min: 0.04% max: 13.79% x̄: 1.42% x̃: 0.68%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49%
      95% mean confidence interval for instructions value: -3.33 -2.05
      95% mean confidence interval for instructions %-change: -1.70% -1.13%
      Instructions are helped.
      
      total cycles in shared programs: 558250067 -> 558252872 (<.01%)
      cycles in affected programs: 5806328 -> 5809133 (0.05%)
      helped: 235
      HURT: 83
      helped stats (abs) min: 1 max: 10630 x̄: 81.73 x̃: 16
      helped stats (rel) min: 0.03% max: 18.58% x̄: 1.60% x̃: 0.51%
      HURT stats (abs)   min: 1 max: 10590 x̄: 265.19 x̃: 20
      HURT stats (rel)   min: <.01% max: 15.28% x̄: 1.89% x̃: 0.54%
      95% mean confidence interval for cycles value: -89.87 107.51
      95% mean confidence interval for cycles %-change: -1.06% -0.32%
      Inconclusive result (value mean confidence interval includes 0).
      
      total loops in shared programs: 4735 -> 4717 (-0.38%)
      loops in affected programs: 18 -> 0
      helped: 18
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
      95% mean confidence interval for loops value: -1.00 -1.00
      95% mean confidence interval for loops %-change: -100.00% -100.00%
      Loops are helped.
      
      total fills in shared programs: 83111 -> 83110 (<.01%)
      fills in affected programs: 28 -> 27 (-3.57%)
      helped: 1
      HURT: 0
      
      Ivy Bridge
      total instructions in shared programs: 11774173 -> 11773436 (<.01%)
      instructions in affected programs: 70819 -> 70082 (-1.04%)
      helped: 267
      HURT: 0
      helped stats (abs) min: 1 max: 48 x̄: 2.76 x̃: 2
      helped stats (rel) min: 0.21% max: 19.51% x̄: 1.57% x̃: 0.63%
      95% mean confidence interval for instructions value: -3.51 -2.01
      95% mean confidence interval for instructions %-change: -1.94% -1.21%
      Instructions are helped.
      
      total cycles in shared programs: 257153833 -> 257148932 (<.01%)
      cycles in affected programs: 585341 -> 580440 (-0.84%)
      helped: 167
      HURT: 100
      helped stats (abs) min: 1 max: 1327 x̄: 44.89 x̃: 16
      helped stats (rel) min: 0.04% max: 26.54% x̄: 2.41% x̃: 0.88%
      HURT stats (abs)   min: 1 max: 200 x̄: 25.95 x̃: 16
      HURT stats (rel)   min: 0.04% max: 9.81% x̄: 1.34% x̃: 0.65%
      95% mean confidence interval for cycles value: -33.25 -3.46
      95% mean confidence interval for cycles %-change: -1.47% -0.54%
      Cycles are helped.
      
      total loops in shared programs: 3416 -> 3398 (-0.53%)
      loops in affected programs: 18 -> 0
      helped: 18
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
      95% mean confidence interval for loops value: -1.00 -1.00
      95% mean confidence interval for loops %-change: -100.00% -100.00%
      Loops are helped.
      
      LOST:   2
      GAINED: 0
      
      Sandy Bridge
      total instructions in shared programs: 10499306 -> 10499094 (<.01%)
      instructions in affected programs: 6051 -> 5839 (-3.50%)
      helped: 43
      HURT: 0
      helped stats (abs) min: 1 max: 32 x̄: 4.93 x̃: 2
      helped stats (rel) min: 0.39% max: 12.90% x̄: 4.29% x̃: 2.45%
      95% mean confidence interval for instructions value: -7.66 -2.20
      95% mean confidence interval for instructions %-change: -5.47% -3.12%
      Instructions are helped.
      
      total cycles in shared programs: 145862568 -> 145861370 (<.01%)
      cycles in affected programs: 61733 -> 60535 (-1.94%)
      helped: 36
      HURT: 2
      helped stats (abs) min: 16 max: 66 x̄: 36.61 x̃: 35
      helped stats (rel) min: 0.45% max: 17.31% x̄: 4.92% x̃: 2.81%
      HURT stats (abs)   min: 18 max: 102 x̄: 60.00 x̃: 60
      HURT stats (rel)   min: 1.10% max: 1.85% x̄: 1.48% x̃: 1.48%
      95% mean confidence interval for cycles value: -41.28 -21.77
      95% mean confidence interval for cycles %-change: -6.16% -3.00%
      Cycles are helped.
      
      total loops in shared programs: 1803 -> 1785 (-1.00%)
      loops in affected programs: 18 -> 0
      helped: 18
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
      95% mean confidence interval for loops value: -1.00 -1.00
      95% mean confidence interval for loops %-change: -100.00% -100.00%
      Loops are helped.
      
      LOST:   4
      GAINED: 0
      
      No changes on Iron Lake or GM45.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    • nir: Narrow some dot product operations · 54e8d226
      Ian Romanick authored
      On vector platforms, this helps elide some constant loads.
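
One way to picture the narrowing, as a sketch only: the commit message does not list the exact patterns, so this assumes the case where one operand is a vector with trailing constant zeros, which lets a wider dot product be replaced by a narrower one. The functions `fdot` and `narrow_fdot` are hypothetical illustrations.

```python
def fdot(u, v):
    """Plain dot product over equally sized component tuples."""
    return sum(a * b for a, b in zip(u, v))


def narrow_fdot(u, v):
    """Drop trailing components of u that are exactly zero, then
    take the dot product over the remaining components only."""
    n = len(u)
    while n > 1 and u[n - 1] == 0.0:
        n -= 1
    return fdot(u[:n], v[:n])


u = (2.0, 3.0, 0.0, 0.0)    # vec4 with two trailing constant zeros
v = (5.0, 7.0, 11.0, 13.0)
assert fdot(u, v) == narrow_fdot(u, v)  # the 4-wide and 2-wide forms agree
```

The narrower form no longer needs the zero constants to be loaded into a register. The two forms agree only when the dropped components of `v` are finite, because 0 multiplied by infinity or NaN yields NaN.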
      
      v2: Reorder the transformations.
      
      No changes on Broadwell or Skylake.
      
      Haswell
      total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
      instructions in affected programs: 1277532 -> 1243902 (-2.63%)
      helped: 13216
      HURT: 95
      helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2
      helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78%
      HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
      HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
      95% mean confidence interval for instructions value: -2.57 -2.49
      95% mean confidence interval for instructions %-change: -3.65% -3.54%
      Instructions are helped.
      
      total cycles in shared programs: 409580819 -> 409268463 (-0.08%)
      cycles in affected programs: 71730652 -> 71418296 (-0.44%)
      helped: 9898
      HURT: 2352
      helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16
      helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50%
      HURT stats (abs)   min: 2 max: 276 x̄: 23.25 x̃: 6
      HURT stats (rel)   min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97%
      95% mean confidence interval for cycles value: -33.19 -17.80
      95% mean confidence interval for cycles %-change: -4.50% -4.26%
      Cycles are helped.
      
      total fills in shared programs: 82059 -> 82052 (<.01%)
      fills in affected programs: 21 -> 14 (-33.33%)
      helped: 7
      HURT: 0
      
      Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown)
      total instructions in shared programs: 11811851 -> 11780605 (-0.26%)
      instructions in affected programs: 1155007 -> 1123761 (-2.71%)
      helped: 12304
      HURT: 95
      helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2
      helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86%
      HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
      HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
      95% mean confidence interval for instructions value: -2.56 -2.48
      95% mean confidence interval for instructions %-change: -3.71% -3.59%
      Instructions are helped.
      
      total cycles in shared programs: 257618409 -> 257316805 (-0.12%)
      cycles in affected programs: 71999580 -> 71697976 (-0.42%)
      helped: 9155
      HURT: 2380
      helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16
      helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62%
      HURT stats (abs)   min: 2 max: 290 x̄: 21.14 x̃: 4
      HURT stats (rel)   min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33%
      95% mean confidence interval for cycles value: -34.32 -17.97
      95% mean confidence interval for cycles %-change: -4.55% -4.29%
      Cycles are helped.
      
      GM45 and Iron Lake had nearly identical results (Iron Lake shown)
      total instructions in shared programs: 7886750 -> 7879944 (-0.09%)
      instructions in affected programs: 373781 -> 366975 (-1.82%)
      helped: 3715
      HURT: 47
      helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1
      helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06%
      HURT stats (abs)   min: 1 max: 6 x̄: 2.55 x̃: 2
      HURT stats (rel)   min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35%
      95% mean confidence interval for instructions value: -1.85 -1.77
      95% mean confidence interval for instructions %-change: -2.91% -2.73%
      Instructions are helped.
      
      total cycles in shared programs: 178114636 -> 178095452 (-0.01%)
      cycles in affected programs: 7227666 -> 7208482 (-0.27%)
      helped: 3349
      HURT: 301
      helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4
      helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63%
      HURT stats (abs)   min: 2 max: 42 x̄: 9.13 x̃: 10
      HURT stats (rel)   min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50%
      95% mean confidence interval for cycles value: -5.52 -4.99
      95% mean confidence interval for cycles %-change: -0.81% -0.73%
      Cycles are helped.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
  3. 06 Mar, 2018 7 commits
    • nir: Simplify some comparisons like a+b < a · e3ea166a
      Ian Romanick authored
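
As a rough illustration of the comparison named in the title: over the real numbers, a+b < a holds exactly when b < 0, so the addition can be removed. The sketch below assumes that this is the intended rewrite (the commit message does not spell out the patterns) and shows that the identity is exact for rationals but can fail under floating-point rounding.

```python
from fractions import Fraction
from itertools import product


def lhs(a, b):
    """Original comparison: a + b < a."""
    return a + b < a


def rhs(b):
    """Simplified comparison: b < 0."""
    return b < 0


# Exact arithmetic: both forms agree on every sampled pair.
samples = [Fraction(n, 4) for n in range(-8, 9)]
exact_ok = all(lhs(a, b) == rhs(b) for a, b in product(samples, repeat=2))

# Floating point: adding -1.0 to 1e30 rounds back to 1e30, so the
# original comparison is false while the simplified one is true.
big, small = 1e30, -1.0
float_mismatch = lhs(big, small) != rhs(small)
```

The mismatch in the floating-point case is why a rewrite of this kind is only safe where inexact results are acceptable.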
      All Gen7+ platforms had similar results. (Skylake shown)
      total instructions in shared programs: 14514555 -> 14514547 (<.01%)
      instructions in affected programs: 1972 -> 1964 (-0.41%)
      helped: 8
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 0.39% max: 0.42% x̄: 0.41% x̃: 0.41%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -0.41% -0.40%
      Instructions are helped.
      
      total cycles in shared programs: 533141444 -> 533136780 (<.01%)
      cycles in affected programs: 164728 -> 160064 (-2.83%)
      helped: 181
      HURT: 3
      helped stats (abs) min: 2 max: 94 x̄: 26.17 x̃: 30
      helped stats (rel) min: 0.12% max: 5.33% x̄: 3.42% x̃: 3.80%
      HURT stats (abs)   min: 4 max: 54 x̄: 24.00 x̃: 14
      HURT stats (rel)   min: 0.20% max: 2.39% x̄: 1.09% x̃: 0.68%
      95% mean confidence interval for cycles value: -27.12 -23.58
      95% mean confidence interval for cycles %-change: -3.54% -3.16%
      Cycles are helped.
      
      Sandy Bridge
      total instructions in shared programs: 10533667 -> 10533539 (<.01%)
      instructions in affected programs: 10148 -> 10020 (-1.26%)
      helped: 124
      HURT: 0
      helped stats (abs) min: 1 max: 2 x̄: 1.03 x̃: 1
      helped stats (rel) min: 0.39% max: 4.35% x̄: 2.20% x̃: 2.04%
      95% mean confidence interval for instructions value: -1.06 -1.00
      95% mean confidence interval for instructions %-change: -2.46% -1.95%
      Instructions are helped.
      
      total cycles in shared programs: 146136887 -> 146132122 (<.01%)
      cycles in affected programs: 206382 -> 201617 (-2.31%)
      helped: 171
      HURT: 0
      helped stats (abs) min: 2 max: 40 x̄: 27.87 x̃: 30
      helped stats (rel) min: 0.08% max: 5.73% x̄: 2.98% x̃: 2.67%
      95% mean confidence interval for cycles value: -29.19 -26.54
      95% mean confidence interval for cycles %-change: -3.20% -2.76%
      Cycles are helped.
      
      Iron Lake
      total instructions in shared programs: 7886515 -> 7886507 (<.01%)
      instructions in affected programs: 3016 -> 3008 (-0.27%)
      helped: 8
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 0.25% max: 0.28% x̄: 0.27% x̃: 0.27%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -0.27% -0.26%
      Instructions are helped.
      
      total cycles in shared programs: 178100396 -> 178100388 (<.01%)
      cycles in affected programs: 156128 -> 156120 (<.01%)
      helped: 4
      HURT: 4
      helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
      helped stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03%
      HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
      HURT stats (rel)   min: <.01% max: 0.01% x̄: <.01% x̃: <.01%
      95% mean confidence interval for cycles value: -3.68 1.68
      95% mean confidence interval for cycles %-change: -0.03% <.01%
      Inconclusive result (value mean confidence interval includes 0).
      
      GM45
      total instructions in shared programs: 4857872 -> 4857868 (<.01%)
      instructions in affected programs: 1544 -> 1540 (-0.26%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 0.25% max: 0.27% x̄: 0.26% x̃: 0.26%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -0.28% -0.24%
      Instructions are helped.
      
      total cycles in shared programs: 122167654 -> 122167662 (<.01%)
      cycles in affected programs: 96248 -> 96256 (<.01%)
      helped: 0
      HURT: 4
      HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
      HURT stats (rel)   min: <.01% max: 0.01% x̄: <.01% x̃: <.01%
      95% mean confidence interval for cycles value: 2.00 2.00
      95% mean confidence interval for cycles %-change: <.01% 0.02%
      Cycles are HURT.
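The comparison in this commit's title can be illustrated with a small sketch. In exact arithmetic, `a + b < a` holds exactly when `b < 0`, so the add can be dropped. The function names are hypothetical and the exact rules added by the commit are not shown in this log. The check uses integers because floating-point rounding can make the two forms disagree, as the last lines show.

```python
def original(a, b):
    # The comparison as written in the shader: an add followed by a compare.
    return a + b < a


def simplified(b):
    # The same test with the add removed (valid in exact arithmetic).
    return b < 0


# Exhaustive check over a small integer range, where arithmetic is exact.
for a in range(-3, 4):
    for b in range(-3, 4):
        assert original(a, b) == simplified(b)

# With doubles the forms can differ: 1e16 - 0.5 rounds back to 1e16,
# so the original compare is false while the simplified one is true.
big, small = 1e16, -0.5
print(original(big, small), simplified(small))
```

The float case is why a rewrite of this kind would only be acceptable where exact results are not required.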
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    • nir: Use De Morgan's Law on logic compounded comparisons · d1ed4ffe
      Ian Romanick authored
      The replacement of the comparison operators must happen during this
      step.  If it does not, the next pass of nir_opt_algebraic will reapply
      De Morgan's Law in the "opposite direction" before performing dead code
      elimination.  The resulting infinite loop will eventually get OOM
      killed.
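A minimal sketch of the idea, assuming a toy tuple-based expression form rather than the real nir_opt_algebraic rules (which are not shown in this log). The rewrite flips the logic op and inverts each comparison in the same step, so no `inot` remains for a later pass to move back outward. The opcode names imitate NIR naming; `de_morgan` and `evaluate` are hypothetical helpers.

```python
# Each integer comparison and its logical inverse.
INVERSE = {"ilt": "ige", "ige": "ilt", "ieq": "ine", "ine": "ieq"}


def de_morgan(expr):
    """Rewrite not(cmp1 OP cmp2) into cmp1' OP' cmp2' in a single step."""
    if expr[0] != "inot":
        return expr
    inner = expr[1]
    if inner[0] not in ("ior", "iand"):
        return expr
    lhs, rhs = inner[1], inner[2]
    if lhs[0] not in INVERSE or rhs[0] not in INVERSE:
        return expr
    flipped = "iand" if inner[0] == "ior" else "ior"
    # Invert the comparisons here instead of wrapping them in inot.
    return (flipped,
            (INVERSE[lhs[0]],) + lhs[1:],
            (INVERSE[rhs[0]],) + rhs[1:])


def evaluate(expr, env):
    """Evaluate a toy expression against a dict of integer variables."""
    op = expr[0]
    if op == "inot":
        return not evaluate(expr[1], env)
    if op == "ior":
        return evaluate(expr[1], env) or evaluate(expr[2], env)
    if op == "iand":
        return evaluate(expr[1], env) and evaluate(expr[2], env)
    x, y = env[expr[1]], env[expr[2]]
    return {"ilt": x < y, "ige": x >= y, "ieq": x == y, "ine": x != y}[op]


before = ("inot", ("ior", ("ilt", "a", "b"), ("ige", "c", "d")))
after = de_morgan(before)
print(after)
```

Had the rule produced `("iand", ("inot", ...), ("inot", ...))` instead, a rule matching that shape could turn it back into the original, which is the oscillation the message describes.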
      
      Haswell, Broadwell, and Skylake had similar results. (Broadwell shown)
      total instructions in shared programs: 14808185 -> 14808036 (<.01%)
      instructions in affected programs: 13758 -> 13609 (-1.08%)
      helped: 39
      HURT: 0
      helped stats (abs) min: 1 max: 10 x̄: 3.82 x̃: 3
      helped stats (rel) min: 0.44% max: 1.55% x̄: 0.98% x̃: 1.01%
      95% mean confidence interval for instructions value: -4.67 -2.97
      95% mean confidence interval for instructions %-change: -1.09% -0.88%
      Instructions are helped.
      
      total cycles in shared programs: 559438333 -> 559435832 (<.01%)
      cycles in affected programs: 199160 -> 196659 (-1.26%)
      helped: 42
      HURT: 3
      helped stats (abs) min: 2 max: 184 x̄: 61.50 x̃: 51
      helped stats (rel) min: 0.02% max: 6.94% x̄: 1.41% x̃: 1.40%
      HURT stats (abs)   min: 2 max: 40 x̄: 27.33 x̃: 40
      HURT stats (rel)   min: 0.05% max: 0.74% x̄: 0.51% x̃: 0.74%
      95% mean confidence interval for cycles value: -71.47 -39.69
      95% mean confidence interval for cycles %-change: -1.64% -0.93%
      Cycles are helped.
      
      Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
      total instructions in shared programs: 11811776 -> 11811553 (<.01%)
      instructions in affected programs: 15201 -> 14978 (-1.47%)
      helped: 39
      HURT: 0
      helped stats (abs) min: 1 max: 20 x̄: 5.72 x̃: 6
      helped stats (rel) min: 0.44% max: 2.53% x̄: 1.30% x̃: 1.26%
      95% mean confidence interval for instructions value: -7.21 -4.23
      95% mean confidence interval for instructions %-change: -1.48% -1.12%
      Instructions are helped.
      
      total cycles in shared programs: 257617270 -> 257614589 (<.01%)
      cycles in affected programs: 212107 -> 209426 (-1.26%)
      helped: 45
      HURT: 0
      helped stats (abs) min: 2 max: 180 x̄: 59.58 x̃: 54
      helped stats (rel) min: 0.02% max: 6.02% x̄: 1.30% x̃: 1.32%
      95% mean confidence interval for cycles value: -74.02 -45.14
      95% mean confidence interval for cycles %-change: -1.59% -1.01%
      Cycles are helped.
      
      Iron Lake
      total instructions in shared programs: 7886648 -> 7886515 (<.01%)
      instructions in affected programs: 14106 -> 13973 (-0.94%)
      helped: 29
      HURT: 0
      helped stats (abs) min: 1 max: 10 x̄: 4.59 x̃: 4
      helped stats (rel) min: 0.35% max: 1.83% x̄: 0.90% x̃: 0.81%
      95% mean confidence interval for instructions value: -5.65 -3.52
      95% mean confidence interval for instructions %-change: -1.03% -0.76%
      Instructions are helped.
      
      total cycles in shared programs: 178100812 -> 178100396 (<.01%)
      cycles in affected programs: 67970 -> 67554 (-0.61%)
      helped: 29
      HURT: 0
      helped stats (abs) min: 2 max: 40 x̄: 14.34 x̃: 12
      helped stats (rel) min: 0.15% max: 1.69% x̄: 0.58% x̃: 0.54%
      95% mean confidence interval for cycles value: -18.30 -10.39
      95% mean confidence interval for cycles %-change: -0.71% -0.45%
      Cycles are helped.
      
      GM45
      total instructions in shared programs: 4857939 -> 4857872 (<.01%)
      instructions in affected programs: 7426 -> 7359 (-0.90%)
      helped: 15
      HURT: 0
      helped stats (abs) min: 1 max: 10 x̄: 4.47 x̃: 4
      helped stats (rel) min: 0.33% max: 1.80% x̄: 0.87% x̃: 0.77%
      95% mean confidence interval for instructions value: -6.06 -2.87
      95% mean confidence interval for instructions %-change: -1.06% -0.67%
      Instructions are helped.
      
      total cycles in shared programs: 122167930 -> 122167654 (<.01%)
      cycles in affected programs: 43118 -> 42842 (-0.64%)
      helped: 15
      HURT: 0
      helped stats (abs) min: 4 max: 40 x̄: 18.40 x̃: 16
      helped stats (rel) min: 0.15% max: 1.69% x̄: 0.62% x̃: 0.54%
      95% mean confidence interval for cycles value: -25.03 -11.77
      95% mean confidence interval for cycles %-change: -0.82% -0.41%
      Cycles are helped.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    • nir: Replace fmin(b2f(a), b) with a bcsel · 52607658
      Ian Romanick authored
      All of the affected shaders are HDR mappers from Serious Sam 3.
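One replacement consistent with the title can be sketched as follows. The exact pattern used by the commit is not shown in this log, so the `rewritten` form is an assumption. Since `b2f(a)` is either 1.0 or 0.0, the `fmin` can be split into two `fmin`s against constants and selected by `a`. All helper names are hypothetical, and NaN inputs are left out.

```python
def b2f(a):
    # Boolean to float: true -> 1.0, false -> 0.0.
    return 1.0 if a else 0.0


def bcsel(cond, x, y):
    # Select x when cond is true, otherwise y.
    return x if cond else y


def original(a, b):
    return min(b2f(a), b)


def rewritten(a, b):
    # Assumed replacement: the b2f conversion disappears and each
    # fmin now has a constant operand.
    return bcsel(a, min(b, 1.0), min(b, 0.0))


for a in (False, True):
    for b in (-2.0, 0.0, 0.5, 1.0, 3.0):
        assert original(a, b) == rewritten(a, b)
```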
      
      All Gen7+ platforms had similar results. (Skylake shown)
      total instructions in shared programs: 14516285 -> 14516273 (<.01%)
      instructions in affected programs: 348 -> 336 (-3.45%)
      helped: 12
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 2.08% max: 6.67% x̄: 4.31% x̃: 4.17%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -5.55% -3.06%
      Instructions are helped.
      
      total cycles in shared programs: 533163876 -> 533163808 (<.01%)
      cycles in affected programs: 1144 -> 1076 (-5.94%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
      helped stats (rel) min: 5.80% max: 6.08% x̄: 5.94% x̃: 5.94%
      95% mean confidence interval for cycles value: -18.84 -15.16
      95% mean confidence interval for cycles %-change: -6.20% -5.68%
      Cycles are helped.
      
      Sandy Bridge
      total instructions in shared programs: 10533321 -> 10533309 (<.01%)
      instructions in affected programs: 372 -> 360 (-3.23%)
      helped: 12
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 2.00% max: 5.88% x̄: 3.91% x̃: 3.85%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -4.96% -2.86%
      Instructions are helped.
      
      total cycles in shared programs: 146136632 -> 146136428 (<.01%)
      cycles in affected programs: 11668 -> 11464 (-1.75%)
      helped: 12
      HURT: 0
      helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
      helped stats (rel) min: 0.99% max: 3.44% x̄: 2.20% x̃: 2.29%
      95% mean confidence interval for cycles value: -17.66 -16.34
      95% mean confidence interval for cycles %-change: -2.82% -1.58%
      Cycles are helped.
      
      Iron Lake
      total instructions in shared programs: 7886301 -> 7886277 (<.01%)
      instructions in affected programs: 576 -> 552 (-4.17%)
      helped: 12
      HURT: 0
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 2.94% max: 6.06% x̄: 4.51% x̃: 4.65%
      95% mean confidence interval for instructions value: -2.00 -2.00
      95% mean confidence interval for instructions %-change: -5.30% -3.72%
      Instructions are helped.
      
      total cycles in shared programs: 178113176 -> 178113176 (0.00%)
      cycles in affected programs: 2116 -> 2116 (0.00%)
      helped: 2
      HURT: 4
      helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
      helped stats (rel) min: 1.14% max: 1.14% x̄: 1.14% x̃: 1.14%
      HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
      HURT stats (rel)   min: 0.50% max: 0.65% x̄: 0.58% x̃: 0.58%
      95% mean confidence interval for cycles value: -3.25 3.25
      95% mean confidence interval for cycles %-change: -0.93% 0.94%
      Inconclusive result (value mean confidence interval includes 0).
      
      GM45
      total instructions in shared programs: 4857756 -> 4857744 (<.01%)
      instructions in affected programs: 294 -> 282 (-4.08%)
      helped: 6
      HURT: 0
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 2.94% max: 5.71% x̄: 4.40% x̃: 4.55%
      95% mean confidence interval for instructions value: -2.00 -2.00
      95% mean confidence interval for instructions %-change: -5.71% -3.09%
      Instructions are helped.
      
      total cycles in shared programs: 122178730 -> 122178722 (<.01%)
      cycles in affected programs: 700 -> 692 (-1.14%)
      helped: 2
      HURT: 0
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    • nir: Pull b2f out of bcsel · b974dfee
      Ian Romanick authored
      All platforms had similar results. (Skylake shown)
      total instructions in shared programs: 14516592 -> 14516586 (<.01%)
      instructions in affected programs: 500 -> 494 (-1.20%)
      helped: 2
      HURT: 0
      
      total cycles in shared programs: 533167044 -> 533166998 (<.01%)
      cycles in affected programs: 6988 -> 6942 (-0.66%)
      helped: 2
      HURT: 0
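The title's transformation can be sketched under the assumption that both arms of the `bcsel` are `b2f` conversions. Selecting between the booleans first and converting once gives the same value with one conversion instead of two. The helpers are hypothetical stand-ins for the NIR opcodes.

```python
def b2f(a):
    # Boolean to float: true -> 1.0, false -> 0.0.
    return 1.0 if a else 0.0


def bcsel(cond, x, y):
    # Select x when cond is true, otherwise y.
    return x if cond else y


def original(cond, x, y):
    # Two conversions, then a select on floats.
    return bcsel(cond, b2f(x), b2f(y))


def pulled_out(cond, x, y):
    # Select on booleans, then a single conversion.
    return b2f(bcsel(cond, x, y))


# Exhaustive check over all eight boolean combinations.
for cond in (False, True):
    for x in (False, True):
        for y in (False, True):
            assert original(cond, x, y) == pulled_out(cond, x, y)
```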
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    • nir: Replace an odd comparison involving fmin of -b2f · f50400cc
      Ian Romanick authored
      I noticed the fge version while looking at a shader for an unrelated
      reason.  The feq version prevents a regression in a later change that
      performs strength reduction of some compares.
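The log does not spell out the patterns, so the following is a sketch under a stated assumption: that the comparisons are `fmin(-b2f(a), b) >= 0.0` (the fge version) and `fmin(-b2f(a), b) == 0.0` (the feq version). Because `-b2f(a)` is either -1.0 or -0.0, both reduce to "a is false and b is non-negative" for non-NaN `b`. The helper names are hypothetical.

```python
def b2f(a):
    # Boolean to float: true -> 1.0, false -> 0.0.
    return 1.0 if a else 0.0


def fge_version(a, b):
    return min(-b2f(a), b) >= 0.0


def feq_version(a, b):
    return min(-b2f(a), b) == 0.0


def replacement(a, b):
    # a true  -> fmin is at most -1.0, so neither compare can pass.
    # a false -> fmin(-0.0, b) is zero exactly when b >= 0.0.
    return (not a) and b >= 0.0


for a in (False, True):
    for b in (-3.0, -1.0, -0.5, 0.0, 0.5, 2.0):
        assert fge_version(a, b) == replacement(a, b)
        assert feq_version(a, b) == replacement(a, b)
```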
      
      Broadwell and Skylake had similar results. (Skylake shown)
      total instructions in shared programs: 14514808 -> 14514796 (<.01%)
      instructions in affected programs: 750 -> 738 (-1.60%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3
      helped stats (rel) min: 0.83% max: 1.96% x̄: 1.40% x̃: 1.40%
      95% mean confidence interval for instructions value: -6.67 0.67
      95% mean confidence interval for instructions %-change: -2.43% -0.36%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 533144939 -> 533144853 (<.01%)
      cycles in affected programs: 8911 -> 8825 (-0.97%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 16 max: 32 x̄: 21.50 x̃: 19
      helped stats (rel) min: 0.60% max: 1.89% x̄: 1.28% x̃: 1.31%
      95% mean confidence interval for cycles value: -32.94 -10.06
      95% mean confidence interval for cycles %-change: -2.30% -0.26%
      Cycles are helped.
      
      Haswell
      total instructions in shared programs: 13093785 -> 13093775 (<.01%)
      instructions in affected programs: 924 -> 914 (-1.08%)
      helped: 4
      HURT: 2
      helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3
      helped stats (rel) min: 0.82% max: 1.95% x̄: 1.39% x̃: 1.39%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19%
      95% mean confidence interval for instructions value: -4.53 1.20
      95% mean confidence interval for instructions %-change: -2.02% 0.97%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 409580553 -> 409580118 (<.01%)
      cycles in affected programs: 10909 -> 10474 (-3.99%)
      helped: 5
      HURT: 1
      helped stats (abs) min: 6 max: 222 x̄: 89.60 x̃: 18
      helped stats (rel) min: 0.16% max: 24.72% x̄: 9.54% x̃: 1.78%
      HURT stats (abs)   min: 13 max: 13 x̄: 13.00 x̃: 13
      HURT stats (rel)   min: 0.39% max: 0.39% x̄: 0.39% x̃: 0.39%
      95% mean confidence interval for cycles value: -180.68 35.68
      95% mean confidence interval for cycles %-change: -19.55% 3.79%
      Inconclusive result (value mean confidence interval includes 0).
      
      Ivy Bridge
      total instructions in shared programs: 11811851 -> 11811840 (<.01%)
      instructions in affected programs: 1032 -> 1021 (-1.07%)
      helped: 5
      HURT: 1
      helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 1
      helped stats (rel) min: 0.63% max: 1.95% x̄: 1.13% x̃: 0.97%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19%
      95% mean confidence interval for instructions value: -4.17 0.51
      95% mean confidence interval for instructions %-change: -1.86% 0.36%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 257618403 -> 257618168 (<.01%)
      cycles in affected programs: 10784 -> 10549 (-2.18%)
      helped: 4
      HURT: 2
      helped stats (abs) min: 4 max: 220 x̄: 64.50 x̃: 17
      helped stats (rel) min: 0.50% max: 24.34% x̄: 7.07% x̃: 1.72%
      HURT stats (abs)   min: 9 max: 14 x̄: 11.50 x̃: 11
      HURT stats (rel)   min: 0.24% max: 0.42% x̄: 0.33% x̃: 0.33%
      95% mean confidence interval for cycles value: -133.11 54.78
      95% mean confidence interval for cycles %-change: -14.79% 5.59%
      Inconclusive result (value mean confidence interval includes 0).
      
      GM45, Iron Lake, and Sandy Bridge had similar results. (Sandy Bridge shown)
      total instructions in shared programs: 10533871 -> 10533859 (<.01%)
      instructions in affected programs: 865 -> 853 (-1.39%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3
      helped stats (rel) min: 0.63% max: 1.83% x̄: 1.22% x̃: 1.21%
      95% mean confidence interval for instructions value: -6.67 0.67
      95% mean confidence interval for instructions %-change: -2.16% -0.29%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 146139904 -> 146139852 (<.01%)
      cycles in affected programs: 15213 -> 15161 (-0.34%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 3 max: 18 x̄: 13.00 x̃: 15
      helped stats (rel) min: 0.15% max: 0.84% x̄: 0.39% x̃: 0.29%
      95% mean confidence interval for cycles value: -23.79 -2.21
      95% mean confidence interval for cycles %-change: -0.88% 0.09%
      Inconclusive result (%-change mean confidence interval includes 0).
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    • nir: Mark bcsel-to-fmin (or fmax) transformations as inexact · 380136e9
      Ian Romanick authored
      These transformations are inexact because section 4.7.1 (Range and
      Precision) says:
      
          Operations and built-in functions that operate on a NaN are not
          required to return a NaN as the result.
      
      The fmin or fmax might not return NaN in cases where the original
      expression would be required to return NaN.
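The NaN difference can be shown with a small sketch. It assumes the open-coded form is `bcsel(a < b, a, b)` and models one permitted `fmin` behavior, returning the non-NaN operand. `fmin_ignoring_nan` is a hypothetical stand-in, not a hardware or NIR definition.

```python
import math


def bcsel(cond, x, y):
    # Select x when cond is true, otherwise y.
    return x if cond else y


def fmin_ignoring_nan(x, y):
    """One allowed fmin behavior: prefer the operand that is not NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)


nan = float("nan")

# Any comparison with NaN is false, so the select picks its second arm.
open_coded = bcsel(1.0 < nan, 1.0, nan)

# The fmin replacement returns the ordinary number instead.
as_fmin = fmin_ignoring_nan(1.0, nan)

print(math.isnan(open_coded), as_fmin)
```

The open-coded expression yields NaN here while the `fmin` yields 1.0, so the two are only interchangeable when exact NaN behavior is not required.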
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    • nir: Recognize some more open-coded fmin / fmax · 4addd34b
      Ian Romanick authored
      This transformation is inexact because section 4.7.1 (Range and
      Precision) says:
      
          Operations and built-in functions that operate on a NaN are not
          required to return a NaN as the result.
      
      The fmin or fmax might not return NaN in cases where the original
      expression would be required to return NaN.
      
      v2: Reorder operands and mark as inexact.  The latter suggested by
      Jason.
      
      shader-db results:
      
      Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
      total instructions in shared programs: 14514817 -> 14514808 (<.01%)
      instructions in affected programs: 229 -> 220 (-3.93%)
      helped: 3
      HURT: 0
      helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
      helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
      
      total cycles in shared programs: 533145211 -> 533144939 (<.01%)
      cycles in affected programs: 37268 -> 36996 (-0.73%)
      helped: 8
      HURT: 0
      helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
      helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
      
      Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
      total cycles in shared programs: 257618409 -> 257618403 (<.01%)
      cycles in affected programs: 12582 -> 12576 (-0.05%)
      helped: 3
      HURT: 0
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
      
      No changes on Iron Lake or GM45.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  4. 27 Feb, 2018 1 commit
  5. 22 Feb, 2018 2 commits
  6. 30 Jan, 2018 8 commits
    • nir: Distribute binary operations with constants into bcsel · ee63933a
      Ian Romanick authored
      This was specifically designed to simplify 1+mix(0, a-1, condition) to
      mix(1, a, condition) by pushing the 1+ inside.
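The simplification described above can be traced step by step in a short sketch. `mix` here models the GLSL form with a boolean selector, and the three function names are hypothetical. Integers are used so that `1 + (a - 1)` folds to `a` exactly; with floats that last step is subject to rounding.

```python
def mix(x, y, cond):
    """GLSL-style mix with a boolean selector: y when cond is true, else x."""
    return y if cond else x


def original(a, cond):
    # The add sits outside the select.
    return 1 + mix(0, a - 1, cond)


def distributed(a, cond):
    # The add is pushed into both arms of the select.
    return mix(1 + 0, 1 + (a - 1), cond)


def simplified(a, cond):
    # Both arms fold: 1 + 0 is 1 and 1 + (a - 1) is a.
    return mix(1, a, cond)


for a in range(-5, 6):
    for cond in (False, True):
        assert original(a, cond) == distributed(a, cond) == simplified(a, cond)
```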
      
      Skylake, Broadwell, and Haswell had similar results.  Skylake shown.
      total instructions in shared programs: 14521753 -> 14521716 (<.01%)
      instructions in affected programs: 10619 -> 10582 (-0.35%)
      helped: 51
      HURT: 14
      helped stats (abs) min: 1 max: 12 x̄: 1.43 x̃: 1
      helped stats (rel) min: 0.20% max: 3.58% x̄: 1.01% x̃: 0.95%
      HURT stats (abs)   min: 1 max: 11 x̄: 2.57 x̃: 1
      HURT stats (rel)   min: 0.22% max: 1.75% x̄: 1.20% x̃: 1.32%
      95% mean confidence interval for instructions value: -1.31 0.17
      95% mean confidence interval for instructions %-change: -0.80% -0.27%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 533000205 -> 533003533 (<.01%)
      cycles in affected programs: 110610 -> 113938 (3.01%)
      helped: 43
      HURT: 28
      helped stats (abs) min: 6 max: 440 x̄: 27.12 x̃: 16
      helped stats (rel) min: 0.39% max: 4.84% x̄: 1.60% x̃: 1.67%
      HURT stats (abs)   min: 2 max: 3066 x̄: 160.50 x̃: 14
      HURT stats (rel)   min: 0.08% max: 77.78% x̄: 5.16% x̃: 0.62%
      95% mean confidence interval for cycles value: -43.81 137.56
      95% mean confidence interval for cycles %-change: -1.47% 3.60%
      Inconclusive result (value mean confidence interval includes 0).
      
      Ivy Bridge
      total instructions in shared programs: 10018840 -> 10018713 (<.01%)
      instructions in affected programs: 9431 -> 9304 (-1.35%)
      helped: 51
      HURT: 3
      helped stats (abs) min: 1 max: 80 x̄: 2.76 x̃: 1
      helped stats (rel) min: 0.20% max: 16.43% x̄: 1.16% x̃: 0.81%
      HURT stats (abs)   min: 1 max: 12 x̄: 4.67 x̃: 1
      HURT stats (rel)   min: 0.22% max: 1.33% x̄: 0.59% x̃: 0.22%
      95% mean confidence interval for instructions value: -5.36 0.66
      95% mean confidence interval for instructions %-change: -1.66% -0.46%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 87571944 -> 87572785 (<.01%)
      cycles in affected programs: 117234 -> 118075 (0.72%)
      helped: 42
      HURT: 23
      helped stats (abs) min: 2 max: 114 x̄: 51.90 x̃: 30
      helped stats (rel) min: 0.11% max: 11.01% x̄: 4.45% x̃: 2.74%
      HURT stats (abs)   min: 1 max: 2341 x̄: 131.35 x̃: 10
      HURT stats (rel)   min: 0.06% max: 37.11% x̄: 2.75% x̃: 0.61%
      95% mean confidence interval for cycles value: -61.05 86.93
      95% mean confidence interval for cycles %-change: -3.47% -0.33%
      Inconclusive result (value mean confidence interval includes 0).
      
      Sandy Bridge
      total instructions in shared programs: 10542933 -> 10542844 (<.01%)
      instructions in affected programs: 11487 -> 11398 (-0.77%)
      helped: 52
      HURT: 3
      helped stats (abs) min: 1 max: 40 x̄: 1.96 x̃: 1
      helped stats (rel) min: 0.08% max: 8.16% x̄: 0.90% x̃: 0.72%
      HURT stats (abs)   min: 1 max: 11 x̄: 4.33 x̃: 1
      HURT stats (rel)   min: 0.22% max: 1.22% x̄: 0.55% x̃: 0.22%
      95% mean confidence interval for instructions value: -3.17 -0.07
      95% mean confidence interval for instructions %-change: -1.13% -0.52%
      Instructions are helped.
      
      total cycles in shared programs: 146098397 -> 146097094 (<.01%)
      cycles in affected programs: 128140 -> 126837 (-1.02%)
      helped: 47
      HURT: 8
      helped stats (abs) min: 2 max: 333 x̄: 29.21 x̃: 18
      helped stats (rel) min: 0.13% max: 5.04% x̄: 1.18% x̃: 0.95%
      HURT stats (abs)   min: 1 max: 16 x̄: 8.75 x̃: 9
      HURT stats (rel)   min: 0.08% max: 0.43% x̄: 0.30% x̃: 0.34%
      95% mean confidence interval for cycles value: -37.49 -9.90
      95% mean confidence interval for cycles %-change: -1.22% -0.71%
      Cycles are helped.
      
      Iron Lake
      total instructions in shared programs: 7886711 -> 7886509 (<.01%)
      instructions in affected programs: 10425 -> 10223 (-1.94%)
      helped: 50
      HURT: 2
      helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1
      helped stats (rel) min: 0.34% max: 15.38% x̄: 1.12% x̃: 0.54%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.86% max: 0.91% x̄: 0.89% x̃: 0.89%
      95% mean confidence interval for instructions value: -8.05 0.28
      95% mean confidence interval for instructions %-change: -1.83% -0.26%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 178115324 -> 178114612 (<.01%)
      cycles in affected programs: 765726 -> 765014 (-0.09%)
      helped: 39
      HURT: 1
      helped stats (abs) min: 2 max: 276 x̄: 18.31 x̃: 8
      helped stats (rel) min: <.01% max: 8.47% x̄: 0.39% x̃: 0.04%
      HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
      HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
      95% mean confidence interval for cycles value: -32.07 -3.53
      95% mean confidence interval for cycles %-change: -0.86% 0.10%
      Inconclusive result (%-change mean confidence interval includes 0).
      
      GM45
      total instructions in shared programs: 4857762 -> 4857661 (<.01%)
      instructions in affected programs: 5523 -> 5422 (-1.83%)
      helped: 25
      HURT: 1
      helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1
      helped stats (rel) min: 0.34% max: 13.61% x̄: 1.04% x̃: 0.52%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.86% max: 0.86% x̄: 0.86% x̃: 0.86%
      95% mean confidence interval for instructions value: -9.99 2.22
      95% mean confidence interval for instructions %-change: -2.01% 0.08%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 122179674 -> 122179194 (<.01%)
      cycles in affected programs: 530162 -> 529682 (-0.09%)
      helped: 22
      HURT: 1
      helped stats (abs) min: 2 max: 292 x̄: 21.91 x̃: 7
      helped stats (rel) min: <.01% max: 8.65% x̄: 0.44% x̃: 0.04%
      HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
      HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
      95% mean confidence interval for cycles value: -46.56 4.82
      95% mean confidence interval for cycles %-change: -1.20% 0.36%
      Inconclusive result (value mean confidence interval includes 0).
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
    • nir: Rearrange logic op-compounded integer compares · 03fb13f6
      Ian Romanick authored
      Skylake and Broadwell had similar results.  Skylake shown.
      total instructions in shared programs: 14521769 -> 14521753 (<.01%)
      instructions in affected programs: 8782 -> 8766 (-0.18%)
      helped: 16
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 0.12% max: 0.40% x̄: 0.20% x̃: 0.18%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -0.23% -0.16%
      Instructions are helped.
      
      total cycles in shared programs: 533000376 -> 533000205 (<.01%)
      cycles in affected programs: 447035 -> 446864 (-0.04%)
      helped: 9
      HURT: 9
      helped stats (abs) min: 2 max: 40 x̄: 35.78 x̃: 40
      helped stats (rel) min: 0.02% max: 0.18% x̄: 0.10% x̃: 0.09%
      HURT stats (abs)   min: 1 max: 52 x̄: 16.78 x̃: 10
      HURT stats (rel)   min: <.01% max: 1.11% x̄: 0.29% x̃: 0.12%
      95% mean confidence interval for cycles value: -25.07 6.07
      95% mean confidence interval for cycles %-change: -0.08% 0.27%
      Inconclusive result (value mean confidence interval includes 0).
      
      No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
    • nir: Rearrange and-compounded float compares · 053be9f0
      Ian Romanick authored
      If both comparisons are used as sources for instructions other than the
      iand, this transformation is detrimental.  If the non-identical value in
      both compares is constant, the fmin or fmax will be constant-folded
      away, so the transformation is always a win.
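A minimal sketch of the rearrangement, assuming the patterns are `(a < b) && (a < c)` and its mirrored form; the log itself does not list them. Two compares and an `iand` become one `fmin` (or `fmax`) and one compare. The function names and the constants 2.0 and 5.0 are hypothetical, and NaN inputs are not considered.

```python
def original_upper(a, b, c):
    # Two compares joined by an and.
    return (a < b) and (a < c)


def rearranged_upper(a, b, c):
    # One fmin and one compare.
    return a < min(b, c)


def original_lower(a, b, c):
    return (b < a) and (c < a)


def rearranged_lower(a, b, c):
    # The mirrored case uses fmax.
    return max(b, c) < a


# When b and c are constants the fmin folds at compile time,
# leaving a single compare against the folded limit.
FOLDED_LIMIT = min(2.0, 5.0)


def constant_case(a):
    return a < FOLDED_LIMIT


values = (-1.0, 0.0, 1.5, 2.0, 3.0, 5.0, 6.0)
for a in values:
    assert constant_case(a) == original_upper(a, 2.0, 5.0)
    for b in values:
        for c in values:
            assert original_upper(a, b, c) == rearranged_upper(a, b, c)
            assert original_lower(a, b, c) == rearranged_lower(a, b, c)
```

If each original compare also feeds some other instruction, both must still be computed, so the added `fmin` and compare are extra work. That matches the detrimental case the message describes.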
      
      It is interesting to me that on Iron Lake only 81 shaders have
      instruction counts changed, but 726 shaders have cycle counts changed.
      
      shader-db results:
      
      Skylake
      total instructions in shared programs: 14525728 -> 14521017 (-0.03%)
      instructions in affected programs: 1164726 -> 1160015 (-0.40%)
      helped: 1692
      HURT: 5
      helped stats (abs) min: 1 max: 637 x̄: 2.79 x̃: 2
      helped stats (rel) min: 0.07% max: 16.36% x̄: 0.81% x̃: 0.33%
      HURT stats (abs)   min: 1 max: 12 x̄: 3.20 x̃: 1
      HURT stats (rel)   min: 0.38% max: 2.86% x̄: 2.36% x̃: 2.86%
      95% mean confidence interval for instructions value: -3.52 -2.03
      95% mean confidence interval for instructions %-change: -0.86% -0.74%
      Instructions are helped.
      
      total cycles in shared programs: 533115449 -> 532991404 (-0.02%)
      cycles in affected programs: 119401803 -> 119277758 (-0.10%)
      helped: 1145
      HURT: 467
      helped stats (abs) min: 1 max: 34644 x̄: 145.92 x̃: 18
      helped stats (rel) min: <.01% max: 45.33% x̄: 1.58% x̃: 0.42%
      HURT stats (abs)   min: 1 max: 1590 x̄: 92.15 x̃: 15
      HURT stats (rel)   min: <.01% max: 13.48% x̄: 1.26% x̃: 0.39%
      95% mean confidence interval for cycles value: -122.16 -31.74
      95% mean confidence interval for cycles %-change: -0.94% -0.57%
      Cycles are helped.
      
      total spills in shared programs: 9597 -> 9534 (-0.66%)
      spills in affected programs: 403 -> 340 (-15.63%)
      helped: 1
      HURT: 1
      
      total fills in shared programs: 13904 -> 13790 (-0.82%)
      fills in affected programs: 1627 -> 1513 (-7.01%)
      helped: 2
      HURT: 1
      
      LOST:   0
      GAINED: 2
      
      Broadwell
      total instructions in shared programs: 14816966 -> 14812590 (-0.03%)
      instructions in affected programs: 1499885 -> 1495509 (-0.29%)
      helped: 1672
      HURT: 15
      helped stats (abs) min: 1 max: 455 x̄: 2.70 x̃: 2
      helped stats (rel) min: 0.05% max: 16.36% x̄: 0.81% x̃: 0.33%
      HURT stats (abs)   min: 1 max: 21 x̄: 9.20 x̃: 8
      HURT stats (rel)   min: 0.08% max: 2.86% x̄: 1.06% x̃: 0.53%
      95% mean confidence interval for instructions value: -3.14 -2.05
      95% mean confidence interval for instructions %-change: -0.85% -0.73%
      Instructions are helped.
      
      total cycles in shared programs: 559353622 -> 559345595 (<.01%)
      cycles in affected programs: 139893703 -> 139885676 (<.01%)
      helped: 921
      HURT: 697
      helped stats (abs) min: 1 max: 42424 x̄: 143.45 x̃: 18
      helped stats (rel) min: <.01% max: 36.23% x̄: 2.02% x̃: 0.87%
      HURT stats (abs)   min: 1 max: 2370 x̄: 178.03 x̃: 38
      HURT stats (rel)   min: <.01% max: 17.35% x̄: 0.71% x̃: 0.14%
      95% mean confidence interval for cycles value: -59.64 49.72
      95% mean confidence interval for cycles %-change: -1.02% -0.66%
      Inconclusive result (value mean confidence interval includes 0).
      
      total spills in shared programs: 78902 -> 78861 (-0.05%)
      spills in affected programs: 2418 -> 2377 (-1.70%)
      helped: 1
      HURT: 11
      
      total fills in shared programs: 83782 -> 83678 (-0.12%)
      fills in affected programs: 3515 -> 3411 (-2.96%)
      helped: 2
      HURT: 11
      
      LOST:   0
      GAINED: 5
      
      Haswell and Ivy Bridge had similar results. Haswell shown.
      total instructions in shared programs: 9033898 -> 9032010 (-0.02%)
      instructions in affected programs: 308064 -> 306176 (-0.61%)
      helped: 921
      HURT: 4
      helped stats (abs) min: 1 max: 20 x̄: 2.05 x̃: 1
      helped stats (rel) min: 0.17% max: 17.54% x̄: 0.80% x̃: 0.35%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23%
      95% mean confidence interval for instructions value: -2.21 -1.87
      95% mean confidence interval for instructions %-change: -0.88% -0.68%
      Instructions are helped.
      
      total cycles in shared programs: 84628949 -> 84620520 (<.01%)
      cycles in affected programs: 2164913 -> 2156484 (-0.39%)
      helped: 518
      HURT: 359
      helped stats (abs) min: 1 max: 440 x̄: 41.52 x̃: 20
      helped stats (rel) min: <.01% max: 17.17% x̄: 1.95% x̃: 1.01%
      HURT stats (abs)   min: 1 max: 586 x̄: 36.43 x̃: 8
      HURT stats (rel)   min: 0.04% max: 18.65% x̄: 1.47% x̃: 0.40%
      95% mean confidence interval for cycles value: -15.17 -4.05
      95% mean confidence interval for cycles %-change: -0.77% -0.32%
      Cycles are helped.
      
      LOST:   0
      GAINED: 4
      
      Sandy Bridge
      total instructions in shared programs: 10544860 -> 10542933 (-0.02%)
      instructions in affected programs: 360019 -> 358092 (-0.54%)
      helped: 931
      HURT: 4
      helped stats (abs) min: 1 max: 20 x̄: 2.07 x̃: 1
      helped stats (rel) min: 0.11% max: 15.52% x̄: 0.68% x̃: 0.30%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 3.33% max: 3.33% x̄: 3.33% x̃: 3.33%
      95% mean confidence interval for instructions value: -2.23 -1.89
      95% mean confidence interval for instructions %-change: -0.76% -0.58%
      Instructions are helped.
      
      total cycles in shared programs: 146106820 -> 146098397 (<.01%)
      cycles in affected programs: 3435047 -> 3426624 (-0.25%)
      helped: 572
      HURT: 329
      helped stats (abs) min: 1 max: 1289 x̄: 32.52 x̃: 15
      helped stats (rel) min: <.01% max: 26.29% x̄: 0.97% x̃: 0.33%
      HURT stats (abs)   min: 1 max: 1714 x̄: 30.93 x̃: 6
      HURT stats (rel)   min: 0.02% max: 41.31% x̄: 1.13% x̃: 0.19%
      95% mean confidence interval for cycles value: -16.85 -1.85
      95% mean confidence interval for cycles %-change: -0.39% -0.01%
      Cycles are helped.
      
      LOST:   1
      GAINED: 0
      
      Iron Lake
      total instructions in shared programs: 7886925 -> 7886711 (<.01%)
      instructions in affected programs: 25763 -> 25549 (-0.83%)
      helped: 75
      HURT: 6
      helped stats (abs) min: 1 max: 13 x̄: 3.33 x̃: 1
      helped stats (rel) min: 0.35% max: 17.57% x̄: 1.96% x̃: 0.53%
      HURT stats (abs)   min: 1 max: 16 x̄: 6.00 x̃: 1
      HURT stats (rel)   min: 2.86% max: 4.79% x̄: 3.49% x̃: 2.86%
      95% mean confidence interval for instructions value: -3.69 -1.60
      95% mean confidence interval for instructions %-change: -2.54% -0.57%
      Instructions are helped.
      
      total cycles in shared programs: 178116888 -> 178115324 (<.01%)
      cycles in affected programs: 5858790 -> 5857226 (-0.03%)
      helped: 484
      HURT: 242
      helped stats (abs) min: 2 max: 76 x̄: 5.27 x̃: 6
      helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
      HURT stats (abs)   min: 2 max: 76 x̄: 4.07 x̃: 2
      HURT stats (rel)   min: 0.01% max: 3.99% x̄: 0.19% x̃: 0.03%
      95% mean confidence interval for cycles value: -2.76 -1.55
      95% mean confidence interval for cycles %-change: -0.12% 0.01%
      Inconclusive result (%-change mean confidence interval includes 0).
      
      GM45
      total instructions in shared programs: 4857870 -> 4857762 (<.01%)
      instructions in affected programs: 13994 -> 13886 (-0.77%)
      helped: 39
      HURT: 5
      helped stats (abs) min: 1 max: 13 x̄: 3.28 x̃: 2
      helped stats (rel) min: 0.33% max: 17.11% x̄: 1.86% x̃: 0.48%
      HURT stats (abs)   min: 1 max: 16 x̄: 4.00 x̃: 1
      HURT stats (rel)   min: 2.86% max: 4.71% x̄: 3.23% x̃: 2.86%
      95% mean confidence interval for instructions value: -3.86 -1.05
      95% mean confidence interval for instructions %-change: -2.61% 0.04%
      Inconclusive result (%-change mean confidence interval includes 0).
      
      total cycles in shared programs: 122180744 -> 122179674 (<.01%)
      cycles in affected programs: 3686646 -> 3685576 (-0.03%)
      helped: 273
      HURT: 141
      helped stats (abs) min: 2 max: 76 x̄: 5.81 x̃: 6
      helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
      HURT stats (abs)   min: 2 max: 76 x̄: 3.66 x̃: 2
      HURT stats (rel)   min: 0.01% max: 3.99% x̄: 0.16% x̃: 0.02%
      95% mean confidence interval for cycles value: -3.42 -1.75
      95% mean confidence interval for cycles %-change: -0.15% 0.03%
      Inconclusive result (%-change mean confidence interval includes 0).
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      053be9f0
    • Ian Romanick's avatar
      nir: Separate a weird compare with zero to two compares with zero · 821e7a4d
      Ian Romanick authored
      min(a+b, c+d) >= 0 becomes (a+b >= 0 && c+d >= 0).
      
      No shader-db changes, but it does prevent 6 to 12 instruction
      regressions in the next patch on all measured Intel platforms.
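
      The rewrite can be checked numerically. This is a minimal Python
      sketch, assuming ordinary (non-NaN) float inputs; the function names
      are illustrative and do not come from NIR:

```python
import itertools

def original(a, b, c, d):
    # One compare against zero on the minimum of the two sums.
    return min(a + b, c + d) >= 0.0

def rewritten(a, b, c, d):
    # Two independent compares against zero.
    return (a + b >= 0.0) and (c + d >= 0.0)

def forms_agree(values):
    # Exhaustively compare both forms over every 4-tuple of sample values.
    return all(
        original(*q) == rewritten(*q)
        for q in itertools.product(values, repeat=4)
    )

SAMPLES = [-2.5, -1.0, -0.0, 0.0, 0.5, 3.0]
print(forms_agree(SAMPLES))
```

      The minimum is non-negative exactly when both sums are, so the two
      forms agree on every sampled input.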
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      821e7a4d
    • Ian Romanick's avatar
      nir: Simplify min and max of b2f · 68420d83
      Ian Romanick authored
      v2: Rebase on almost 2 years.  Require that one of the arguments to fmin
      or fmax be used only once.  This prevents some regressions.
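
      The message does not list the rewrites. Assuming they have the form
      fmin(b2f(a), b2f(b)) -> b2f(a && b) and fmax(b2f(a), b2f(b)) ->
      b2f(a || b), a small Python sketch can confirm them over all boolean
      inputs. The helper b2f below is a stand-in for the NIR opcode, and the
      single-use condition from v2 is not modelled:

```python
import itertools

def b2f(flag):
    # Stand-in for the bool-to-float conversion: True -> 1.0, False -> 0.0.
    return 1.0 if flag else 0.0

def min_max_of_b2f_fold():
    # Check both assumed rewrites for every pair of booleans.
    for a, b in itertools.product((False, True), repeat=2):
        if min(b2f(a), b2f(b)) != b2f(a and b):
            return False
        if max(b2f(a), b2f(b)) != b2f(a or b):
            return False
    return True

print(min_max_of_b2f_fold())
```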
      
      shader-db results:
      
      Skylake and Broadwell had similar results.  Skylake shown.
      total instructions in shared programs: 14526021 -> 14525913 (<.01%)
      instructions in affected programs: 4613 -> 4505 (-2.34%)
      helped: 31
      HURT: 0
      helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4
      helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42%
      
      total cycles in shared programs: 533118710 -> 533118403 (<.01%)
      cycles in affected programs: 34334 -> 34027 (-0.89%)
      helped: 24
      HURT: 0
      helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14
      helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03%
      
      No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      68420d83
    • Ian Romanick's avatar
      nir: Undo possible damage caused by rearranging or-compounded float compares · d8d18516
      Ian Romanick authored
      shader-db results:
      
      Skylake and Broadwell had similar results (Skylake shown)
      total instructions in shared programs: 14525898 -> 14525836 (<.01%)
      instructions in affected programs: 1964 -> 1902 (-3.16%)
      helped: 14
      HURT: 0
      helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1
      helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86%
      95% mean confidence interval for instructions value: -9.46 0.60
      95% mean confidence interval for instructions %-change: -3.97% -0.24%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 533119892 -> 533115756 (<.01%)
      cycles in affected programs: 96061 -> 91925 (-4.31%)
      helped: 13
      HURT: 1
      helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300
      helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42%
      HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
      HURT stats (rel)   min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46%
      95% mean confidence interval for cycles value: -379.43 -211.43
      95% mean confidence interval for cycles %-change: -4.84% -3.01%
      Cycles are helped.
      
      Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown).
      total instructions in shared programs: 9033948 -> 9033898 (<.01%)
      instructions in affected programs: 535 -> 485 (-9.35%)
      helped: 2
      HURT: 0
      
      total cycles in shared programs: 84631402 -> 84628949 (<.01%)
      cycles in affected programs: 63197 -> 60744 (-3.88%)
      helped: 13
      HURT: 2
      helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140
      helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01%
      HURT stats (abs)   min: 4 max: 8 x̄: 6.00 x̃: 6
      HURT stats (rel)   min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31%
      95% mean confidence interval for cycles value: -253.40 -73.67
      95% mean confidence interval for cycles %-change: -4.24% -2.25%
      Cycles are helped.
      
      No changes on GM45 or Iron Lake.
      
      v2: Add a couple more tautological compares.  Suggested by Elie.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      d8d18516
    • Ian Romanick's avatar
      nir: Be more conservative about rearranging or-compounded compares · 3941cba0
      Ian Romanick authored
      If both comparisons are used as sources for instructions other than the
      ior, this transformation is detrimental.  If the non-identical value in
      both compares is constant, the fmin or fmax will be constant-folded
      away, so the transformation is always a win.
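
      The message does not spell out the rewrite itself. Assuming it has
      the form (a < b) || (a < c) -> a < fmax(b, c), plus the mirrored form
      using fmin, the Python sketch below checks that both sides agree for
      ordinary (non-NaN) values. The function names are illustrative and
      not taken from NIR:

```python
import itertools

def separate(a, b, c):
    # Two compares joined by a logical or.
    return (a < b) or (a < c)

def combined(a, b, c):
    # One compare against the larger bound.
    return a < max(b, c)

def separate_mirrored(a, b, c):
    # Mirrored pattern: the shared operand is on the right.
    return (b < a) or (c < a)

def combined_mirrored(a, b, c):
    # One compare against the smaller bound.
    return min(b, c) < a

def rewrites_agree(values):
    # Compare the separate and combined forms on every triple of samples.
    return all(
        separate(a, b, c) == combined(a, b, c)
        and separate_mirrored(a, b, c) == combined_mirrored(a, b, c)
        for a, b, c in itertools.product(values, repeat=3)
    )

SAMPLES = [-3.0, -1.0, 0.0, 0.5, 2.0, 7.0]
print(rewrites_agree(SAMPLES))
```

      If b and c are both constants, max(b, c) is itself a constant, so the
      combined form costs a single compare. If both original compares also
      feed other instructions, they have to stay, and the combined form
      only adds work.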
      
      shader-db results:
      
      Skylake
      total instructions in shared programs: 14526147 -> 14525898 (<.01%)
      instructions in affected programs: 70239 -> 69990 (-0.35%)
      helped: 102
      HURT: 0
      helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1
      helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
      95% mean confidence interval for instructions value: -2.86 -2.02
      95% mean confidence interval for instructions %-change: -0.46% -0.31%
      Instructions are helped.
      
      total cycles in shared programs: 533120531 -> 533119892 (<.01%)
      cycles in affected programs: 994875 -> 994236 (-0.06%)
      helped: 76
      HURT: 26
      helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13
      helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18%
      HURT stats (abs)   min: 1 max: 167 x̄: 54.62 x̃: 26
      HURT stats (rel)   min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39%
      95% mean confidence interval for cycles value: -19.44 6.91
      95% mean confidence interval for cycles %-change: -0.30% 0.15%
      Inconclusive result (value mean confidence interval includes 0).
      
      Broadwell
      total instructions in shared programs: 14816005 -> 14815787 (<.01%)
      instructions in affected programs: 64658 -> 64440 (-0.34%)
      helped: 97
      HURT: 0
      helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1
      helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
      95% mean confidence interval for instructions value: -2.62 -1.87
      95% mean confidence interval for instructions %-change: -0.45% -0.30%
      Instructions are helped.
      
      total cycles in shared programs: 559340386 -> 559339907 (<.01%)
      cycles in affected programs: 1090491 -> 1090012 (-0.04%)
      helped: 66
      HURT: 28
      helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16
      helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27%
      HURT stats (abs)   min: 2 max: 226 x̄: 39.07 x̃: 11
      HURT stats (rel)   min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20%
      95% mean confidence interval for cycles value: -15.94 5.75
      95% mean confidence interval for cycles %-change: -0.35% 0.07%
      Inconclusive result (value mean confidence interval includes 0).
      
      LOST:   0
      GAINED: 1
      
      Haswell
      total instructions in shared programs: 9034106 -> 9033948 (<.01%)
      instructions in affected programs: 24096 -> 23938 (-0.66%)
      helped: 38
      HURT: 0
      helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4
      helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64%
      95% mean confidence interval for instructions value: -4.71 -3.60
      95% mean confidence interval for instructions %-change: -0.84% -0.58%
      Instructions are helped.
      
      total cycles in shared programs: 84631628 -> 84631402 (<.01%)
      cycles in affected programs: 148674 -> 148448 (-0.15%)
      helped: 14
      HURT: 14
      helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12
      helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21%
      HURT stats (abs)   min: 1 max: 10 x̄: 6.00 x̃: 5
      HURT stats (rel)   min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11%
      95% mean confidence interval for cycles value: -19.42 3.28
      95% mean confidence interval for cycles %-change: -0.59% 0.05%
      Inconclusive result (value mean confidence interval includes 0).
      
      Ivy Bridge
      total instructions in shared programs: 10015456 -> 10015293 (<.01%)
      instructions in affected programs: 27701 -> 27538 (-0.59%)
      helped: 38
      HURT: 0
      helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4
      helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52%
      95% mean confidence interval for instructions value: -4.87 -3.71
      95% mean confidence interval for instructions %-change: -0.82% -0.51%
      Instructions are helped.
      
      total cycles in shared programs: 87524771 -> 87524569 (<.01%)
      cycles in affected programs: 112324 -> 112122 (-0.18%)
      helped: 6
      HURT: 12
      helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20
      helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26%
      HURT stats (abs)   min: 1 max: 16 x̄: 5.50 x̃: 5
      HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08%
      95% mean confidence interval for cycles value: -29.14 6.69
      95% mean confidence interval for cycles %-change: -0.93% 0.08%
      Inconclusive result (value mean confidence interval includes 0).
      
      LOST:   0
      GAINED: 2
      
      Sandy Bridge
      total instructions in shared programs: 10545655 -> 10545465 (<.01%)
      instructions in affected programs: 37198 -> 37008 (-0.51%)
      helped: 42
      HURT: 0
      helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4
      helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49%
      95% mean confidence interval for instructions value: -5.14 -3.91
      95% mean confidence interval for instructions %-change: -0.68% -0.47%
      Instructions are helped.
      
      total cycles in shared programs: 146113059 -> 146112427 (<.01%)
      cycles in affected programs: 423514 -> 422882 (-0.15%)
      helped: 32
      HURT: 10
      helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12
      helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11%
      HURT stats (abs)   min: 12 max: 19 x̄: 14.70 x̃: 14
      HURT stats (rel)   min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14%
      95% mean confidence interval for cycles value: -26.03 -4.07
      95% mean confidence interval for cycles %-change: -0.43% -0.05%
      Cycles are helped.
      
      Iron Lake
      total instructions in shared programs: 7886959 -> 7886925 (<.01%)
      instructions in affected programs: 1340 -> 1306 (-2.54%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8
      helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43%
      95% mean confidence interval for instructions value: -20.44 3.44
      95% mean confidence interval for instructions %-change: -5.78% 0.89%
      Inconclusive result (value mean confidence interval includes 0).
      
      total cycles in shared programs: 178116996 -> 178116888 (<.01%)
      cycles in affected programs: 6262 -> 6154 (-1.72%)
      helped: 2
      HURT: 2
      helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61
      helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62%
      HURT stats (abs)   min: 6 max: 8 x̄: 7.00 x̃: 7
      HURT stats (rel)   min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51%
      95% mean confidence interval for cycles value: -93.27 39.27
      95% mean confidence interval for cycles %-change: -5.38% 2.27%
      Inconclusive result (value mean confidence interval includes 0).
      
      GM45
      total instructions in shared programs: 4857887 -> 4857870 (<.01%)
      instructions in affected programs: 674 -> 657 (-2.52%)
      helped: 2
      HURT: 0
      
      total cycles in shared programs: 122180816 -> 122180744 (<.01%)
      cycles in affected programs: 3764 -> 3692 (-1.91%)
      helped: 1
      HURT: 1
      helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78
      helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94%
      HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
      HURT stats (rel)   min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34%
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      3941cba0
    • Ian Romanick's avatar
      nir: See through an fneg to apply existing optimizations · cfc0d348
      Ian Romanick authored
      Doing the same for the existing feq and fne transformations didn't help
      anything in shader-db.
      
      shader-db results:
      
      Broadwell and Skylake (Skylake shown)
      total instructions in shared programs: 14529463 -> 14526147 (-0.02%)
      instructions in affected programs: 402420 -> 399104 (-0.82%)
      helped: 2136
      HURT: 131
      helped stats (abs) min: 1 max: 10 x̄: 1.61 x̃: 1
      helped stats (rel) min: 0.03% max: 16.22% x̄: 3.14% x̃: 1.12%
      HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
      HURT stats (rel)   min: 0.13% max: 7.69% x̄: 0.75% x̃: 0.57%
      95% mean confidence interval for instructions value: -1.51 -1.41
      95% mean confidence interval for instructions %-change: -3.06% -2.78%
      Instructions are helped.
      
      total cycles in shared programs: 533146915 -> 533120531 (<.01%)
      cycles in affected programs: 10356261 -> 10329877 (-0.25%)
      helped: 1933
      HURT: 844
      helped stats (abs) min: 1 max: 490 x̄: 29.44 x̃: 16
      helped stats (rel) min: <.01% max: 28.57% x̄: 3.43% x̃: 1.88%
      HURT stats (abs)   min: 1 max: 423 x̄: 36.17 x̃: 12
      HURT stats (rel)   min: <.01% max: 23.75% x̄: 1.90% x̃: 0.59%
      95% mean confidence interval for cycles value: -11.78 -7.22
      95% mean confidence interval for cycles %-change: -1.98% -1.65%
      Cycles are helped.
      
      Haswell
      total instructions in shared programs: 9037416 -> 9034106 (-0.04%)
      instructions in affected programs: 389831 -> 386521 (-0.85%)
      helped: 2184
      HURT: 120
      helped stats (abs) min: 1 max: 11 x̄: 1.57 x̃: 1
      helped stats (rel) min: 0.03% max: 25.00% x̄: 2.73% x̃: 1.02%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.19% max: 7.69% x̄: 0.81% x̃: 0.57%
      95% mean confidence interval for instructions value: -1.49 -1.39
      95% mean confidence interval for instructions %-change: -2.68% -2.41%
      Instructions are helped.
      
      total cycles in shared programs: 84636243 -> 84631628 (<.01%)
      cycles in affected programs: 4745058 -> 4740443 (-0.10%)
      helped: 1904
      HURT: 960
      helped stats (abs) min: 1 max: 466 x̄: 30.21 x̃: 18
      helped stats (rel) min: 0.02% max: 36.36% x̄: 3.57% x̃: 2.38%
      HURT stats (abs)   min: 1 max: 1080 x̄: 55.11 x̃: 14
      HURT stats (rel)   min: 0.02% max: 51.33% x̄: 2.77% x̃: 0.81%
      95% mean confidence interval for cycles value: -4.51 1.29
      95% mean confidence interval for cycles %-change: -1.64% -1.25%
      Inconclusive result (value mean confidence interval includes 0).
      
      LOST:   1
      GAINED: 0
      
      Sandy Bridge and Ivy Bridge (Ivy Bridge shown)
      total instructions in shared programs: 10018873 -> 10015456 (-0.03%)
      instructions in affected programs: 512820 -> 509403 (-0.67%)
      helped: 2268
      HURT: 162
      helped stats (abs) min: 1 max: 11 x̄: 1.62 x̃: 1
      helped stats (rel) min: 0.03% max: 25.00% x̄: 2.47% x̃: 0.88%
      HURT stats (abs)   min: 1 max: 4 x̄: 1.59 x̃: 1
      HURT stats (rel)   min: 0.09% max: 7.69% x̄: 0.86% x̃: 0.50%
      95% mean confidence interval for instructions value: -1.46 -1.35
      95% mean confidence interval for instructions %-change: -2.38% -2.12%
      Instructions are helped.
      
      total cycles in shared programs: 87538223 -> 87524771 (-0.02%)
      cycles in affected programs: 5435520 -> 5422068 (-0.25%)
      helped: 1916
      HURT: 946
      helped stats (abs) min: 1 max: 1392 x̄: 29.44 x̃: 18
      helped stats (rel) min: <.01% max: 34.51% x̄: 3.34% x̃: 1.97%
      HURT stats (abs)   min: 1 max: 633 x̄: 45.41 x̃: 11
      HURT stats (rel)   min: 0.02% max: 25.95% x̄: 2.41% x̃: 0.62%
      95% mean confidence interval for cycles value: -7.34 -2.06
      95% mean confidence interval for cycles %-change: -1.62% -1.26%
      Cycles are helped.
      
      LOST:   1
      GAINED: 0
      
      Iron Lake
      total instructions in shared programs: 7888446 -> 7886959 (-0.02%)
      instructions in affected programs: 331581 -> 330094 (-0.45%)
      helped: 1160
      HURT: 97
      helped stats (abs) min: 1 max: 10 x̄: 1.37 x̃: 1
      helped stats (rel) min: 0.02% max: 9.68% x̄: 0.93% x̃: 0.43%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.17% max: 4.17% x̄: 0.37% x̃: 0.25%
      95% mean confidence interval for instructions value: -1.25 -1.12
      95% mean confidence interval for instructions %-change: -0.91% -0.75%
      Instructions are helped.
      
      total cycles in shared programs: 178130766 -> 178116996 (<.01%)
      cycles in affected programs: 12534564 -> 12520794 (-0.11%)
      helped: 1856
      HURT: 187
      helped stats (abs) min: 2 max: 202 x̄: 7.78 x̃: 4
      helped stats (rel) min: <.01% max: 6.47% x̄: 0.28% x̃: 0.11%
      HURT stats (abs)   min: 2 max: 26 x̄: 3.55 x̃: 2
      HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.08% x̃: 0.02%
      95% mean confidence interval for cycles value: -7.41 -6.07
      95% mean confidence interval for cycles %-change: -0.28% -0.22%
      Cycles are helped.
      
      GM45
      total instructions in shared programs: 4858912 -> 4857887 (-0.02%)
      instructions in affected programs: 237565 -> 236540 (-0.43%)
      helped: 867
      HURT: 57
      helped stats (abs) min: 1 max: 10 x̄: 1.25 x̃: 1
      helped stats (rel) min: 0.02% max: 9.38% x̄: 0.87% x̃: 0.43%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.16% max: 3.85% x̄: 0.34% x̃: 0.22%
      95% mean confidence interval for instructions value: -1.18 -1.04
      95% mean confidence interval for instructions %-change: -0.88% -0.71%
      Instructions are helped.
      
      total cycles in shared programs: 122189118 -> 122180816 (<.01%)
      cycles in affected programs: 8776418 -> 8768116 (-0.09%)
      helped: 1213
      HURT: 166
      helped stats (abs) min: 2 max: 202 x̄: 7.30 x̃: 4
      helped stats (rel) min: <.01% max: 6.43% x̄: 0.25% x̃: 0.11%
      HURT stats (abs)   min: 2 max: 26 x̄: 3.35 x̃: 2
      HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.06% x̃: 0.02%
      95% mean confidence interval for cycles value: -6.78 -5.26
      95% mean confidence interval for cycles %-change: -0.24% -0.18%
      Cycles are helped.
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      cfc0d348
  7. 01 Aug, 2017 1 commit
  8. 20 Jul, 2017 1 commit
  9. 24 Apr, 2017 3 commits
    • Timothy Arceri's avatar
      nir/i965: add before ffma algebraic opts · 7a7ee40c
      Timothy Arceri authored
      This shuffles constants down, the reverse of what the previous
      patch does, and applies some simplifications that doing so may
      make possible.
      
      Shader-db results BDW:
      
      total instructions in shared programs: 12980814 -> 12977822 (-0.02%)
      instructions in affected programs: 281889 -> 278897 (-1.06%)
      helped: 1231
      HURT: 128
      
      total cycles in shared programs: 246562852 -> 246567288 (0.00%)
      cycles in affected programs: 11271524 -> 11275960 (0.04%)
      helped: 1630
      HURT: 1378
      
      V2: mark float opts as inexact
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      7a7ee40c
    • Timothy Arceri's avatar
      nir: shuffle constants to the top · fb2269fe
      Timothy Arceri authored
      V2: mark float opts as inexact
      
      If one of the inputs to a mul/add is the result of another
      mul/add, there is a chance that we can reuse the result of that
      mul/add in other calls if we do the multiplication in the right
      order.
      
      Also, by attempting to move all constants to the top, we increase
      the chance of constant folding.
      
      For example it is a fairly common pattern for shaders to do something
      similar to this:
      
        const float a = 0.5;
        in vec4 b;
        in float c;
      
        ...
      
        b.x = b.x * c;
        b.y = b.y * c;
      
        ...
      
        b.x = b.x * a + a;
        b.y = b.y * a + a;
      
      So by simply detecting that constant a is part of the multiplication
      in the ffma and switching it with the previous fmul that updates b,
      we end up with:
      
        ...
      
        c = a * c;
      
        ...
      
        b.x = b.x * c + a;
        b.y = b.y * c + a;
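
      The same reordering can be sketched in Python to confirm that both
      forms compute the same values while the second needs one fewer
      multiply. The function names are illustrative only, and results are
      compared with a tolerance because the reassociation is inexact for
      floats in general:

```python
import math

A = 0.5  # the constant "a" from the shader example

def original(bx, by, c):
    # Two fmuls by c, then two multiply-adds by the constant.
    bx = bx * c
    by = by * c
    return bx * A + A, by * A + A

def shuffled(bx, by, c):
    # Fold the constant into c once, then two multiply-adds.
    c = A * c
    return bx * c + A, by * c + A

def forms_match(bx, by, c):
    # Compare component-wise with a relative tolerance.
    return all(
        math.isclose(p, q, rel_tol=1e-12)
        for p, q in zip(original(bx, by, c), shuffled(bx, by, c))
    )

print(original(2.0, 4.0, 3.0))
print(shuffled(2.0, 4.0, 3.0))
print(forms_match(1.25, -7.0, 0.3))
```

      The original form spends four operations on the two components; the
      shuffled form spends three, and the saving grows with every extra
      component that shares c.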
      
      Shader-db results BDW:
      
      total instructions in shared programs: 13011050 -> 12967888 (-0.33%)
      instructions in affected programs: 4118366 -> 4075204 (-1.05%)
      helped: 17739
      HURT: 1343
      
      total cycles in shared programs: 246717952 -> 246410716 (-0.12%)
      cycles in affected programs: 166870802 -> 166563566 (-0.18%)
      helped: 18493
      HURT: 7965
      
      total spills in shared programs: 14937 -> 14560 (-2.52%)
      spills in affected programs: 9331 -> 8954 (-4.04%)
      helped: 284
      HURT: 33
      
      total fills in shared programs: 20211 -> 19671 (-2.67%)
      fills in affected programs: 12586 -> 12046 (-4.29%)
      helped: 286
      HURT: 33
      
      LOST:   39
      GAINED: 33
      
      Some of the hurt will go away when we shuffle things back down to the
      bottom in the following patch. It's also noteworthy that almost all of
      the spill changes, both hurt and helped, are in Deus Ex.
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      fb2269fe
    • Timothy Arceri's avatar
      nir: add flt comparison simplification · 83f7fdf8
      Timothy Arceri authored
      Didn't turn out as useful as I'd hoped, but it will help a lot more on
      i965 by reducing regressions when we drop brw_do_channel_expressions()
      and brw_do_vector_splitting().
      
      I'm not sure how much sense 'is_not_used_by_conditional' makes on
      platforms other than i965, but since this is a new opt it at least
      won't do any harm.
      
      shader-db BDW:
      
      total instructions in shared programs: 13029581 -> 13029415 (-0.00%)
      instructions in affected programs: 15268 -> 15102 (-1.09%)
      helped: 86
      HURT: 0
      
      total cycles in shared programs: 247038346 -> 247036198 (-0.00%)
      cycles in affected programs: 692634 -> 690486 (-0.31%)
      helped: 183
      HURT: 27
      Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      83f7fdf8
  10. 14 Mar, 2017 1 commit
    • Jason Ekstrand's avatar
      nir: Rework conversion opcodes · 762a6333
      Jason Ekstrand authored
      The NIR story on conversion opcodes is a mess.  We've had way too many
      of them, naming is inconsistent, and which ones have explicit sizes was
      sort-of random.  This commit re-organizes things and makes them all
      consistent:
      
       - All non-bool conversion opcodes now have the explicit size in the
         destination and are named <src_type>2<dst_type><size>.
      
       - Integer <-> integer conversion opcodes now only come in i2i and u2u
         forms (i2u and u2i have been removed) since the only difference
         between the different integer conversions is whether or not they
         sign-extend when up-converting.
      
       - Boolean conversion opcodes all have the explicit size on the bool and
         are named <src_type>2<dst_type>.
      
      Making things consistent also allows nir_type_conversion_op to be moved
      to nir_opcodes.c and auto-generated using mako.  This will make adding
      int8, int16, and float16 versions much easier when the time comes.
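
      The naming rules above can be sketched as a small Python helper.
      This is a hypothetical function, not the generated
      nir_type_conversion_op; it reads the third rule as meaning bool
      conversions carry no size suffix in the name, and it uses the
      one-letter type prefixes f, i, u and b:

```python
BOOL = "b"
INTEGERS = {"i", "u"}
KNOWN_TYPES = {"f", "i", "u", "b"}

def conversion_op_name(src, dst, dst_bits):
    """Build an opcode name following the rules in the commit message."""
    if src not in KNOWN_TYPES or dst not in KNOWN_TYPES:
        raise ValueError(f"unknown type prefix in {src!r} -> {dst!r}")
    if BOOL in (src, dst):
        # Bool conversions: <src_type>2<dst_type>, no size suffix.
        return f"{src}2{dst}"
    if src in INTEGERS and dst in INTEGERS and src != dst:
        # i2u and u2i were removed; only i2i and u2u remain.
        raise ValueError("i2u and u2i were removed; use i2i or u2u")
    # Everything else: <src_type>2<dst_type><size>, size of the destination.
    return f"{src}2{dst}{dst_bits}"

print(conversion_op_name("f", "i", 32))
print(conversion_op_name("u", "u", 64))
print(conversion_op_name("b", "f", 32))
```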
      Reviewed-by: Eric Anholt <eric@anholt.net>
      762a6333
  11. 10 Mar, 2017 1 commit
  12. 17 Feb, 2017 2 commits
  13. 20 Jan, 2017 2 commits
    • Ian Romanick's avatar
      nir: Shift count for shift opcodes is always 32-bits · fda33e09
      Ian Romanick authored
      Previously both sources were unsized.  This caused problems when the
      thing being shifted was 64-bit but the shift count was 32-bit.  The
      expectation in NIR is that all unsized sources (and destination) will
      ultimately have the same size.
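
      A toy model of that expectation, written in Python, shows why the old
      declaration rejected a 64-bit value shifted by a 32-bit count. It is
      only a sketch under assumed names (UNSIZED, validate_bit_sizes) and is
      not the BitSizeValidator from nir_algebraic.py:

```python
UNSIZED = 0  # marker for a source whose size is inferred from context

def validate_bit_sizes(declared, actual):
    """Return the common size of all unsized sources, or raise ValueError."""
    common = None
    for want, got in zip(declared, actual):
        if want == UNSIZED:
            if common is None:
                common = got
            elif got != common:
                raise ValueError(f"unsized sources disagree: {common} vs {got}")
        elif want != got:
            raise ValueError(f"expected {want}-bit source, got {got}-bit")
    return common

OLD_SHIFT = (UNSIZED, UNSIZED)  # both sources unsized, as before the change
NEW_SHIFT = (UNSIZED, 32)       # shift count pinned to 32 bits

def accepts(declared, actual):
    # True when the actual source sizes satisfy the declaration.
    try:
        validate_bit_sizes(declared, actual)
        return True
    except ValueError:
        return False

print(accepts(OLD_SHIFT, (64, 32)))  # 64-bit value, 32-bit count
print(accepts(NEW_SHIFT, (64, 32)))
```

      Under the old declaration the two sources must match, so the mixed
      64/32 case fails; pinning the count to 32 bits leaves only the
      shifted value unsized.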
      
      The changes in nir_opt_algebraic.py are to prevent errors like:
      
       Failed to parse transformation:
      03:12:25   (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte')
      03:12:25 Traceback (most recent call last):
      03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__
      03:12:25     xform = SearchAndReplace(xform)
      03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__
      03:12:25     BitSizeValidator(varset).validate(self.search, self.replace)
      03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate
      03:12:25     validate_dst_class = self._validate_bit_class_up(replace)
      03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up
      03:12:25     src_class = self._validate_bit_class_up(val.sources[i])
      03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up
      03:12:25     assert src_class == src_type_bits
      03:12:25 AssertionError
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Suggested-by: Connor Abbott <cwabbott0@gmail.com>
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      fda33e09
    • Elie Tournier's avatar
      nir: add min/max optimisation · 9fdaeb77
      Elie Tournier authored
      Add the following optimisations:
      
      min(x, -x) = -abs(x)
      min(x, -abs(x)) = -abs(x)
      min(x, abs(x)) = x
      max(x, -abs(x)) = x
      max(x, abs(x)) = abs(x)
      max(x, -x) = abs(x)
      
      shader-db:
      
      total instructions in shared programs: 13067779 -> 13067775 (-0.00%)
      instructions in affected programs: 249 -> 245 (-1.61%)
      helped: 4
      HURT: 0
      
      total cycles in shared programs: 252054838 -> 252054806 (-0.00%)
      cycles in affected programs: 504 -> 472 (-6.35%)
      helped: 2
      HURT: 0
      Signed-off-by: Elie Tournier <tournier.elie@gmail.com>
      Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
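
The six min/max identities listed in this commit can be checked numerically. A small sketch over finite sample values only (NaN is deliberately left out); `identities_hold` is a hypothetical helper name:

```python
def identities_hold(x):
    """Return True if all six min/max identities hold for x."""
    return all([
        min(x, -x) == -abs(x),
        min(x, -abs(x)) == -abs(x),
        min(x, abs(x)) == x,
        max(x, -abs(x)) == x,
        max(x, abs(x)) == abs(x),
        max(x, -x) == abs(x),
    ])

samples = [-3.5, -1.0, 0.0, 0.25, 2.0, 1e30]
all_ok = all(identities_hold(x) for x in samples)
```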
  14. 14 Jan, 2017 1 commit
    • Timothy Arceri's avatar
      nir: optimise min/max fadd combos · 772cd310
      Timothy Arceri authored
      shader-db results BDW:
      
      total instructions in shared programs: 13060410 -> 13060313 (-0.00%)
      instructions in affected programs: 24533 -> 24436 (-0.40%)
      helped: 88
      HURT: 0
      
      total cycles in shared programs: 256585692 -> 256586698 (0.00%)
      cycles in affected programs: 647290 -> 648296 (0.16%)
      helped: 35
      HURT: 30
      Reviewed-by: Matt Turner <mattst88@gmail.com>
  15. 11 Jan, 2017 3 commits
    • Timothy Arceri's avatar
      nir: don't turn ieq/ine into inot if used by an if · de8b03f5
      Timothy Arceri authored
      Otherwise we will end up with an extra instruction to compare the
      result of the inot.
      
      On BDW:
      
      total instructions in shared programs: 13060620 -> 13060481 (-0.00%)
      instructions in affected programs: 103379 -> 103240 (-0.13%)
      helped: 127
      HURT: 0
      
      total cycles in shared programs: 256590950 -> 256587408 (-0.00%)
      cycles in affected programs: 11324730 -> 11321188 (-0.03%)
      helped: 114
      HURT: 21
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
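
The rewrite being restricted here replaces a comparison of a boolean against a constant with a single inot. A sketch of why the two forms are equivalent, assuming 32-bit NIR booleans where true is all ones and false is zero, and assuming the affected patterns are ine(a, true) and ieq(a, false); the function names mirror the NIR opcodes but are plain Python stand-ins:

```python
NIR_TRUE = 0xFFFFFFFF   # assumed encoding: all bits set
NIR_FALSE = 0x00000000

def ieq(a, b):
    return NIR_TRUE if a == b else NIR_FALSE

def ine(a, b):
    return NIR_TRUE if a != b else NIR_FALSE

def inot(a):
    return ~a & 0xFFFFFFFF

# For boolean inputs, both comparisons collapse to a bitwise not.
equivalent = all(
    ine(a, NIR_TRUE) == inot(a) and ieq(a, NIR_FALSE) == inot(a)
    for a in (NIR_TRUE, NIR_FALSE)
)
```

When the result feeds an if, the commit message notes that the inot form costs an extra compare, so the original comparison is kept in that case.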
    • Timothy Arceri's avatar
      nir: add late opt to turn inot/b2f combos back to bcsel · 7acc8652
      Timothy Arceri authored
      We turn these from bcsel into inot/b2f combos so that other
      optimisation passes can get further. Once those passes have
      finished, turn the combos that remain and are used in more than
      a single expression back into a bcsel.
      
      On BDW:
      
      total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
      instructions in affected programs: 835701 -> 835033 (-0.08%)
      helped: 670
      HURT: 2
      
      total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
      cycles in affected programs: 114655488 -> 114653958 (-0.00%)
      helped: 419
      HURT: 240
      
      LOST:   0
      GAINED: 1
      
      The 2 HURT results occur because inserting the bcsel creates the
      only use of const 1.0 in two shaders from tri-of-friendship-and-madness.
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
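
The two forms this commit converts between compute the same value. A minimal sketch, assuming the combo in question is b2f(inot(a)) and the bcsel it maps back to is bcsel(a, 0.0, 1.0); the functions are plain Python stand-ins for the NIR opcodes:

```python
def inot(a):
    """Boolean negation."""
    return not a

def b2f(a):
    """Boolean to float: true -> 1.0, false -> 0.0."""
    return 1.0 if a else 0.0

def bcsel(cond, if_true, if_false):
    """Select between two values based on a boolean."""
    return if_true if cond else if_false

same = all(b2f(inot(a)) == bcsel(a, 0.0, 1.0) for a in (True, False))
```

The bcsel form takes 0.0 and 1.0 as explicit operands, which is consistent with the note above that it can introduce the only use of const 1.0 in a shader.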
    • Timothy Arceri's avatar
      nir: add imprecise flrp optimisation · 8f37fc70
      Timothy Arceri authored
      On BDW:
      
      total instructions in shared programs: 13061890 -> 13061877 (-0.00%)
      instructions in affected programs: 2441 -> 2428 (-0.53%)
      helped: 13
      HURT: 0
      
      total cycles in shared programs: 256612254 -> 256611784 (-0.00%)
      cycles in affected programs: 16418 -> 15948 (-2.86%)
      helped: 10
      HURT: 2
      
      V2: don't use ffma directly
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
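
The commit message does not quote the pattern itself. As an illustration of why such a rewrite has to be flagged imprecise, the sketch below assumes it recognises a + c * (b - a) and replaces it with flrp(a, b, c), taken here as a * (1 - c) + b * c. The two expressions are algebraically equal but can round differently in floating point; both function names are hypothetical:

```python
def lerp_one_mul(a, b, c):
    """a + c * (b - a): the assumed source pattern."""
    return a + c * (b - a)

def flrp(a, b, c):
    """a * (1 - c) + b * c: the assumed flrp definition."""
    return a * (1.0 - c) + b * c

# Ordinary inputs agree.
agree = lerp_one_mul(2.0, 4.0, 0.5) == flrp(2.0, 4.0, 0.5)

# With a huge a, (b - a) rounds to -a and the b term is lost.
lossy = lerp_one_mul(1e16, 1.0, 1.0)
exact = flrp(1e16, 1.0, 1.0)
```

Here `lossy` evaluates to 0.0 while `exact` evaluates to 1.0, so the substitution is only acceptable where exact results are not required.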
  16. 09 Jan, 2017 3 commits
  17. 23 Dec, 2016 1 commit