intel/fs: sel.cond writes the flags on Gfx4 and Gfx5
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented using separate cmpn and sel instructions. This lowering occurs in fs_visitor::lower_minmax, which is called very, very late... a long, long time after the first calls to opt_cmod_propagation. As a result, conditional modifiers can be incorrectly propagated across sel.cond on those platforms.

No tests were affected by this change, and I find that quite shocking. After just changing flags_written(), all of the atan tests started failing on ILK. That required the change in cmod_propagation (and the addition of the prop_across_into_sel_gfx5 unit test).

Shader-db results for ILK and GM45 are below. I looked at a couple of before and after shaders... and every case that I looked at had experienced incorrect cmod propagation. This affected a LOT of apps! Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2, Gang Beasts, and on and on... :(

I discovered this bug while working on a couple of new optimization passes. One of the passes attempts to remove condition modifiers that are never used. The pass made no progress except on ILK and GM45. After investigating a couple of the affected shaders, I noticed that the code in those shaders looked wrong... investigation led to this cause.

v2: Trivial changes in the unit tests.

v3: Fix type in comment in unit tests. Noticed by Jason and Priit.

v4: Tweak handling of BRW_OPCODE_SEL special case. Suggested by Jason.

Fixes: df1aec76 ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>

Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs)   min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.

total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs)   min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs)   min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.

total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs)   min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).

Part-of: <mesa/mesa!12191>
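
To make the failure mode concrete, here is a minimal sketch of the idea, not the actual Mesa code. The types Inst, Opcode and Cmod, the helper example_program(), and the function find_cmod_target() are all hypothetical stand-ins; only the name flags_written() is borrowed from the commit message, and its body here is an assumption about the rule being described (a sel with a conditional modifier counts as a flag write when the hardware version is 5 or lower).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified IR. These are not Mesa's real types.
enum class Opcode { ADD, CMP, SEL };
enum class Cmod { NONE, GE, NZ };

struct Inst {
    Opcode op;
    Cmod cmod;        // conditional modifier attached to the instruction
    bool predicated;  // true when the instruction reads the flag register
    int dst;          // destination register number, -1 for the null register
    int src0;         // first source register number
};

// Assumed rule: an instruction with a conditional modifier writes the flags,
// except that a plain sel.cond (min/max) leaves them alone on newer hardware.
// On Gfx4/Gfx5 sel.cond is later lowered to cmpn + predicated sel, so it must
// be treated as a flag write there too.
static bool flags_written(const Inst &inst, int ver)
{
    if (inst.cmod == Cmod::NONE)
        return false;
    if (inst.op == Opcode::SEL)
        return ver <= 5;
    return true;
}

static bool flags_read(const Inst &inst)
{
    return inst.predicated;
}

// Walk backwards from a compare-with-zero, looking for the instruction that
// produced its source. Returns that instruction's index if the compare's
// conditional modifier could be folded onto it, or -1 if anything in between
// reads or writes the flags.
static int find_cmod_target(const std::vector<Inst> &prog, std::size_t cmp_idx,
                            int ver)
{
    assert(cmp_idx < prog.size());
    const Inst &cmp = prog[cmp_idx];

    for (std::size_t i = cmp_idx; i-- > 0;) {
        const Inst &scan = prog[i];
        if (scan.dst == cmp.src0)
            return scan.cmod == Cmod::NONE ? static_cast<int>(i) : -1;
        if (flags_written(scan, ver) || flags_read(scan))
            return -1;
    }
    return -1;
}

// add    r1, r2, r3
// sel.ge r4, r5, r6     (max)
// cmp.nz null, r1, 0
static std::vector<Inst> example_program()
{
    return {
        {Opcode::ADD, Cmod::NONE, false, 1, 2},
        {Opcode::SEL, Cmod::GE, false, 4, 5},
        {Opcode::CMP, Cmod::NZ, false, -1, 1},
    };
}
```

With this toy model, find_cmod_target(example_program(), 2, 7) returns 0: the .nz modifier may move onto the add because the intervening sel.ge does not touch the flags. With ver set to 5 the same call returns -1, because the sel.ge will eventually become a cmpn that overwrites whatever the add.nz would have stored in the flag register.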
- src/intel/compiler/brw_fs.cpp (8 additions, 4 deletions)
- src/intel/compiler/brw_fs_cmod_propagation.cpp (25 additions, 15 deletions)
- src/intel/compiler/brw_fs_cse.cpp (2 additions, 2 deletions)
- src/intel/compiler/brw_fs_dead_code_eliminate.cpp (6 additions, 5 deletions)
- src/intel/compiler/brw_fs_live_variables.cpp (1 addition, 1 deletion)
- src/intel/compiler/brw_fs_lower_regioning.cpp (1 addition, 1 deletion)
- src/intel/compiler/brw_fs_sel_peephole.cpp (5 additions, 4 deletions)
- src/intel/compiler/brw_ir_fs.h (1 addition, 1 deletion)
- src/intel/compiler/brw_ir_performance.cpp (2 additions, 2 deletions)
- src/intel/compiler/brw_schedule_instructions.cpp (2 additions, 2 deletions)
- src/intel/compiler/test_fs_cmod_propagation.cpp (131 additions, 0 deletions)