Commit f93d5457 authored by Jason Ekstrand's avatar Jason Ekstrand
Browse files

nir/opt_if: Be more aggressive in opt_split_alu_of_phi

The opt_split_alu_of_phi pass in nir_opt_if was originally designed to
handle cases where the continue portion of the loop was eaten up by
nir_opt_peephole_select before opt_peel_loop_initial_if got to it.
However, it only peels instructions around if their sources either come
from the tail of the loop or are immediates or undefs.  This is a good
way to avoid register pressure regressions but it also means that it
can't peel them off in many cases.  This commit removes those
restrictions and makes the pass more aggressive.

By itself, this change only ever adds instructions since it duplicates
every instruction it peels.  However, those instructions aren't ever
executed any more times and it does sometimes allow other optimization
passes such as opt_simplify_bcsel_of_phi to make progress.  The one
downside, as mentioned above, is that it can increase register pressure
in some cases (I'm still not 100% sure how) and resulted in a few lost
SIMD16 programs.

Shader-db results on ICL:

    total instructions in shared programs: 17129594 -> 17130180 (<.01%)
    instructions in affected programs: 32537 -> 33123 (1.80%)
    helped: 0
    HURT: 82
    HURT stats (abs)   min: 7 max: 10 x̄: 7.15 x̃: 7
    HURT stats (rel)   min: 1.21% max: 3.76% x̄: 1.91% x̃: 1.77%
    95% mean confidence interval for instructions value: 7.00 7.29
    95% mean confidence interval for instructions %-change: 1.79% 2.02%
    Instructions are HURT.

    total cycles in shared programs: 364257702 -> 364316288 (0.02%)
    cycles in affected programs: 627727 -> 686313 (9.33%)
    helped: 0
    HURT: 82
    HURT stats (abs)   min: 36 max: 10956 x̄: 714.46 x̃: 258
    HURT stats (rel)   min: 0.78% max: 16.03% x̄: 6.05% x̃: 5.32%
    95% mean confidence interval for cycles value: 270.02 1158.91
    95% mean confidence interval for cycles %-change: 5.39% 6.70%
    Cycles are HURT.

    LOST:   8
    GAINED: 0

The lost programs are all in Talos Principal (which doesn't even have an
OpenGL back-end anymore) and Serious Sam 3.

Fossil-db results on ICL:

    Instructions in all programs: 263442482 -> 263438116 (-0.0%)
    SENDs in all programs: 15062268 -> 15062268 (+0.0%)
    Loops in all programs: 150022 -> 150022 (+0.0%)
    Cycles in all programs: 84967273066 -> 84728637038 (-0.3%)
    Spills in all programs: 243458 -> 243414 (-0.0%)
    Fills in all programs: 341009 -> 340983 (-0.0%)

Virtually all the change there was in GeekBench and it was all help
except for two shaders hurt by a few cycles.  There were also four
shaders in Shadow of the Tomb Rader which were hurt by < 1% instructions
but helped by 2% cycles on average.
parent f13049f4
Pipeline #136918 waiting for manual action with stages
......@@ -309,8 +309,6 @@ alu_instr_is_type_conversion(const nir_alu_instr *alu)
*
* - At least one source of the instruction is a phi node from the header block.
*
* - The phi node selects a constant or undef from the block before the loop.
*
* - Any non-phi sources of the ALU instruction come from a block that
* dominates the block before the loop. The most common failure mode for
* this check is sources that are generated in the loop header block.
......@@ -395,10 +393,8 @@ opt_split_alu_of_phi(nir_builder *b, nir_loop *loop)
bool has_phi_src_from_prev_block = false;
bool all_non_phi_exist_in_prev_block = true;
bool is_prev_result_undef = true;
bool is_prev_result_const = true;
nir_ssa_def *prev_srcs[8]; // FINISHME: Array size?
nir_ssa_def *continue_srcs[8]; // FINISHME: Array size?
nir_ssa_def *prev_srcs[NIR_MAX_VEC_COMPONENTS];
nir_ssa_def *continue_srcs[NIR_MAX_VEC_COMPONENTS];
for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
nir_instr *const src_instr = alu->src[i].src.ssa->parent_instr;
......@@ -421,16 +417,6 @@ opt_split_alu_of_phi(nir_builder *b, nir_loop *loop)
nir_foreach_phi_src(src_of_phi, phi) {
if (src_of_phi->pred == prev_block) {
if (src_of_phi->src.ssa->parent_instr->type !=
nir_instr_type_ssa_undef) {
is_prev_result_undef = false;
}
if (src_of_phi->src.ssa->parent_instr->type !=
nir_instr_type_load_const) {
is_prev_result_const = false;
}
prev_srcs[i] = src_of_phi->src.ssa;
has_phi_src_from_prev_block = true;
} else
......@@ -453,8 +439,7 @@ opt_split_alu_of_phi(nir_builder *b, nir_loop *loop)
}
}
if (has_phi_src_from_prev_block && all_non_phi_exist_in_prev_block &&
(is_prev_result_undef || is_prev_result_const)) {
if (has_phi_src_from_prev_block && all_non_phi_exist_in_prev_block) {
nir_block *const continue_block = find_continue_block(loop);
b->cursor = nir_after_block(prev_block);
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment