intel/brw: scoreboarding regression
Commit ff89e831
Author: Caio Oliveira <caio.oliveira@intel.com>
Date: Thu Apr 4 16:03:34 2024 -0700
intel/brw: Lower VGRFs to FIXED_GRFs earlier
Moves the lowering of VGRFs into FIXED_GRFs from the code generation
to (almost) right after the register allocation.
This will allow: (1) later passes not worry about VGRFs (and what they
mean in a post reg alloc phase) and (2) make easier to add certain
types of validation post reg alloc phase using the backend IR.
Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
This commit is triggering a scoreboarding regression on a branch I'm working on.
Since my modifications all happen before the optimization passes and register allocation, it looks like the problem was introduced by the commit above.
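The commit message describes rewriting allocated VGRF references as FIXED_GRF references right after register allocation, instead of during code generation. As a rough illustration only, the core of such a lowering is an offset computation. This is not Mesa's code: the Reg class, its field names, the hw_base map and REG_SIZE = 32 bytes (as on DG2) are all assumptions made for the sketch.

```python
from dataclasses import dataclass

# Hypothetical, simplified register representation; not Mesa's brw_reg.
REG_SIZE = 32  # bytes per GRF, assumed 32 as on DG2


@dataclass(frozen=True)
class Reg:
    file: str        # "VGRF" or "FIXED_GRF"
    nr: int          # virtual register number, or hardware GRF number
    offset: int = 0  # byte offset into the VGRF (VGRF only)
    subnr: int = 0   # byte offset inside the hardware GRF (FIXED_GRF only)


def lower_vgrf(reg, hw_base):
    """Rewrite an allocated VGRF reference as a FIXED_GRF reference.

    hw_base maps each virtual register number to the first hardware GRF
    the register allocator assigned to it."""
    if reg.file != "VGRF":
        return reg
    return Reg("FIXED_GRF",
               nr=hw_base[reg.nr] + reg.offset // REG_SIZE,
               subnr=reg.offset % REG_SIZE)


def lower_program(insns, hw_base):
    """Lower every operand of every (opcode, dst, srcs) instruction."""
    return [(op, lower_vgrf(dst, hw_base), [lower_vgrf(s, hw_base) for s in srcs])
            for op, dst, srcs in insns]


# Hypothetical VGRF 5 allocated at g76, accessed at byte offset 4 (dword 1).
print(lower_vgrf(Reg("VGRF", 5, offset=4), {5: 76}))
# Reg(file='FIXED_GRF', nr=76, offset=0, subnr=4)
```

Byte offset 4 of a UD-typed register is dword 1, which is what the g76.1 operands in the dumps address. After this step, later passes only see hardware register numbers and sub-register offsets, which is the point the commit message makes about passes no longer having to reason about VGRFs post allocation.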
I can reproduce it in simulation on DG2 with dEQP-VK.pipeline.monolithic.extended_dynamic_state.mesh_shader.cmd_buffer_start.stencil_state_face_front_lt_inc_clamp_clear_254_ref_253_pass
With the commit reverted:
Native code for unnamed mesh shader (null) (src_hash 0x48a71db2) (sha1 4e81188fb6d1bfb4e44efc46b8d6aac79a4f303e)
SIMD8 shader: 122 instructions. 0 loops. 1113 cycles. 0:0 spills:fills, 8 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 1952 to 1776 bytes (9%)
START B0 (90 cycles)
send(1) g3UD g2UD nullUD 0x0228d580 0x00000000
ugm MsgDesc: ( load, a64, d32, V16, transpose, L1C_L3C dst_len = 2, src0_len = 1, src1_len = 0 flat ) base_offset 0 { align1 WE_all 1N $0 };
add(1) g76<2>UD g2<0,1,0>UD 0x00000080UD { align1 WE_all 1N $0.src };
and(8) g59<1>UD g0.6<0,1,0>UD 0x0000ffffUD { align1 1Q };
mov(8) g56<1>UD g0.13<0,1,0>UW { align1 1Q };
mov(8) g57<1>UD g0.8<0,1,0>UW { align1 1Q };
mov(8) g6<1>UD g0.9<0,1,0>UW { align1 1Q };
cmp.l.f0.0(1) null<1>UD g76<0,1,0>UD g2<0,1,0>UD { align1 WE_all 1N I@5 };
add(8) g7<1>D g57<1,1,0>D g6<1,1,0>D { align1 1Q I@2 compacted };
(+f0.0) add(1) g76.1<2>UD g2.1<0,1,0>UD 0x00000001UD { align1 WE_all 1N };
(-f0.0) mov(1) g76.1<2>UD g2.1<0,1,0>UD { align1 WE_all 1N I@1 };
mov(8) g2<1>UD g0.1<0,1,0>UD { align1 1Q };
sync nop(1) null<0,1,0>UB { align1 WE_all 1N I@2 };
send(1) g5UD g76UD nullUD 0x0218c580 0x00000000
ugm MsgDesc: ( load, a64, d32, V8, transpose, L1C_L3C dst_len = 1, src0_len = 1, src1_len = 0 flat ) base_offset 0 { align1 WE_all 1N $1 };
cmp.nz.f0.0(8) null<1>D g7<8,8,1>D 2D { align1 1Q I@4 };
(+f0.0) if(8) JIP: LABEL0 UIP: LABEL0 { align1 1Q };
END B0 ->B1 ->B2
START B1 <-B0 (448 cycles)
Without the revert:
Native code for unnamed mesh shader (null) (src_hash 0x48a71db2) (sha1 c34d03a6e2a9ce0049b62b466e0d03768657fecd)
SIMD8 shader: 122 instructions. 0 loops. 1111 cycles. 0:0 spills:fills, 8 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 1952 to 1776 bytes (9%)
START B0 (88 cycles)
send(1) g3UD g2UD nullUD 0x0228d580 0x00000000
ugm MsgDesc: ( load, a64, d32, V16, transpose, L1C_L3C dst_len = 2, src0_len = 1, src1_len = 0 flat ) base_offset 0 { align1 WE_all 1N $0 };
add(1) g76<2>UD g2<0,1,0>UD 0x00000080UD { align1 WE_all 1N $0.src };
and(8) g59<1>UD g0.6<0,1,0>UD 0x0000ffffUD { align1 1Q };
mov(8) g56<1>UD g0.13<0,1,0>UW { align1 1Q };
mov(8) g57<1>UD g0.8<0,1,0>UW { align1 1Q };
mov(8) g6<1>UD g0.9<0,1,0>UW { align1 1Q };
cmp.l.f0.0(1) null<1>UD g76<0,1,0>UD g2<0,1,0>UD { align1 WE_all 1N };
add(8) g7<1>D g57<1,1,0>D g6<1,1,0>D { align1 1Q I@2 compacted };
(+f0.0) add(1) g76.1<2>UD g2.1<0,1,0>UD 0x00000001UD { align1 WE_all 1N I@7 };
(-f0.0) mov(1) g76.1<2>UD g2.1<0,1,0>UD { align1 WE_all 1N I@1 };
mov(8) g2<1>UD g0.1<0,1,0>UD { align1 1Q };
sync nop(1) null<0,1,0>UB { align1 WE_all 1N I@2 };
send(1) g5UD g76UD nullUD 0x0218c580 0x00000000
ugm MsgDesc: ( load, a64, d32, V8, transpose, L1C_L3C dst_len = 1, src0_len = 1, src1_len = 0 flat ) base_offset 0 { align1 WE_all 1N $1 };
cmp.nz.f0.0(8) null<1>D g7<8,8,1>D 2D { align1 1Q I@4 };
(+f0.0) if(8) JIP: LABEL0 UIP: LABEL0 { align1 1Q };
END B0 ->B1 ->B2
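START B1 <-B0 (448 cycles)

In the portion of B0 shown, the instruction stream is identical in both builds; only the software scoreboard (SWSB) annotations differ. With the commit reverted, cmp.l.f0.0(1), which reads g76, carries I@5, i.e. it waits for the add(1) g76<2>UD five integer-pipe instructions earlier. Without the revert, that cmp has no annotation at all, so it can read g76 before the add has written it; the I@7 that now appears on the predicated add(1) g76.1<2>UD comes too late to protect the cmp.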
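To check this reading of the dumps, here is a rough Python model of in-order (I@N) scoreboarding. It is not Mesa's scoreboard pass: it assumes a single integer pipe, tracks whole GRFs, considers only read-after-write and write-after-write hazards, ignores flag registers and SBID ($n) tokens, and treats sends as outside the integer pipe (their in-order wait is what the sync nop carries). Fed the B0 register usage, it yields the reverted annotations; dropping only the cmp.l.f0.0(1) read of g76 yields exactly the non-reverted ones.

```python
# Rough model of in-order (I@N) scoreboarding for a single integer pipe.
# Assumptions (not Mesa's implementation): whole-GRF granularity, RAW and
# WAW hazards only, flag registers and SBID ($n) tokens ignored, sends
# issue outside the integer pipe.

def annotate(insns):
    """Return the I@N distance each instruction needs (0 means no wait).

    Each instruction is a tuple (in_int_pipe, dst_grfs, src_grfs)."""
    last_writer = {}  # GRF -> int-pipe index of its last writer not yet waited on
    issued = 0        # integer-pipe instructions issued so far
    waits = []
    for in_int_pipe, dsts, srcs in insns:
        touched = set(dsts) | set(srcs)
        pending = [last_writer[r] for r in touched if r in last_writer]
        # The pipe is in order, so waiting on the nearest producer also
        # covers every older producer of the registers touched here.
        waits.append(issued - max(pending) if pending else 0)
        for r in touched:
            last_writer.pop(r, None)
        if in_int_pipe:
            for r in dsts:
                last_writer[r] = issued
            issued += 1
    return waits


# Register usage of B0, taken from the dumps (GRF numbers only).
B0 = [
    (False, [3, 4], [2]),      # send g3 <- g2
    (True,  [76],   [2]),      # add(1) g76<2>UD
    (True,  [59],   [0]),      # and(8) g59
    (True,  [56],   [0]),      # mov(8) g56
    (True,  [57],   [0]),      # mov(8) g57
    (True,  [6],    [0]),      # mov(8) g6
    (True,  [],     [76, 2]),  # cmp.l.f0.0(1) null, g76, g2
    (True,  [7],    [57, 6]),  # add(8) g7
    (True,  [76],   [2]),      # (+f0.0) add(1) g76.1
    (True,  [76],   [2]),      # (-f0.0) mov(1) g76.1
    (True,  [2],    [0]),      # mov(8) g2
    (False, [5],    [76]),     # send g5 <- g76 (wait is carried by the sync nop)
    (True,  [],     [7]),      # cmp.nz.f0.0(8) null, g7
]

# Same program, but with the cmp's read of g76 dropped.
broken = list(B0)
broken[6] = (True, [], [2])

print(annotate(B0))      # [0, 0, 0, 0, 0, 0, 5, 2, 0, 1, 0, 2, 4]
print(annotate(broken))  # [0, 0, 0, 0, 0, 0, 0, 2, 7, 1, 0, 2, 4]
```

The first list matches the reverted build (I@5, I@2, I@1, sync nop I@2, I@4) and the second matches the non-reverted one (I@2, I@7, I@1, sync nop I@2, I@4). That is consistent with the scoreboard pass no longer seeing the cmp's read of g76 after the lowering, but the model alone does not identify where in the pass the dependency is lost.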