intel/brw: Emit better code for read_invocation(x, constant)
For something as basic as read_invocation(x, 0), we were emitting:

   mov(8) vgrf67:D, 0d
   find_live_channel(8) vgrf236:UD, NoMask
   broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask
   broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask
   mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W

This is way overcomplicated: if the invocation is a constant, we can simply emit a single MOV that reads the desired channel. Not only that, but the generic sequence is difficult to clean up:
- If this expression appears multiple times, CSE will find the redundant emit_uniformize(invocation) results and remove the duplicate find_live_channel+broadcast pairs from the later occurrences.
- Copy propagation will put the 0d directly in the first broadcast.
- Dead code elimination will get rid of the vgrf67 temp holding 0.
- Algebraic will replace the first broadcast(x, 0) with a MOV.
- Copy propagation will put the 0d directly in the second broadcast.
- Dead code elimination will get rid of the vgrf237 temp.
- Algebraic will replace the second broadcast(x, 0) with a MOV.
- Copy propagation will finally combine the two MOVs.
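The Algebraic step in the list above only fires once copy propagation has turned the channel index into an immediate. A minimal sketch of that rewrite, assuming a hypothetical toy instruction type (ToyInst and rewrite_const_broadcast are illustrative names, not brw code):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy instruction. imm_index is set only after copy propagation
// has folded a constant channel index into the instruction.
struct ToyInst {
   std::string opcode;               // "broadcast" or "mov"
   std::string dst;
   std::string src;
   std::string index;                // register holding the index, if any
   std::optional<unsigned> imm_index;
};

// Algebraic-style rewrite: broadcast(src, imm) -> mov(src[imm]).
// Returns true if the instruction was changed.
bool rewrite_const_broadcast(ToyInst &inst) {
   if (inst.opcode != "broadcast" || !inst.imm_index)
      return false;  // index not yet known at compile time: nothing to do
   inst.opcode = "mov";
   inst.src = inst.src + "[" + std::to_string(*inst.imm_index) + "]";
   inst.index.clear();
   inst.imm_index.reset();
   return true;
}
```

Until copy propagation runs, imm_index is empty and the rewrite is a no-op, which is why the passes have to loop several times.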
That's at least 7-8 optimization passes and several loops through the same passes just to clean up something we can do trivially.
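A minimal sketch of the idea, assuming a hypothetical toy builder that records instructions as strings (ToyBuilder and emit_read_invocation are illustrative names, not the brw builder API). When the invocation index is known at compile time, a single MOV from that channel is emitted; otherwise the uniformize+broadcast sequence is kept:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified builder: each emitted instruction is
// stored as a string, and virtual registers are numbered sequentially.
struct ToyBuilder {
   std::vector<std::string> insts;
   int next_vgrf = 0;

   std::string vgrf() { return "vgrf" + std::to_string(next_vgrf++); }
   void emit(const std::string &inst) { insts.push_back(inst); }
};

// Emit read_invocation(value, invocation). const_invocation holds the
// channel index when the invocation source is a compile-time constant.
std::string emit_read_invocation(ToyBuilder &bld, const std::string &value,
                                 const std::string &invocation,
                                 std::optional<unsigned> const_invocation) {
   const std::string dst = bld.vgrf();
   if (const_invocation) {
      // Constant case: one MOV reading the requested channel directly.
      bld.emit("mov " + dst + ", " + value + "[" +
               std::to_string(*const_invocation) + "]");
      return dst;
   }
   // Generic case: uniformize the invocation index, then broadcast.
   const std::string chan = bld.vgrf();
   bld.emit("find_live_channel " + chan);
   const std::string idx = bld.vgrf();
   bld.emit("broadcast " + idx + ", " + invocation + ", " + chan);
   bld.emit("broadcast " + dst + ", " + value + ", " + idx);
   return dst;
}
```

With a constant index the toy builder records one instruction; with a dynamic index it records three, all of which the optimizer would otherwise have to untangle.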
Cuts 25% of the optimizer steps (31 to 23) in pipeline 22200210259a2c9c of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1.
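As a quick check of the step-count figure, going from 31 to 23 optimizer steps removes 8 of 31:

```latex
\frac{31 - 23}{31} = \frac{8}{31} \approx 0.258 = 25.8\%
```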
Reduces compilation time of the google-meet-clvk/Relight pipeline by 2.87717% +/- 0.509162% (n=150).