intel/brw: Emit better code for read_invocation(x, constant) (!28097) · Merge requests · Mesa / mesa

Kenneth Graunke requested to merge kwg/mesa:brw-read-invoc into main Mar 11, 2024

For something as basic as read_invocation(x, 0), we were emitting:

   mov(8) vgrf67:D, 0d
   find_live_channel(8) vgrf236:UD, NoMask
   broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask
   broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask
   mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W

This is way overcomplicated - if the invocation is a constant, we can simply emit a single MOV which reads the desired channel index. Not only that, but it's difficult to clean up:

1. If this expression appears multiple times, CSE will find all the redundant emit_uniformize(invocation) and get rid of the duplicate (find_live_channel+broadcast) on future instructions.
2. Copy propagation will put the 0d directly in the first broadcast.
3. Dead code elimination will get rid of the vgrf67 temp holding 0.
4. Algebraic will replace the first broadcast(x, 0) with a MOV.
5. Copy propagation will put the 0d directly in the second broadcast.
6. Dead code elimination will get rid of the vgrf237 temp.
7. Algebraic will replace the second broadcast(x, 0) with a MOV.
8. Copy propagation will finally combine the two MOVs.

That's at least 7-8 optimization passes and several loops through the same passes just to clean up something we can do trivially.
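
To illustrate why the shortcut is safe, here is a minimal Python sketch of the semantics involved. It is a toy model, not Mesa code: the helper names mirror the IR opcodes above but are hypothetical, channels are modeled as list elements, and `exec_mask` stands in for the set of live channels. Under those assumptions, the generic two-broadcast sequence and a single scalar read of the constant channel give the same result.

```python
from typing import List, Sequence


def find_live_channel(exec_mask: Sequence[bool]) -> int:
    """Toy model: index of the first enabled channel."""
    for channel, live in enumerate(exec_mask):
        if live:
            return channel
    raise ValueError("at least one channel must be live")


def broadcast(src: Sequence[int], index: int) -> List[int]:
    """Toy model: copy src[index] into every channel of the result."""
    return [src[index]] * len(src)


def read_invocation_generic(x: Sequence[int],
                            invocation: Sequence[int],
                            exec_mask: Sequence[bool]) -> List[int]:
    """Generic path: uniformize the index, then broadcast x from it."""
    live = find_live_channel(exec_mask)
    uniform_index = broadcast(invocation, live)  # first broadcast
    return broadcast(x, uniform_index[0])        # second broadcast


def read_invocation_constant(x: Sequence[int], constant: int) -> List[int]:
    """Constant path: one scalar read of the desired channel of x."""
    return [x[constant]] * len(x)


# Per-channel values of x for a SIMD8 dispatch, with some channels disabled.
x = [10, 11, 12, 13, 14, 15, 16, 17]
exec_mask = [False, False, True, True, False, True, True, True]

for constant in (0, 3):
    # Models the `mov(8) vgrf67:D, 0d` step: every channel holds the constant.
    invocation = [constant] * len(x)
    assert (read_invocation_generic(x, invocation, exec_mask)
            == read_invocation_constant(x, constant))
```

Because every channel of `invocation` holds the same constant, it does not matter which live channel the uniformize step picks, so the first broadcast is redundant in this model.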

Cuts 25% of the optimizer steps in pipeline 22200210259a2c9c of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1 (31 to 23).

Shortens compilation time of the google-meet-clvk/Relight pipeline by 2.87717% +/- 0.509162% (n=150).

+@cmarcelo +@mattst88

Edited Mar 11, 2024 by Kenneth Graunke
