
freedreno/rddecompiler: Improve `generate-rd.cc` compilation time by splitting it into multiple translation units

For rd traces collected with more demanding presets, submits decompiled with rddecompiler can produce a fairly large generate-rd.cc (63M in my case). It takes a long time to compile because it is a single translation unit, and the compiler ends up using only one thread (as seen in htop).

This MR adds a -m option that moves IBs larger than 512 in size, as well as shaders, into their own files, so the compiler can process them in parallel before everything is linked together. This improves compile times on multi-core systems. It also helps when debugging: modifying the generated code no longer requires recompiling the whole command stream in generate-rd.cc, so rebuilds are faster.

Debug build compilation times on my system (16 cores, 32 threads), measured after first building the empty generate-rd.cc:

Original (63M generate-rd.cc)

real    10m10.330s
user    10m1.404s
sys     0m8.101s

Using -m

real    0m17.303s
user    3m35.548s
sys     0m25.330s

How to use:

  • rddecompiler -m <some_folder> -s <submit_n> <file>.
  • mv <some_folder>/* .../<mesa_src>/src/freedreno/decode/generate-rd/.
  • Add a no-op statement such as (void)0; somewhere in generate-rd.cc. Meson does not always seem to detect that generate-rd.cc has changed (this also happened to me before these changes), and modifying the file forces it to be rebuilt.
  • Copy the array from input_resources.txt into generate-rd/meson.build to specify the source files for the static library.
  • Build mesa.
