freedreno/rddecompiler: Improve `generate-rd.cc` compilation time by splitting it into multiple translation units
For rd traces collected with more demanding presets, submits decompiled with rddecompiler can produce a fairly big `generate-rd.cc` (63M in my case). It takes quite a long time to compile because it is a single translation unit, and the compiler ends up using just one thread (judging by htop).
This MR adds a `-m` option which splits IBs larger than 512 in size, as well as shaders, into their own files. The compiler can then process them in parallel, and everything is linked together at the end. This improves compile times on multi-core systems.
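To illustrate why splitting helps, here is a minimal sketch of the general technique. It is not rddecompiler's actual implementation. Large IBs are moved into their own translation units with external linkage and are only declared in the main file, so each file can be compiled in parallel and the linker resolves the references. Everything here is an assumption for illustration. The `SourceFile` struct, the `split_ibs()` helper, and the `ib_N` symbol and file names are hypothetical. IBs are modeled as plain dword arrays rather than whatever code rddecompiler really emits, and the 512 threshold is treated as a dword count. Shaders would follow the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical threshold from the MR text ("IBs larger than 512");
// treating it as a dword count is an assumption of this sketch.
constexpr std::size_t kSplitThreshold = 512;

// One generated translation unit (hypothetical type).
struct SourceFile {
   std::string name;     // output file name
   std::string contents; // C++ source text
};

// Render an IB's dwords as a brace-enclosed initializer, e.g. "{ 1, 2 }".
static std::string initializer(const std::vector<uint32_t> &dwords)
{
   assert(!dwords.empty()); // C++ has no zero-length arrays
   std::string s = "{";
   for (std::size_t i = 0; i < dwords.size(); i++)
      s += (i ? ", " : " ") + std::to_string(dwords[i]);
   return s + " }";
}

// Keep small IBs in the main file and move large ones into their own files.
// A split IB is only declared in the main file and is defined with external
// linkage in its own file, so the linker joins the pieces back together.
std::vector<SourceFile> split_ibs(const std::vector<std::vector<uint32_t>> &ibs)
{
   SourceFile main_tu{"generate-rd.cc", "#include <stdint.h>\n"};
   std::vector<SourceFile> files;

   for (std::size_t i = 0; i < ibs.size(); i++) {
      const std::string sym = "ib_" + std::to_string(i);
      if (ibs[i].size() > kSplitThreshold) {
         main_tu.contents += "extern const uint32_t " + sym + "[];\n";
         // `extern` is required: a namespace-scope const would otherwise
         // have internal linkage and be invisible to generate-rd.cc.
         files.push_back({sym + ".cc",
                          "#include <stdint.h>\nextern const uint32_t " + sym +
                             "[] = " + initializer(ibs[i]) + ";\n"});
      } else {
         main_tu.contents += "static const uint32_t " + sym + "[] = " +
                             initializer(ibs[i]) + ";\n";
      }
   }

   files.insert(files.begin(), main_tu); // main file first
   return files;
}
```

For example, `split_ibs({std::vector<uint32_t>(600, 7u), {1u, 2u}})` yields `generate-rd.cc` plus `ib_0.cc`. Only the 600-dword IB gets its own file, and editing it later only recompiles `ib_0.cc`.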
An additional benefit shows up when debugging issues. Modifications to the generated code don't require the whole command stream (in `generate-rd.cc`) to be recompiled, so recompilations are faster.
Debug build compilation times on my system (16 cores, 32 threads), after having built the empty `generate-rd.cc`:
| | Original (63M `generate-rd.cc`) | Using `-m` |
|---|---|---|
| real | 10m10.330s | 0m17.303s |
| user | 10m1.404s | 3m35.548s |
| sys | 0m8.101s | 0m25.330s |
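The table's `real` and `user` times give a rough wall-clock speedup and average number of busy cores. This is simple arithmetic on the reported times, ignoring `sys`, and not an additional measurement:

```math
\begin{aligned}
\text{speedup} &= \frac{610.330\,\text{s}}{17.303\,\text{s}} \approx 35.3 \\
\text{user}/\text{real}\ (\text{original}) &= \frac{601.404\,\text{s}}{610.330\,\text{s}} \approx 0.99 \\
\text{user}/\text{real}\ (\text{with -m}) &= \frac{215.548\,\text{s}}{17.303\,\text{s}} \approx 12.5
\end{aligned}
```

The original build kept about one core busy, which matches what htop showed. The split build averaged roughly 12.5 busy cores over its run.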
How to use:

- `rddecompiler -m <some_folder> -s <submit_n> <file>`
- `mv <some_folder>/* .../<mesa_src>/src/freedreno/decode/generate-rd/`
- Add `(void)0;` somewhere in the `generate-rd.cc` file. Meson doesn't always seem to detect the change to `generate-rd.cc` properly. This also happened to me before these changes, and making a modification to the file fixes it.
- Copy the array from `input_resources.txt` to `generate-rd/meson.build` to specify the source files for the static library.
- Build Mesa.