radv: Clean up radv_emit_ngg_culling_state to reduce its CPU overhead (!20980) · Merge requests · Mesa / mesa

Timur Kristóf requested to merge Venemo/mesa:radv_cpu_overhead_ngg_culling into main Jan 30, 2023

The radv_emit_ngg_culling_state function has significant CPU overhead, sometimes even when NGG culling is not used. This MR aims to fix that by making the code rely entirely on dirty flags and by removing the hacks it previously had:

Remove the code that set LDS_SIZE dynamically. This never resulted in a measurable perf benefit that I'm aware of, but it added a lot of spaghetti to the code.
Remove the small draw optimization from the command buffer and implement it in shader code instead, so that the shader skips NGG culling for small draws. This only adds a few extra SALU instructions, but allows removing additional spaghetti.
Finally, call radv_emit_ngg_culling_state based on dirty flags, which makes the code much cleaner, and also reduces its CPU overhead.
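
To illustrate the dirty-flag approach described above, here is a minimal sketch in C. It is not the actual RADV code: every name in it (the `sketch_` prefixed enum values, struct, and functions) is hypothetical and chosen only for illustration. The idea it shows is that the emit function is called only when a piece of state that can affect culling has changed since the last draw, so draws that change unrelated state pay only for a single bitmask test.

```c
#include <stdint.h>

/* Hypothetical dirty bits for illustration; these are NOT RADV's real flags. */
enum sketch_dirty_bits {
   SKETCH_DIRTY_PIPELINE     = 1u << 0,
   SKETCH_DIRTY_VIEWPORT     = 1u << 1,
   SKETCH_DIRTY_CULL_MODE    = 1u << 2,
   SKETCH_DIRTY_INDEX_BUFFER = 1u << 3, /* assumed not to affect culling */
};

/* The subset of state changes that would require re-emitting culling state. */
#define SKETCH_NGG_CULLING_DIRTY_MASK \
   (SKETCH_DIRTY_PIPELINE | SKETCH_DIRTY_VIEWPORT | SKETCH_DIRTY_CULL_MODE)

/* Hypothetical, heavily simplified command buffer state. */
struct sketch_cmd_state {
   uint32_t dirty;              /* bits set whenever state changes */
   unsigned culling_emit_count; /* counts emits, stands in for real packets */
};

/* Stand-in for the expensive emit function. */
void sketch_emit_ngg_culling_state(struct sketch_cmd_state *state)
{
   state->culling_emit_count++;
}

/* Called before every draw: the common case is one cheap mask test. */
void sketch_before_draw(struct sketch_cmd_state *state)
{
   if (state->dirty & SKETCH_NGG_CULLING_DIRTY_MASK)
      sketch_emit_ngg_culling_state(state);

   /* Everything has been flushed for this draw, so clear the flags. */
   state->dirty = 0;
}
```

In this sketch, a draw that only rebinds the index buffer never reaches the emit function, while a viewport change triggers exactly one emit, and later draws with no state changes trigger none.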

Testing

CPU settings used for testing:

# Set all cores to use performance governor
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $i; done
# Disable CPU boost
echo 0 > /sys/devices/system/cpu/cpufreq/boost
# Disable address space randomization
echo 0 > /proc/sys/kernel/randomize_va_space

Test score is the average of 3 runs:

for i in {1..3}; do echo "--- run $i"; ./vkoverhead -test 0 -duration 10; done

commit	3900X + 7900XTX score	6850U score
main 4a675f93	26588	26397
This MR	28992	28631
Total improvement compared to main	+9%	+8.4%

Edited Feb 04, 2023 by Timur Kristóf
