
radv: Clean up radv_emit_ngg_culling_state to reduce its CPU overhead

Timur Kristóf requested to merge Venemo/mesa:radv_cpu_overhead_ngg_culling into main

This MR now passes the Valve CI.

The radv_emit_ngg_culling_state function has significant CPU overhead, sometimes even when NGG culling is not used. This MR aims to fix that by making the code entirely based on dirty flags and removing the hacks it previously had:

  • Remove the code that set LDS_SIZE dynamically. This never resulted in a measurable perf benefit that I'm aware of, but it added a lot of spaghetti to the code.
  • Remove the small draw optimization from the command buffer and implement it in shader code instead, so that the shader skips NGG culling for small draws. This adds only a few extra SALU instructions, but makes it possible to remove additional spaghetti.
  • Finally, call radv_emit_ngg_culling_state based on dirty flags, which makes the code much cleaner, and also reduces its CPU overhead.
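
To illustrate the dirty-flag approach described in the list above, here is a minimal, self-contained sketch. It is not the actual RADV code: the flag names, the cmd_state struct and the set_cull_mode, emit_culling_state and draw functions are all hypothetical and only show the general pattern, in which state setters mark bits and the draw path re-emits culling state only when a relevant bit is set.

```c
#include <stdint.h>

/* Hypothetical dirty bits; these are NOT RADV's real flag names. */
enum {
   DIRTY_CULL_MODE  = 1u << 0,
   DIRTY_FRONT_FACE = 1u << 1,
   DIRTY_VIEWPORT   = 1u << 2,
   DIRTY_BLEND      = 1u << 3, /* does not affect culling */
};

/* Only these bits require the culling state to be re-emitted. */
#define CULLING_DIRTY_MASK (DIRTY_CULL_MODE | DIRTY_FRONT_FACE | DIRTY_VIEWPORT)

/* Hypothetical stand-in for the per-command-buffer state. */
struct cmd_state {
   uint32_t dirty;         /* bits set by state setters since the last draw */
   uint32_t cull_mode;
   unsigned culling_emits; /* how many times culling state was emitted */
};

/* State setter: marks the state dirty only if the value really changed. */
void set_cull_mode(struct cmd_state *s, uint32_t mode)
{
   if (s->cull_mode != mode) {
      s->cull_mode = mode;
      s->dirty |= DIRTY_CULL_MODE;
   }
}

/* Placeholder for the expensive emit; a real driver would write
 * registers or shader arguments here. */
void emit_culling_state(struct cmd_state *s)
{
   s->culling_emits++;
}

/* Draw path: a single mask test decides whether the emit is needed. */
void draw(struct cmd_state *s)
{
   if (s->dirty & CULLING_DIRTY_MASK)
      emit_culling_state(s);
   s->dirty = 0;
}
```

In this sketch, a draw that follows no relevant state change pays only for one bitwise test, which is the kind of per-draw saving a dirty-flag design is meant to provide.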

Testing

CPU settings used for testing:

# Set all cores to use performance governor
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$i"; done
# Disable CPU boost
echo 0 > /sys/devices/system/cpu/cpufreq/boost
# Disable address space randomization
echo 0 > /proc/sys/kernel/randomize_va_space

Test score is the average of 3 runs:

for i in {1..3}; do echo "--- run $i"; ./vkoverhead -test 0 -duration 10; done

| commit | 3900X + 7900XTX score | 6850U score |
|---|---|---|
| main 4a675f93 | 26588 | 26397 |
| This MR | 28992 | 28631 |
| Total improvement compared to main | +9% | +8.4% |

Edited by Timur Kristóf
