radv: Clean up radv_emit_ngg_culling_state to reduce its CPU overhead
This MR now passes the Valve CI.
The `radv_emit_ngg_culling_state` function has significant CPU overhead, sometimes even when NGG culling is not used. This MR aims to fix that by making the code rely entirely on dirty flags and removing the hacks it previously had:
- Remove the code that set `LDS_SIZE` dynamically. This never resulted in a measurable perf benefit that I'm aware of, but it added a lot of spaghetti to the code.
- Remove the small draw optimization from the command buffer and implement it in shader code (to skip NGG culling for small draws). This only adds a few extra SALU instructions, but allows removing additional spaghetti.
- Finally, call `radv_emit_ngg_culling_state` based on dirty flags, which makes the code much cleaner and also reduces its CPU overhead.
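
For readers unfamiliar with the pattern, the dirty-flag approach described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: every name prefixed with `sketch_` or `SKETCH_` is hypothetical, and the set of state bits that actually affects NGG culling in RADV is an assumption here.

```c
#include <stdint.h>

/* Hypothetical dirty bits; these are NOT RADV's real flag names. */
enum {
   SKETCH_DIRTY_CULL_MODE = 1u << 0,
   SKETCH_DIRTY_VIEWPORT  = 1u << 1,
   SKETCH_DIRTY_SHADER    = 1u << 2,
   SKETCH_DIRTY_UNRELATED = 1u << 3, /* state that culling does not depend on */
};

/* All bits that (in this sketch) require re-emitting the culling state. */
#define SKETCH_CULLING_DIRTY_MASK \
   (SKETCH_DIRTY_CULL_MODE | SKETCH_DIRTY_VIEWPORT | SKETCH_DIRTY_SHADER)

struct sketch_cmd_state {
   uint32_t dirty;      /* bits set by state setters since the last draw */
   unsigned emit_count; /* number of times the culling state was emitted */
};

/* Stand-in for the expensive emit function; it only counts calls. */
static void sketch_emit_culling_state(struct sketch_cmd_state *s)
{
   s->emit_count++;
}

/* Called before every draw: do the expensive work only when a relevant
 * bit is dirty, then clear those bits so the next draw can skip it. */
static void sketch_before_draw(struct sketch_cmd_state *s)
{
   if (s->dirty & SKETCH_CULLING_DIRTY_MASK) {
      sketch_emit_culling_state(s);
      s->dirty &= ~SKETCH_CULLING_DIRTY_MASK;
   }
}
```

The point of the pattern is that a draw which changes no culling-related state pays only for a single mask test, which is where the CPU savings would come from.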
Testing
CPU settings used for testing:
# Set all cores to use performance governor
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $i; done
# Disable CPU boost
echo 0 > /sys/devices/system/cpu/cpufreq/boost
# Disable address space randomization
echo 0 > /proc/sys/kernel/randomize_va_space
Test score is the average of 3 runs:
for i in {1..3}; do echo "--- run $i"; ./vkoverhead -test 0 -duration 10; done
commit | 3900X + 7900XTX score | 6850U score |
---|---|---|
main 4a675f93 | 26588 | 26397 |
This MR | 28992 | 28631 |
Total improvement compared to main | +9% | +8.4% |
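
The scores and percentages above follow from simple arithmetic: each score is the mean of three runs, and the improvement is the relative change against main. A small sketch of that calculation is below; the helper names are hypothetical and are not part of vkoverhead or Mesa.

```c
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of `count` run scores. */
static double sketch_mean_score(const double *runs, size_t count)
{
   double sum = 0.0;
   for (size_t i = 0; i < count; i++)
      sum += runs[i];
   return count ? sum / (double)count : 0.0;
}

/* Hypothetical helper: relative change from `before` to `after`, in percent. */
static double sketch_improvement_percent(double before, double after)
{
   return (after - before) / before * 100.0;
}
```

With the averaged scores from the table, `sketch_improvement_percent(26588, 28992)` comes to about 9.04 and `sketch_improvement_percent(26397, 28631)` to about 8.46, which matches the last row up to rounding.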