nir: Optimize nir_foreach_function_with_impl_next a bit
The test shader contains 500001 functions.
bench code:
int64_t begin = os_time_get_nano();
int i = 0;
nir_foreach_function_impl(impl, dup) {
   i += 1;
}
int64_t end = os_time_get_nano();
printf("elapsed time:%.3lfms i:%d\n", (end - begin) / 1000000.0, i);
old bench result:
elapsed time:9.943ms i:500001
elapsed time:10.146ms i:500001
elapsed time:10.032ms i:500001
elapsed time:9.931ms i:500001
elapsed time:9.983ms i:500001
elapsed time:9.966ms i:500001
elapsed time:9.855ms i:500001
elapsed time:9.958ms i:500001
elapsed time:9.867ms i:500001
elapsed time:9.845ms i:500001
new bench result:
elapsed time:8.415ms i:500001
elapsed time:8.333ms i:500001
elapsed time:8.324ms i:500001
elapsed time:8.232ms i:500001
elapsed time:8.928ms i:500001
elapsed time:8.320ms i:500001
elapsed time:8.404ms i:500001
elapsed time:8.560ms i:500001
elapsed time:8.433ms i:500001
elapsed time:8.383ms i:500001
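
To summarize the two result sets, here is a minimal sketch that averages the timings and computes the relative reduction. The arrays are copied from the bench results above; the helper names mean_ms and reduction_percent are made up for this illustration and are not part of Mesa.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in milliseconds, copied from the bench results above. */
static const double old_ms[] = {
   9.943, 10.146, 10.032, 9.931, 9.983,
   9.966, 9.855, 9.958, 9.867, 9.845,
};
static const double new_ms[] = {
   8.415, 8.333, 8.324, 8.232, 8.928,
   8.320, 8.404, 8.560, 8.433, 8.383,
};

/* Hypothetical helper: arithmetic mean of n samples. */
static double
mean_ms(const double *samples, size_t n)
{
   assert(n > 0);

   double sum = 0.0;
   for (size_t i = 0; i < n; i++)
      sum += samples[i];
   return sum / (double)n;
}

/* Hypothetical helper: how much smaller new_mean is than old_mean,
 * as a percentage of old_mean. */
static double
reduction_percent(double old_mean, double new_mean)
{
   return (old_mean - new_mean) / old_mean * 100.0;
}
```

Calling mean_ms(old_ms, 10) and mean_ms(new_ms, 10) gives roughly 9.953 ms and 8.433 ms, and reduction_percent on those two means gives about 15.3% less time per iteration over all functions.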