generator: Run all gir processes in parallel
Since the addition of doc regeneration - which also spawns a gir process for every non-sys crate - running the generator is now incredibly slow and not well suited for iterative development:
./generator.py --no-fmt 26.25s user 0.79s system 99% cpu 27.044 total
All gir processes are currently run serially (the generator waits for one to complete before spawning the next) even though there are no inter-dependencies between them. Simply spawning all processes at once, then collecting their results and printing them in order once everything has been spawned, yields a significant speedup:
./generator.py --no-fmt 37.99s user 0.88s system 3285% cpu 1.183 total
Note: this is on a 32-core Threadripper. The improvement is more modest on machines with fewer cores, and also depends on IO speed. A 4-core i5, before and after:
./generator.py --no-fmt 30.24s user 0.76s system 99% cpu 31.055 total
./generator.py --no-fmt 57.78s user 0.88s system 763% cpu 7.685 total
That's still a sizable gain for simply not blocking on other tasks anymore.
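
For illustration, the spawn-everything-then-collect approach described above can be sketched roughly as follows. This is a minimal sketch and not the actual generator code: `run_all` is a hypothetical helper, and the stand-in commands simply invoke the Python interpreter where the real generator would invoke gir.

```python
import subprocess
import sys


def run_all(commands):
    """Spawn every command immediately, then collect results in spawn order.

    Hypothetical helper for illustration; not the generator's real API.
    """
    # Start all processes up front; Popen returns without waiting for exit.
    processes = [
        subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        for cmd in commands
    ]

    results = []
    # Wait in the original order so the output is reported deterministically,
    # regardless of which process happens to finish first.
    for cmd, proc in zip(commands, processes):
        stdout, stderr = proc.communicate()
        results.append((cmd, proc.returncode, stdout, stderr))
    return results


if __name__ == "__main__":
    # Stand-in commands (assumption): each just prints a line.
    commands = [
        [sys.executable, "-c", f"print('crate {i}')"] for i in range(4)
    ]
    for cmd, code, stdout, stderr in run_all(commands):
        print(code, stdout.strip())
```

Because output is captured through pipes and only read when a process's turn comes, a child that writes a lot may stall on a full pipe until it is collected. For short-lived tools with modest output this should not matter much.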