The GKE pool we're using is 1-3 32-core VMs, preemptible (to keep costs down), with 8 jobs concurrent per system. We have plenty of memory (4G/core), so we run make -j8 to try to keep the cores busy even when one job is in a single-threaded step (docker image download, git clone, artifacts processing, etc.) When all jobs are generating work for all the cores, they'll be scheduled fairly.
The nodes in the pool have 300GB boot disks (over-provisioned in space to provide enough iops and throughput) mounted to /ccache, and CACHE_DIR set pointing to them. This means that once a new autoscaled-up node has run some jobs, it should have a hot ccache from then on (instead of having to rely on the docker container cache having our ccache laying around and not getting wiped out by some other fd.o job). Local SSDs would provide higher performance, but unfortunately are not supported with the cluster autoscaler.
For now, the softpipe/llvmpipe test runs are still on the shared runners, until I can get them ported onto Bas's runner so they can be parallelized in a single job.