tc: add an option for scaling the batch size
on some platforms, the overhead of triggering new thread jobs is so severe that it's more efficient to record an entire frame with tc and process it in a single batch than to split the frame into multiple batches and have the driver thread process the workload incrementally
to alleviate this, allow drivers to enable scalable batch sizes. when enabled, tc batches will automatically scale to a larger size after they overflow, reducing the total number of batch-jobs that need to be run
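as a rough illustration of the idea (not the actual u_threaded_context code), the sketch below models a batch whose capacity doubles each time it overflows. every name here (sketch_batcher, BASE_BATCH_SLOTS, MAX_BATCH_SLOTS, the doubling policy, the size cap, and resetting the capacity per frame) is a hypothetical assumption made for the example, not something taken from tc:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes for illustration only; not the real tc values. */
#define BASE_BATCH_SLOTS 1024u
#define MAX_BATCH_SLOTS  (BASE_BATCH_SLOTS * 64u)

struct sketch_batcher {
   bool scalable;    /* assumed driver opt-in for scalable batches */
   size_t capacity;  /* slots available in the current batch */
   size_t used;      /* slots consumed in the current batch */
   unsigned jobs;    /* batches handed off to the driver thread so far */
};

static void
sketch_init(struct sketch_batcher *b, bool scalable)
{
   b->scalable = scalable;
   b->capacity = BASE_BATCH_SLOTS;
   b->used = 0;
   b->jobs = 0;
}

/* Hand the current batch to the driver thread; each handoff is one job. */
static void
sketch_flush(struct sketch_batcher *b)
{
   if (b->used == 0)
      return;
   b->jobs++;
   b->used = 0;
}

/* Record one call. On overflow, flush and (if enabled) grow the next batch. */
static void
sketch_record(struct sketch_batcher *b, size_t slots)
{
   assert(slots <= BASE_BATCH_SLOTS);
   if (b->used + slots > b->capacity) {
      sketch_flush(b);
      /* Assumed policy: double the capacity, up to a fixed cap. */
      if (b->scalable && b->capacity < MAX_BATCH_SLOTS)
         b->capacity *= 2;
   }
   b->used += slots;
}

/* Count how many batch jobs a frame of identical calls would trigger. */
static unsigned
sketch_jobs_for_frame(bool scalable, size_t num_calls, size_t slots_per_call)
{
   struct sketch_batcher b;
   sketch_init(&b, scalable);
   for (size_t i = 0; i < num_calls; i++)
      sketch_record(&b, slots_per_call);
   sketch_flush(&b); /* end of frame */
   return b.jobs;
}
```

under these assumptions, a frame of 16384 one-slot calls takes 16 jobs with fixed-size batches and 5 jobs with scaling enabled, which is the kind of reduction in thread-job handoffs this option is aiming for.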
in some cases, improves perf on freedreno by ~5-10% and on zink+turnip by 40-50%