The common code to generate mipmaps uses a large number of dependent batches. Each level requires a separate batch, one or two jobs queued to the hardware, and one or two ioctls:

    [Tiling level 1] [Fragment level 1]
    [Tiling level 2] [Fragment level 2]
    ...
    [Tiling level n] [Fragment level n]

Even generating complete mipmaps for small images can require a dozen or more job chains, which is slow.

While the fragment shading for each level depends on the previous level, the tiling for each level is independent. So we can first run tiling for every level, and then run fragment shading for every level, using only one or two job chains total regardless of the number of levels.

    [Tiling level 1] --> [Tiling level 2] --> ... --> [Tiling level n]
    [Fragment level 1] --> [Fragment level 2] --> ... --> [Fragment level n]

The total number of /jobs/ is still linear in the number of levels, but the number of /job chains/, and hence the number of round trips to the kernel, is now constant, improving mipmap generation performance.

The actual implementation is tricky: the batch produced for each level rendered by the common mipmap generation code is "merged" with the others at submission time, and all levels are then submitted simultaneously in the resulting "merged" superbatch. This requires some delicate accounting but seems to work in practice.

Improves glmark2 -bterrain FPS by about 10% (roughly 48fps -> 52fps) on Mali-G52.
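To make the counting argument concrete, here is a minimal sketch in C. It is not the driver code: `submit_stats`, `count_per_level` and `count_merged` are hypothetical names, and it assumes the upper bound of two jobs (one tiling, one fragment) and two job chains per level.

```c
#include <assert.h>

/* Hypothetical bookkeeping, for illustration only (not driver API). */
struct submit_stats {
    unsigned jobs;       /* hardware jobs queued in total */
    unsigned job_chains; /* job chains submitted, i.e. kernel round trips */
};

/* Per-level scheme: every level is its own batch and submits its
 * tiling chain and its fragment chain separately.
 * Assumption: two jobs and two chains per level (the upper bound). */
static struct submit_stats
count_per_level(unsigned levels)
{
    struct submit_stats s = { 0, 0 };

    assert(levels >= 1);
    for (unsigned l = 0; l < levels; ++l) {
        s.jobs += 2;
        s.job_chains += 2;
    }
    return s;
}

/* Merged scheme: tiling jobs for all levels are linked into one chain
 * and fragment jobs for all levels into another, so the chain count
 * does not depend on the number of levels. */
static struct submit_stats
count_merged(unsigned levels)
{
    struct submit_stats s = { 0, 0 };

    assert(levels >= 1);
    for (unsigned l = 0; l < levels; ++l)
        s.jobs += 2;      /* still one tiling and one fragment job per level */
    s.job_chains = 2;     /* one tiling chain plus one fragment chain */
    return s;
}
```

Under these assumptions, ten levels cost 20 jobs either way, but 20 job chains in the per-level scheme against 2 in the merged one.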
Draft because I have more or less NAK'd this but want an MR created for easy referencing from a GitLab issue.