Draft: panfrost: Merge batches when generating mipmaps
The common code to generate mipmaps uses a large number of dependent batches.
Each level requires a separate batch, one or two jobs queued to the hardware,
and one or two ioctls:
[Tiling level 1]
[Fragment level 1]
[Tiling level 2]
[Fragment level 2]
...
[Tiling level n]
[Fragment level n]
Even generating complete mipmaps for small images can require a dozen or more
job chains, each submitted separately, which is slow.
While the fragment shading for each level depends on the previous level, the
tiling for each level is independent. So we can first run tiling for every
level, and then run fragment shading for every level, using only one or two job
chains total regardless of the number of levels.
[Tiling level 1] --> [Tiling level 2] --> ... --> [Tiling level n]
[Fragment level 1] --> [Fragment level 2] --> ... --> [Fragment level n]
The total number of /jobs/ is still linear in the number of levels, but the
number of /job chains/ and hence round trips to the kernel is now constant,
improving mipmap generation performance.
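
To make the counting argument concrete, here is a minimal sketch in C. It is a
toy model, not driver code: the function names are hypothetical, and it assumes
the worst case where every level needs both a tiling and a fragment job.

```c
/* Toy model of the submission counts described above.
 * Assumption: each level needs one tiling job and one fragment job
 * (the worst case of "one or two jobs" per level). All names here are
 * hypothetical and do not exist in panfrost. */

/* Jobs are per level in both schemes: linear in the level count. */
static unsigned
mip_total_jobs(unsigned levels)
{
   return 2 * levels;
}

/* Unmerged: every level submits its own tiling and fragment chains. */
static unsigned
mip_chains_unmerged(unsigned levels)
{
   return 2 * levels;
}

/* Merged: one tiling chain plus one fragment chain, however many
 * levels there are (and nothing to submit for zero levels). */
static unsigned
mip_chains_merged(unsigned levels)
{
   return levels ? 2 : 0;
}
```

For a 10-level mipmap chain this model gives 20 jobs either way, but 20 job
chains unmerged versus 2 merged.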
The actual implementation is tricky: each batch (resulting from each level
rendered to by the common mipmap generation code) is "merged" together at
submission time, and then all levels are submitted simultaneously in the
resulting "merged" superbatch. This requires some delicate accounting but seems
to work in practice.
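
The merge step can be illustrated roughly as follows. This is only a sketch of
the idea under simplifying assumptions, not the code in this MR: `level_batch`,
`superbatch`, and the helper functions are hypothetical stand-ins for the real
batch structures, and the real accounting (resources, dependencies, fences) is
left out entirely.

```c
#include <stddef.h>

#define MAX_LEVELS 16

enum job_kind { JOB_TILING, JOB_FRAGMENT };

struct job {
   enum job_kind kind;
   unsigned level;
};

/* Hypothetical per-level batch as produced by the common mipmap code. */
struct level_batch {
   unsigned level;
   int has_tiling; /* nonzero if the level queued tiling work */
};

/* Hypothetical merged batch: one tiling chain, one fragment chain. */
struct superbatch {
   struct job tiling[MAX_LEVELS];
   size_t num_tiling;
   struct job fragment[MAX_LEVELS];
   size_t num_fragment;
};

/* Append one level's jobs to the superbatch instead of submitting them.
 * Levels must be merged in order so that each fragment job follows the
 * fragment job of the level it reads from. Returns 0 if full. */
static int
superbatch_merge(struct superbatch *sb, const struct level_batch *b)
{
   /* num_tiling never exceeds num_fragment, so one bound check suffices. */
   if (sb->num_fragment >= MAX_LEVELS)
      return 0;

   if (b->has_tiling)
      sb->tiling[sb->num_tiling++] = (struct job){ JOB_TILING, b->level };

   sb->fragment[sb->num_fragment++] = (struct job){ JOB_FRAGMENT, b->level };
   return 1;
}

/* Number of job chains the superbatch would submit: at most two. */
static unsigned
superbatch_num_chains(const struct superbatch *sb)
{
   return (sb->num_tiling > 0) + (sb->num_fragment > 0);
}
```

Merging levels 1 through n in order leaves n jobs in each chain, while
`superbatch_num_chains()` stays at two regardless of n.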
Improves glmark2 -bterrain FPS by about 10% (roughly 48fps -> 52fps) on
Mali-G52.
Draft because I have more or less NAK'd this but want an MR created for easy reference from a GitLab issue.