Draft: panfrost: Implement fast mipmapping in pan_blit
The common code to generate mipmaps uses a large number of dependent batches.
Each level requires a separate batch, one or two jobs queued to the hardware,
and one or two ioctls:
[Tiling level 1]
[Fragment level 1]
[Tiling level 2]
[Fragment level 2]
...
[Tiling level n]
[Fragment level n]
Even generating a complete mipmap chain for a small image can require a dozen or
more job chains, each a separate round trip to the kernel, which is slow.
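To make the cost concrete, here is a hypothetical model of the per-level scheme above. It counts the worst case of two single-job chains (tiling plus fragment) per level. None of these names are panfrost API: `submit_chain()` simply stands in for one job-chain submission.

```c
#include <assert.h>

/* Hypothetical bookkeeping for the naive per-level scheme. */
struct submit_stats {
   unsigned jobs;   /* individual jobs queued to the hardware */
   unsigned chains; /* job chains handed to the kernel */
};

/* Stand-in for one job-chain submission (one kernel round trip). */
static void submit_chain(struct submit_stats *s, unsigned njobs)
{
   assert(njobs > 0); /* an empty chain is never submitted */
   s->jobs += njobs;
   s->chains += 1;
}

/* Generate levels 1..nlevels the naive way: level i cannot start until
 * level i-1 has been fully rendered, so every level is its own batch. */
static struct submit_stats naive_mipmap(unsigned nlevels)
{
   struct submit_stats s = {0, 0};

   for (unsigned level = 1; level <= nlevels; ++level) {
      submit_chain(&s, 1); /* [Tiling level i] */
      submit_chain(&s, 1); /* [Fragment level i] */
   }

   return s;
}
```

A 64x64 texture has six levels below the base, so this model already reaches twelve submissions, in line with the "dozen or more" estimate.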
While the fragment shading for each level depends on the previous level, the
tiling for each level is independent. So we can first run tiling for every
level, and then run fragment shading for every level, using only one or two job
chains total regardless of the number of levels.
[Tiling level 1] --> [Tiling level 2] --> ... --> [Tiling level n]
[Fragment level 1] --> [Fragment level 2] --> ... --> [Fragment level n]
The total number of /jobs/ is still linear in the number of levels, but the
number of /job chains/, and hence round trips to the kernel, is now constant,
improving mipmap generation performance.
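The reordering can be sketched as building two linked chains, one holding every tiler job and one holding every fragment job. This is an illustrative model only: `struct job` and `struct chain` are hypothetical and are not panfrost's job descriptors. It also assumes the fragment chain is consumed in order, so that level i+1 reads level i only after level i has been written; how the hardware enforces that ordering is out of scope here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 16

enum job_type { JOB_TILER, JOB_FRAGMENT };

/* Hypothetical job node; `next` links jobs within one chain. */
struct job {
   enum job_type type;
   unsigned level;
   struct job *next;
};

struct chain {
   struct job *head;
   struct job *tail;
   unsigned count;
};

static void chain_append(struct chain *c, struct job *j)
{
   j->next = NULL;
   if (c->tail)
      c->tail->next = j;
   else
      c->head = j;
   c->tail = j;
   c->count++;
}

/* Put one tiler job and one fragment job per level into two chains,
 * using caller-provided storage. Returns how many chains would be
 * submitted, which no longer depends on nlevels. */
static unsigned build_mipmap_chains(unsigned nlevels, struct job storage[][2],
                                    struct chain *tiler, struct chain *frag)
{
   assert(nlevels <= MAX_LEVELS);
   *tiler = (struct chain){0};
   *frag = (struct chain){0};

   for (unsigned i = 0; i < nlevels; ++i) {
      storage[i][0] = (struct job){JOB_TILER, i + 1, NULL};
      storage[i][1] = (struct job){JOB_FRAGMENT, i + 1, NULL};
      chain_append(tiler, &storage[i][0]);
      chain_append(frag, &storage[i][1]);
   }

   return (tiler->count ? 1 : 0) + (frag->count ? 1 : 0);
}
```

Job count still grows with `nlevels` (each chain holds one job per level), while the returned chain count stays at two.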
Furthermore, on Bifrost we can take advantage of frame shaders to skip the
tiling altogether:
[Fragment level 1] --> [Fragment level 2] --> ... --> [Fragment level n]
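A small, hypothetical planning function summarizes the two variants. The `has_frame_shaders` flag is an assumption standing in for the Bifrost case described above, where no tiler jobs are needed; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the work needed for nlevels mipmap levels. */
struct mipmap_plan {
   unsigned tiler_jobs;
   unsigned fragment_jobs;
   unsigned chains;
};

static struct mipmap_plan plan_mipmap(unsigned nlevels, bool has_frame_shaders)
{
   struct mipmap_plan p = {0, 0, 0};

   if (nlevels == 0)
      return p;

   p.fragment_jobs = nlevels;
   p.chains = 1; /* one fragment chain covering every level */

   if (!has_frame_shaders) {
      p.tiler_jobs = nlevels;
      p.chains += 1; /* plus one tiler chain */
   }

   return p;
}
```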
The actual implementation is simple: Gallium's generate_mipmap() hook is
implemented directly in the driver, piggybacking on the existing pan_blit
infrastructure to generate the special "fragment job chains" required. While we
could implement the optimization generically in Gallium (with a u_blitter
callback), it would require invasive changes to the already tricky batch
tracking logic. So we instead handle mipmap generation as an 'atomic' driver
operation that bypasses the main batch tracking logic, which lets us implement
these special optimizations.
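The level walk such a hook performs can be sketched as follows. This is a hypothetical model, not the pan_blit code: `plan_mip_blits()` and `struct mip_blit` are illustrative names, and each entry merely records the blit that would be emitted into the fragment chain. The `minify()` helper follows the same rule as Mesa's `u_minify()` (halve per level, clamp to 1).

```c
#include <assert.h>

#define MAX_BLITS 16

/* One fragment-only blit: read src_level, write dst_level. */
struct mip_blit {
   unsigned src_level, dst_level;
   unsigned width, height; /* size of the destination level */
};

/* Same rule as u_minify(): halve per level, clamp to 1. */
static unsigned minify(unsigned size, unsigned level)
{
   unsigned v = size >> level;
   return v ? v : 1;
}

/* width/height are the level-0 dimensions. Returns the number of blits
 * written to `out`, one per level in (base_level, last_level]. */
static unsigned plan_mip_blits(unsigned width, unsigned height,
                               unsigned base_level, unsigned last_level,
                               struct mip_blit *out)
{
   unsigned n = 0;

   assert(base_level <= last_level && last_level - base_level <= MAX_BLITS);

   for (unsigned level = base_level + 1; level <= last_level; ++level) {
      out[n++] = (struct mip_blit){
         .src_level = level - 1, /* reads the previous level */
         .dst_level = level,
         .width = minify(width, level),
         .height = minify(height, level),
      };
   }

   return n;
}
```

Because the whole list of levels is known when the hook is called, every blit can be recorded in one pass, which fits treating mipmap generation as a single self-contained driver operation.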
Not yet tested or working on Midgard or Valhall.
Improves glmark2 -bterrain FPS by about 10% (roughly 48fps -> 52fps) on
Mali-G52.
Draft because this has portability issues; compare !17519 (closed).