The common code to generate mipmaps uses a large number of dependent batches. Each level requires a separate batch, one or two jobs queued to the hardware, and one or two ioctls:

    [Tiling level 1]
    [Fragment level 1]
    [Tiling level 2]
    [Fragment level 2]
    ...
    [Tiling level n]
    [Fragment level n]

Even generating complete mipmaps for small images can require a dozen or more job chains, each costing a roundtrip to the kernel, which is slow.

While the fragment shading for each level depends on the previous level, the tiling for each level is independent. So we can first run tiling for every level, and then run fragment shading for every level, using only one or two job chains total regardless of the number of levels.

    [Tiling level 1] --> [Tiling level 2] --> ... --> [Tiling level n]
    [Fragment level 1] --> [Fragment level 2] --> ... --> [Fragment level n]

The total number of /jobs/ is still linear in the number of levels, but the number of /job chains/, and hence roundtrips to the kernel, is now constant, improving mipmap generation performance.

Furthermore, on Bifrost we can take advantage of frame shaders to skip the tiling altogether:

    [Fragment level 1] --> [Fragment level 2] --> ... --> [Fragment level n]

The actual implementation is simple: Gallium's generate_mipmap() hook is implemented manually, piggybacking off the existing pan_blit infrastructure to generate the special "fragment job chains" required.

While we could implement the optimization generically in Gallium (with a u_blitter callback), it would require invasive changes to the already tricky batch tracking logic. So we instead handle mipmap generation as an 'atomic' driver operation that bypasses the main batch tracking logic in order to implement these special optimizations.

Not yet tested/working on Midgard or Valhall.

Improves glmark2 -bterrain FPS by about 10% (roughly 48fps -> 52fps) on Mali-G52.
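To make the job chain arithmetic concrete, here is a rough cost model in C. It is only a sketch, not Panfrost code: struct mip_cost, cost_per_level() and cost_batched() are hypothetical names, and the per-level path is assumed to take the upper bound of two jobs and two chains per level (the description says "one or two").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost model, not actual driver code. */
struct mip_cost {
    unsigned jobs;       /* hardware jobs queued in total */
    unsigned job_chains; /* chains submitted, i.e. kernel roundtrips */
};

/* Common path: every level gets its own tiling chain and fragment chain.
 * Assumes the upper bound of two jobs and two chains per level. */
static struct mip_cost cost_per_level(unsigned levels)
{
    struct mip_cost c = { 0, 0 };

    for (unsigned level = 1; level <= levels; ++level) {
        c.jobs += 2;       /* one tiling job, one fragment job */
        c.job_chains += 2; /* each job is submitted on its own */
    }

    return c;
}

/* Batched path: all tiling jobs share one chain and all fragment jobs
 * share another. With frame shaders the tiling chain disappears. */
static struct mip_cost cost_batched(unsigned levels, bool frame_shaders)
{
    struct mip_cost c = { 0, 0 };

    if (levels == 0)
        return c; /* nothing to generate, nothing to submit */

    c.jobs = (frame_shaders ? 1u : 2u) * levels; /* still linear */
    c.job_chains = frame_shaders ? 1u : 2u;      /* now constant */

    assert(c.job_chains <= c.jobs);
    return c;
}
```

Under these assumptions, generating 11 levels (the levels below the base of a 2048x2048 texture) costs 22 job chains on the common path, 2 when batched, and 1 when frame shaders replace tiling. The job count stays at 22 in the first two cases and drops to 11 in the third.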
Draft because this has portability issues; compare !17519 (closed).