Currently, we implement transform feedback by dispatching specialized "transform feedback" programs at draw time. This decouples transform feedback from regular vertex shaders, so our usual IDVS tricks still apply to the regular vertex shaders.
That works well and is not changed here. The problem with the previous implementation is its dispatch mechanism. The old implementation had a special launch path, compiled per-gen, that did architecture-specific dispatch of a compute/vertex shader. The code complexity is O(N) in the number of architectures we support: every time Mali dispatch changes, the launch path needs a new code path duplicating the one added to pipe->draw_vbo and pipe->launch_grid. With the incoming CSF support, we will have another (dramatically different) dispatch mechanism.
What we really want is a generic dispatch that reuses the pipe->draw_vbo or pipe->launch_grid machinery, to avoid open-coding the hardware structures in another place. That generic routine would not need specialization based on architecture, which is cleaner.
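As a rough illustration of the shape this takes, here is a minimal, self-contained sketch. It is not the driver code: struct context, struct grid_info, launch_grid_v7, launch_grid_v9 and launch_xfb are hypothetical stand-ins for pipe_context, pipe_grid_info, the per-gen launch_grid implementations and the generic transform feedback launch. The mapping of one thread per (vertex, instance) pair is also an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the grid description passed to launch_grid. */
struct grid_info {
   uint32_t grid[3];  /* number of workgroups per dimension */
   uint32_t block[3]; /* threads per workgroup per dimension */
};

/* Hypothetical stand-in for the context: only the launch_grid hook matters. */
struct context {
   void (*launch_grid)(struct context *ctx, const struct grid_info *info);
   unsigned last_arch;        /* which per-gen backend ran last */
   uint64_t launched_threads; /* total threads dispatched so far */
};

static uint64_t
thread_count(const struct grid_info *info)
{
   uint64_t n = 1;
   for (unsigned i = 0; i < 3; ++i)
      n *= (uint64_t)info->grid[i] * info->block[i];
   return n;
}

/* Per-gen backends. In a real driver these are the only places that would
 * emit hardware structures; here they just record what was asked of them. */
static void
launch_grid_v7(struct context *ctx, const struct grid_info *info)
{
   ctx->last_arch = 7;
   ctx->launched_threads += thread_count(info);
}

static void
launch_grid_v9(struct context *ctx, const struct grid_info *info)
{
   ctx->last_arch = 9;
   ctx->launched_threads += thread_count(info);
}

/* Generic transform feedback launch: no architecture-specific code at all.
 * It only describes the work and hands it to the existing hook. */
static void
launch_xfb(struct context *ctx, uint32_t vertex_count, uint32_t instance_count)
{
   struct grid_info info = {
      .grid = {vertex_count, instance_count, 1},
      .block = {1, 1, 1},
   };

   ctx->launch_grid(ctx, &info);
}
```

In this sketch, adding a new dispatch mechanism means adding one more launch_grid implementation; launch_xfb itself stays untouched.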
This is an application of a general best practice: don't open-code your dispatch in multiple places, use meta operations instead. If panvk1 followed that, maybe we could've supported v9 in a sane way...