freedreno/replay: KGSL support, a7xx support, shader and cp printing

What does this MR do and why?

These changes were extracted from !23217 (merged) and can be reviewed independently.

They were useful for a7xx bring-up.

The "print" instructions for ir3 assembly and the command stream in replay are as minimal as possible, yet still very useful.

