intel: intel_perf codegen bloat
Now that NIR's generated code bloat is knocked down, intel_perf is a good next target. I believe the bulk of its 1.6MB of generated code (so 3.2MB total in the shipped iris and anv drivers) is the per-gen counter initialization code. Right now we emit code initializing each counter's members in order. To reduce the size I think we need to:
- Move to static const structs for counter definitions and memcpy to query->counters[].
- Move the static const structs for a gen to an array, with things like the `perf->sys_vars.slice_mask & 0x01` checks expressed as data next to the struct definition, and then you init query->counters with pointers to it from that array.
- Switch strings in the const structs to offsets within a deduped global string table to reduce relocations and the pointer sizes, and replace the memcpy with a helper function to init a query->counters[] entry from the const struct with the offset string type instead.
- Maybe dedupe the static counter description structs across gens into a single table, and have the per-gen table be just the condition and offset to the counter struct.
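
To make the end state of the steps above concrete, here is a rough sketch of what the data layout could look like. Everything in it is hypothetical: the `sketch_*` names, the struct fields, the counter strings, and the hand-computed string offsets are made up for illustration and are not the real intel_perf types. In practice the generator script would emit the string table and offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the real runtime types. */
struct sketch_sys_vars { uint64_t slice_mask; };
struct sketch_counter  { const char *name; const char *desc; uint32_t offset; };

/* Deduplicated string table. Each string is stored once; the shared
 * description at offset 57 is referenced by two counters. */
static const char sketch_strings[] =
   "GPU Busy\0"               /* offset  0 */
   "Time the GPU was busy\0"  /* offset  9 */
   "Slice 0 Busy\0"           /* offset 31 */
   "Slice 1 Busy\0"           /* offset 44 */
   "Time the slice was busy"; /* offset 57 */

/* Counter description shared across gens: no pointers, so no relocations. */
struct sketch_counter_desc { uint16_t name_off; uint16_t desc_off; uint32_t offset; };

static const struct sketch_counter_desc sketch_descs[] = {
   {  0,  9,  0 },
   { 31, 57,  8 },
   { 44, 57, 16 },
};

/* Per-gen table: just the availability condition plus an index into
 * sketch_descs. A required mask of 0 means "always present". */
struct sketch_gen_entry { uint64_t required_slice_mask; uint16_t desc_index; };

static const struct sketch_gen_entry sketch_gen_table[] = {
   { 0x00, 0 },
   { 0x01, 1 },   /* was: if (perf->sys_vars.slice_mask & 0x01) */
   { 0x02, 2 },
};

/* Helper replacing the memcpy: expands string offsets into pointers and
 * skips counters whose condition fails. Returns the number written. */
static size_t
sketch_init_counters(const struct sketch_sys_vars *vars,
                     const struct sketch_gen_entry *entries, size_t n_entries,
                     struct sketch_counter *out)
{
   size_t n = 0;
   for (size_t i = 0; i < n_entries; i++) {
      uint64_t need = entries[i].required_slice_mask;
      if (need != 0 && (vars->slice_mask & need) == 0)
         continue;

      assert(entries[i].desc_index < sizeof(sketch_descs) / sizeof(sketch_descs[0]));
      const struct sketch_counter_desc *d = &sketch_descs[entries[i].desc_index];
      out[n].name = sketch_strings + d->name_off;
      out[n].desc = sketch_strings + d->desc_off;
      out[n].offset = d->offset;
      n++;
   }
   return n;
}
```

With this shape the per-counter cost is a few bytes of read-only data rather than a run of store instructions, and the per-gen tables stay small because they only carry the condition and an index.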