v3d: take into account prim_counts_offset
Specifically when reading the primitive counters.
This fixes ~700 CTS tests matching dEQP-GLES3.functional.transform_feedback.*
when they run after tests such as dEQP-GLES3.functional.prerequisite.read_pixels
in the same caselist. Run individually, those tests passed because
prim_counts_offset was zero.
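
For illustration only, a minimal sketch of why a read that ignores the offset goes unnoticed while the offset is zero. The helper name, the buffer layout (consecutive 32-bit counters) and the byte-based offset are assumptions made for this example, not the driver's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the idx-th 32-bit counter from a CPU-mapped
 * buffer. The counters are assumed to start prim_counts_offset bytes into
 * the mapping, so the offset has to be added before indexing. */
static uint32_t
read_prim_count(const void *map, uint32_t prim_counts_offset, uint32_t idx)
{
        uint32_t value;

        assert(map != NULL);

        const uint8_t *base = (const uint8_t *)map + prim_counts_offset;
        memcpy(&value, base + idx * sizeof(value), sizeof(value));
        return value;
}
```

With an offset of zero, reading from the start of the mapping and reading from map + offset return the same value, which would be consistent with the tests passing in isolation.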