cooperative matrix store difference on nvidia vs radv/anv
I've done most of the debugging of this on radv, but I think anv also suffers from the same or a very similar problem, which might suggest a bug in either the common code or the test.
The test is idr's port of the nvidia perf test code: https://github.com/ianromanick/vk_cooperative_matrix_perf
When run against the nvidia driver with --correctness it all passes.
When I run it against the latest radv (and anv), I see corruption. I fixed the test to compare using epsilon deltas, but I still see missing rows of data. It appears the stores either aren't landing in the correct places or there aren't enough of them.
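To separate rounding noise from rows that were never written, a host-side check along these lines can help. This is only a minimal sketch, not the code in the test: `nearly_equal`, `missing_rows`, the row-major layout, and the default tolerance of 1e-3 are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when a and b agree within a relative tolerance,
// with an absolute floor of eps for values near zero. NaN never matches.
bool nearly_equal(float a, float b, float eps = 1e-3f) {
    float scale = std::fmax(1.0f, std::fmax(std::fabs(a), std::fabs(b)));
    return std::fabs(a - b) <= eps * scale;
}

// Hypothetical helper: returns the indices of rows in which no element of
// `got` matches `expected`. Both matrices are assumed to be row-major with
// `rows` x `cols` elements. A row that is only slightly off is not reported;
// a row that was never stored (still holding its initial contents) is.
std::vector<std::size_t> missing_rows(const std::vector<float>& got,
                                      const std::vector<float>& expected,
                                      std::size_t rows, std::size_t cols,
                                      float eps = 1e-3f) {
    std::vector<std::size_t> out;
    if (cols == 0) return out;
    for (std::size_t r = 0; r < rows; ++r) {
        bool any_match = false;
        for (std::size_t c = 0; c < cols; ++c) {
            if (nearly_equal(got[r * cols + c], expected[r * cols + c], eps)) {
                any_match = true;
                break;
            }
        }
        if (!any_match) out.push_back(r);
    }
    return out;
}
```

Printing the returned row indices next to the tile dimensions should make it easier to see whether the gaps line up with whole tiles or with individual invocations.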
It seems possibly related to the local_size_x in the shader, since tweaking that from 32->64 changes the pattern of the missing data.
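One hypothesis that would fit this symptom, and it is only a hypothesis, is a subgroup-size mismatch. The arithmetic below is a sketch under two assumptions that I have not verified against the test: that each subgroup stores exactly one tile, and that the shader's indexing was written for 32-wide subgroups (as on nvidia) while the driver runs it with wider ones. The function names are made up for illustration.

```cpp
#include <cstdint>

// Number of subgroups in one workgroup, rounded up.
// subgroup_size must be non-zero.
constexpr std::uint32_t subgroups_per_workgroup(std::uint32_t local_size_x,
                                                std::uint32_t subgroup_size) {
    return (local_size_x + subgroup_size - 1) / subgroup_size;
}

// ASSUMPTION: one tile is stored per subgroup. Returns how many tiles per
// workgroup would be left unwritten if the shader was laid out for
// assumed_subgroup_size but actually runs with actual_subgroup_size.
constexpr std::uint32_t tiles_unwritten(std::uint32_t local_size_x,
                                        std::uint32_t assumed_subgroup_size,
                                        std::uint32_t actual_subgroup_size) {
    const std::uint32_t expected =
        subgroups_per_workgroup(local_size_x, assumed_subgroup_size);
    const std::uint32_t actual =
        subgroups_per_workgroup(local_size_x, actual_subgroup_size);
    return expected > actual ? expected - actual : 0;
}
```

Under those assumptions, `tiles_unwritten(64, 32, 64)` is 1 and `tiles_unwritten(64, 32, 32)` is 0, so changing local_size_x would shift which tiles go missing. Comparing the subgroup size the driver reports with what the shader assumes would confirm or rule this out.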
I'll keep digging, but my brain is having trouble figuring out what might be wrong, whereas someone more familiar with the extension might have a better clue.
I've also hacked the test locally to do just coop mat stores with no loads/calcs, and I see the same missing blocks. (The test might be making some nvidia assumption I'm not spotting.)