intel/fs: Add DP4A to get_lowered_simd_width

While working on cooperative matrix support, I noticed some invalid DP4A
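instructions being generated:

    dp4a(32) g33<1>UD g21<8,8,1>UD g1.0<0,1,0>UD g9<1,1,1>UD

At SIMD32 with 32-bit operands, the destination and the g21 source each
span four GRFs (assuming 32-byte GRFs). This violates the constraint
that the destination or a source can access at most two consecutive
GRFs.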
I'm a little surprised that validation didn't catch this, perhaps
because it's a 3-source instruction. Either way, fixing the validator
seems like a bigger project.
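
For illustration only, the arithmetic behind the two-GRF constraint can
be sketched as below. This is not the logic in get_lowered_simd_width;
the helper names grfs_spanned and max_simd_width are hypothetical, and
the sketch assumes 32-byte GRFs and packed (stride 1) operands.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: 32-byte GRFs and a limit of two consecutive GRFs per operand.
constexpr unsigned kGrfBytes = 32;
constexpr unsigned kMaxGrfsPerOperand = 2;

// Hypothetical helper: number of GRFs covered by a packed operand.
constexpr unsigned grfs_spanned(unsigned exec_size, unsigned type_bytes)
{
   return (exec_size * type_bytes + kGrfBytes - 1) / kGrfBytes;
}

// Hypothetical helper: widest execution size at which a packed operand
// of the given type size still fits in two GRFs.
inline unsigned max_simd_width(unsigned exec_size, unsigned type_bytes)
{
   assert(type_bytes > 0);
   return std::min(exec_size, kMaxGrfsPerOperand * kGrfBytes / type_bytes);
}

// The SIMD32 UD destination from the instruction above covers four GRFs.
static_assert(grfs_spanned(32, 4) == 4, "SIMD32 UD spans four GRFs");
// At SIMD16 the same operand fits in two GRFs.
static_assert(grfs_spanned(16, 4) == 2, "SIMD16 UD spans two GRFs");
```

Under these assumptions, max_simd_width(32, 4) yields 16, so a SIMD32
dp4a with UD operands would have to be split into two SIMD16 halves.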
Fixes: 0f809dbf