aco: optimize dd[xy]_fine if it's only used by abs
If we can ignore the sign of the derivative, we can swap the lanes instead of broadcasting per direction. abs(a - b) = abs(b - a).
Shamelessly copied from bifrost.
Foz-DB Navi31:
Totals from 5 (0.01% of 79206) affected shaders:
Instrs: 6191 -> 6184 (-0.11%)
CodeSize: 31960 -> 31920 (-0.13%)
Latency: 111961 -> 111926 (-0.03%)
InvThroughput: 10390 -> 10372 (-0.17%)
VALU: 3286 -> 3279 (-0.21%)