
intel/dev: Enable VK_KHR_cooperative_matrix on all Gfx9+ GPUs

Ian Romanick requested to merge idr/mesa:review/cmat2 into main

Gfx12.5 (DG2) will use DPAS instructions to accelerate the implementation. Earlier platforms will use equivalent discrete instructions (basically subgroup operations). Gfx12 (Tiger Lake) will use DP4A for 8-bit integer matrix multiplication. Older platforms, which lack DP4A, will use a suboptimal instruction sequence. There is plenty of room for improvement here.
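For readers unfamiliar with DP4A, the following is a minimal scalar sketch of what a signed DP4A-style operation computes: a dot product of four packed 8-bit lanes accumulated into a 32-bit value. It also shows the element-by-element loop it replaces. The function names are hypothetical, and this is a reference model in plain C, not the instruction sequence the compiler in this MR emits.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of v without relying on
 * implementation-defined narrowing conversions. */
static int32_t sext8(uint32_t v)
{
    return (int32_t)((v & 0xffu) ^ 0x80u) - 0x80;
}

/* Pack four signed bytes into one 32-bit word, lane 0 in the low byte. */
static uint32_t pack_s8x4(const int8_t v[4])
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++)
        r |= (uint32_t)(uint8_t)v[i] << (8 * i);
    return r;
}

/* Reference model of a signed DP4A: multiply the four byte lanes of a
 * and b pairwise and add the four products to acc. */
static int32_t dp4a_s8(uint32_t a, uint32_t b, int32_t acc)
{
    for (int i = 0; i < 4; i++)
        acc += sext8(a >> (8 * i)) * sext8(b >> (8 * i));
    return acc;
}

/* Dot product of two int8 vectors, four elements per DP4A step.
 * k must be a multiple of 4 in this sketch. */
static int32_t dot_i8_dp4a(const int8_t *a, const int8_t *b, int k)
{
    assert(k % 4 == 0);
    int32_t acc = 0;
    for (int i = 0; i < k; i += 4)
        acc = dp4a_s8(pack_s8x4(a + i), pack_s8x4(b + i), acc);
    return acc;
}

/* The same dot product with one multiply-add per element, standing in
 * for the discrete path used when DP4A is unavailable. */
static int32_t dot_i8_discrete(const int8_t *a, const int8_t *b, int k)
{
    int32_t acc = 0;
    for (int i = 0; i < k; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

Both functions return the same value for any input whose length is a multiple of four; the DP4A form simply needs a quarter as many accumulate steps.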

This is currently gated behind the ANV_COOPERATIVE_MATRIX environment variable. The main reason for this is "WIP: anv: Set PIPELINE_SELECT systolic mode enable flag". This bit must be set if a compute shader will use DPAS on DG2. Setting it modifies GPU clocks in ways that may hurt performance on other workloads, so we only want the bit set when necessary. We also don't want to emit more PIPELINE_SELECTs than necessary. I am looking for suggestions here.
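To make the two constraints concrete (set the bit only when a shader needs DPAS, and avoid redundant PIPELINE_SELECTs), here is one possible shape of the bookkeeping, sketched as plain C. Every name below is hypothetical: `struct cmd_state`, `emit_pipeline_select` and `prepare_dispatch` are illustrative stand-ins, not anv structures or functions, and the counter merely stands in for emitting the real command.

```c
#include <stdbool.h>

/* Hypothetical per-command-buffer state; not the actual anv structures. */
struct cmd_state {
    bool systolic_enabled;    /* systolic mode value last programmed */
    unsigned selects_emitted; /* counts stand-in PIPELINE_SELECT emissions */
};

/* Stand-in for emitting a PIPELINE_SELECT with the given systolic mode. */
static void emit_pipeline_select(struct cmd_state *s, bool systolic)
{
    s->systolic_enabled = systolic;
    s->selects_emitted++;
}

/* Called before each compute dispatch: re-emit only when the mode the
 * shader needs differs from the mode currently programmed. */
static void prepare_dispatch(struct cmd_state *s, bool shader_uses_dpas)
{
    if (s->systolic_enabled != shader_uses_dpas)
        emit_pipeline_select(s, shader_uses_dpas);
}
```

With this scheme, back-to-back dispatches that agree on DPAS usage cost no extra commands, while a non-DPAS dispatch following a DPAS one clears the bit again. Whether toggling that eagerly is the right trade-off is exactly the open question above.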

In addition, there is a lot of performance work to be done. The path for loading or storing row-major "B" matrices and column-major other matrices is really bad. I have some ideas to massively reduce the number of loads at the cost of a couple of added moves.

DG2 (Gfx12.5) gets the following results from the CTS:

    Test run totals:
      Passed:        1642/13982 (11.7%)
      Failed:        0/13982 (0.0%)
      Not supported: 12340/13982 (88.3%)
      Warnings:      0/13982 (0.0%)
      Waived:        0/13982 (0.0%)

On DG2 (Gfx12.5) with forced lowering, Raptor Lake (Gfx12) and Ice Lake (Gfx11):

    Test run totals:
      Passed:        1662/13982 (11.9%)
      Failed:        0/13982 (0.0%)
      Not supported: 12320/13982 (88.1%)
      Warnings:      0/13982 (0.0%)
      Waived:        0/13982 (0.0%)

The difference in the number of tests run is due to saturatingAccumulation not being set on DG2 when DPAS is used. There is a comment in "intel/dev: Advertise integer configs with saturatingAccumulation too" that explains how this could be added should the need arise.
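As a reminder of what saturating accumulation means for the integer configurations, here is a minimal scalar sketch contrasting a saturating 32-bit add with a wrapping one. The helper names are hypothetical, and this only illustrates the arithmetic; it says nothing about how the lowering in this series or the DPAS hardware implements it.

```c
#include <stdint.h>

/* Saturating signed add: clamp to the int32 range instead of wrapping. */
static int32_t add_sat_s32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}

/* Wrapping signed add, done in unsigned arithmetic to avoid undefined
 * behaviour on overflow. The final conversion back to int32_t is
 * implementation-defined in C, but wraps modulo 2^32 on GCC and Clang. */
static int32_t add_wrap_s32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

For example, `add_sat_s32(INT32_MAX, 1)` stays at `INT32_MAX`, whereas the wrapping variant rolls over to `INT32_MIN` on the compilers noted in the comment.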

Edited by Ian Romanick
