No double-rate fp16 flops with RADV
Hello RADV team
I have a tool that benchmarks raw GPU FLOPS with the Vulkan API: https://github.com/nihui/vkpeak
The shader code is very simple: it repeats c = a * c + b for fp32/fp16/int16/... storage and arithmetic types.
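To make the measurement concrete, here is a rough CPU-side model of that inner loop in Python. It is only a sketch, not the actual shader: the names `fma_chain` and `gflops` are made up for illustration, and it assumes each multiply-add is counted as two operations, which is the usual convention for FLOPS figures.

```python
# Hypothetical CPU-side model of the benchmark loop. The real test runs
# this chain inside a Vulkan compute shader, across many invocations.

def fma_chain(a, b, c, iterations):
    """Apply c = a * c + b `iterations` times; return (c, op_count)."""
    for _ in range(iterations):
        c = a * c + b
    # Assumption: one multiply-add counts as 2 operations.
    return c, 2 * iterations


def gflops(op_count, seconds):
    """Convert an operation count and elapsed time into GFLOPS."""
    return op_count / seconds / 1e9


if __name__ == "__main__":
    value, ops = fma_chain(1.0, 1.0, 0.0, 1000)
    print(value, ops)  # 1000.0 2000
```

Because every iteration depends on the previous value of c, the loop measures arithmetic throughput rather than memory bandwidth.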
On Intel and NVIDIA devices, we get the expected double-rate fp16 FLOPS.
But I could not get double-rate fp16 FLOPS on almost any AMD device.
You can check the results on OpenBenchmarking.org: https://openbenchmarking.org/test/pts/vkpeak&eval=cd773e177c8d50a2f663719b9a5291c30e70d1f4#metrics
- device: RX 5500XT
- os: fedora 34
- kernel: 5.12.11
- mesa: 21.1.3
[nihui@nihuini-LC2 build]$ ./vkpeak 0
device = AMD RADV NAVI14 (ACO)
fp32-scalar = 5323.67 GFLOPS
fp32-vec4 = 5290.48 GFLOPS
fp16-scalar = 5328.88 GFLOPS
fp16-vec4 = 5319.91 GFLOPS
fp64-scalar = 168.09 GFLOPS
fp64-vec4 = 168.06 GFLOPS
int32-scalar = 917.78 GIOPS
int32-vec4 = 1070.84 GIOPS
int16-scalar = 2667.90 GIOPS
int16-vec4 = 2670.36 GIOPS
[nihui@nihuini-LC2 build]$ RADV_DEBUG=llvm ./vkpeak 0
device = AMD RADV NAVI14 (LLVM 12.0.0)
fp32-scalar = 5321.55 GFLOPS
fp32-vec4 = 5319.99 GFLOPS
fp16-scalar = 2670.81 GFLOPS
fp16-vec4 = 2671.19 GFLOPS
fp64-scalar = 168.16 GFLOPS
fp64-vec4 = 168.19 GFLOPS
int32-scalar = 928.06 GIOPS
int32-vec4 = 1070.69 GIOPS
int16-scalar = 5333.64 GIOPS
int16-vec4 = 5316.92 GIOPS
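To summarize the two runs, this small Python sketch divides the scalar fp16 and int16 figures by the fp32 figure for each backend. The numbers are copied from the output above. The helper name `ratio_to_fp32` is only for illustration, and a double-rate result would show up as a ratio near 2.00.

```python
# Scalar figures copied from the two vkpeak runs above (GFLOPS / GIOPS).
results = {
    "ACO": {"fp32": 5323.67, "fp16": 5328.88, "int16": 2667.90},
    "LLVM": {"fp32": 5321.55, "fp16": 2670.81, "int16": 5333.64},
}


def ratio_to_fp32(backend, kind):
    """Throughput of `kind` relative to fp32 for the given backend."""
    run = results[backend]
    return run[kind] / run["fp32"]


if __name__ == "__main__":
    for backend in results:
        print("%s fp16/fp32=%.2f int16/fp32=%.2f" % (
            backend,
            ratio_to_fp32(backend, "fp16"),
            ratio_to_fp32(backend, "int16"),
        ))
    # ACO fp16/fp32=1.00 int16/fp32=0.50
    # LLVM fp16/fp32=0.50 int16/fp32=1.00
```

So with ACO fp16 runs at the fp32 rate and int16 at half of it, while with LLVM it is the other way around. Neither backend gets close to 2.00 for fp16.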
Edited by nihui