clover: OpenCL regression with LuxMark 3.1 and -cl-fast-relaxed-math default option: it now renders garbage but it worked before
Hi, LuxMark 3.1 now produces garbage when run with Mesa Clover on an AMD GCN 2 GPU. It worked before (proof).
Although the title is similar, this issue is not a duplicate of #3584.
- Software versions that worked on 2020-06-22: Mesa 20.0.8, Ubuntu 20.04, Linux 5.4.0
- Software versions that do not work on 2021-11-23: Mesa 21.2.2, Ubuntu 21.10, Linux 5.13.0
The same GCN2 R9 390X OC Edition (Hawaii/Grenada) GPU was used in both cases. It was moved from an Asus Sabertooth 990FX R2.0 motherboard (AMD FX-9590 CPU, DDR3, PCIe 2.0) to a Gigabyte WRX80-SU8-IPMI rev. 1.0 (AMD Ryzen Threadripper PRO 3955WX CPU, DDR4 ECC, PCIe 4.0), but I doubt the host change makes any difference.
Here is what the same software renders on the same hardware with an older AMDGPU-Pro Orca driver; Clover previously produced the same output:
You'll notice the host also features a GCN1 R7 240-2GD5-L (Oland). Clover renders similar garbage on it. AMD Orca, on the other hand, has made LuxMark crash on it since amdgpu-pro 18.30, and 18.20 reports “OpenCL ERROR: clCreateCommandQueue(-6)”, so I don't know whether this is significant.
I had not tested the GCN1 R7 Oland GPU with older Clover, but it probably worked. Pre-GCN GPUs such as TeraScale 3 (proof) and TeraScale 2 (proof) also worked with older Clover. I have not yet tested TeraScale GPUs with the new Clover.
The GCN2 Hawaii/Grenada R9 390X no longer working with Clover is a problem for two reasons: ROCm marked this GPU as unsupported after years of failures (it only briefly worked in 2018), and since AMDGPU-Pro 21.30 the Orca driver no longer supports this GPU either (I don't know whether that is a mistake, see drm/amd#1806 (closed)).
The GCN2 Hawaii worked for years with Clover, and was almost twice as fast as AMDGPU-Pro Orca, or as ROCm when it worked (see my comment in #3584 (comment 762322)).
The current OpenCL status for GCN2 Hawaii gfx7 on Linux is:
- AMD/ROCm: broken for years (last seen working in 2018) and now considered unsupported by ROCm. If someone attempts to run it with such a GPU plugged in, it wrecks the kernel and the user is asked to reboot (see ROCm#1624).
- AMD/Orca: unsupported by AMDGPU-Pro since 21.30 (2021-08-10)
- Mesa/Clover: non-image workflows like LuxMark broke only recently (seen working on 2020-06-22, seen broken on 2021-11-23), and image workflows like Darktable do not work either (image support is still missing).
The LuxMark 3.1 build can be found here: http://wiki.luxcorerender.org/LuxMark_v3
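In case it helps to quantify "garbage" when comparing a Clover render against a known-good one (for example the Orca output), here is a minimal Python sketch. It assumes both renders have already been exported as flat sequences of float pixel values. The function names, the tolerances and the 5% threshold are hypothetical choices for illustration and are not part of LuxMark or Mesa.

```python
import math


def compare_renders(reference, candidate, rel_tol=1e-2, abs_tol=1e-4):
    """Compare two flat sequences of float pixel values.

    Counts candidate values that are NaN/Inf, and finite candidate values
    that differ from the reference beyond the given tolerances.
    The tolerances are arbitrary assumptions, not values from LuxMark.
    """
    if len(reference) != len(candidate):
        raise ValueError("buffers must have the same length")

    non_finite = 0
    mismatched = 0
    for ref, got in zip(reference, candidate):
        if not math.isfinite(got):
            # Relaxed-math builds may assume no NaN/Inf, so count them apart.
            non_finite += 1
        elif not math.isclose(ref, got, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatched += 1

    return {
        "total": len(reference),
        "non_finite": non_finite,
        "mismatched": mismatched,
    }


def looks_like_garbage(report, max_bad_fraction=0.05):
    """Return True if more than max_bad_fraction of the values are bad.

    The 5% default is a hypothetical threshold chosen for illustration.
    """
    bad = report["non_finite"] + report["mismatched"]
    return report["total"] > 0 and bad / report["total"] > max_bad_fraction


if __name__ == "__main__":
    reference = [0.1, 0.5, 0.9, 0.25]
    broken = [float("nan"), 7.0, 0.9, float("inf")]
    report = compare_renders(reference, broken)
    print(report, looks_like_garbage(report))
```

Counting non-finite values separately from plain mismatches is a deliberate choice: a render full of NaN or Inf points at a different kind of failure than one with merely wrong finite values, which may help when bisecting.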