OpenCL/Clover: AMD Turks: corrupt output buffer (depending on dimension order?)
Submitted by Dave Gilbert
Assigned to mes..@..op.org
Link to original bug (#103586)
Description
I've got a trivial kernel that draws a sphere in a voxel cube; each voxel should end up as 0 or 1; if I use global id 0 as z, 1 as y, 2 as x I get corruptions where some voxels have random junk in; if I reverse the order so that global id 0 is x, 1 is y and 2 is z then it's happy. (Confirmed the code is clean with oclgrind and happy on Intel.
Versions:
Number of devices 1 Device Name AMD TURKS (DRM 2.50.0 / 4.13.0-1-amd64, LLVM 5.0.0) Device Vendor AMD Device Vendor ID 0x1002 Device Version OpenCL 1.1 Mesa 17.2.4 Driver Version 17.2.4 Device OpenCL C Version OpenCL C 1.1
(on debian testing, was on stable, but same behaviour)
01:00.0 0300: 1002:6841
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Thames [Radeon HD 7550M/7570M/7650M] (prog-if 00 [VGA controller])
Subsystem: Hewlett-Packard Company Thames [Radeon HD 7550M/7570M/7650M]
Flags: bus master, fast devsel, latency 0, IRQ 37
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d4300000 (64-bit, non-prefetchable) [size=128K]
I/O ports at 4000 [size=256]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: radeon
Kernel modules: radeon
in an HP Elitebook laptop.
Code that triggers this: https://github.com/penguin42/opencl-play/commit/c98470685874769e4a59975791459180564b6f6e
build and run with: g++ -O2 ocl.cpp -lOpenCL && ./a.out 2> z then check output with: tr '01' ' ' <z|grep -v '^ *$'|egrep -v 'got_dev|^Z' which should be empty,
(In some builds I've found I've had to increase the SIZE constant to 256 to trigger it)
Then my commit e89fe62 fixes it with: diff --git a/sphere.ocl b/sphere.ocl index b4f23af..c89ecb9 100644 --- a/sphere.ocl +++ b/sphere.ocl @@ -1,10 +1,10 @@ __kernel void hello(__global uint* o) {
- int z = get_global_id(0);
- int z = get_global_id(2); int y = get_global_id(1);
- int x = get_global_id(2);
- int zr = get_global_size(0);
- int x = get_global_id(0);
- int zr = get_global_size(2); int yr = get_global_size(1);
- int xr = get_global_size(2);
- int xr = get_global_size(0); float zf = ((float)z - ((float)zr)/2) / (float)zr; float yf = ((float)y - ((float)yr)/2) / (float)yr; float xf = ((float)x - ((float)xr)/2) / (float)xr;
by just swapping z/x around - which should make no difference given it's a cube.
But....hmm, I've seen it fail in that direction now as well.
The corruptions all seem to be near the maximum x/y/z value - almost like one small chunk in the max corner.
Here's the kernel: __kernel void hello(__global uint* o) { int z = get_global_id(0); int y = get_global_id(1); int x = get_global_id(2); int zr = get_global_size(0); int yr = get_global_size(1); int xr = get_global_size(2); float zf = ((float)z - ((float)zr)/2) / (float)zr; float yf = ((float)y - ((float)yr)/2) / (float)yr; float xf = ((float)x - ((float)xr)/2) / (float)xr;
o[zyrxr + y*xr + x] = ((zf * zf) + (yf * yf) + (xf * xf)) < 0.25; }
Version: 17.2