Rusticl: unsigned int division results in wrong value
System information
System:
Host: arch Kernel: 6.10.6-arch1-1 arch: x86_64 bits: 64 compiler: gcc
v: 14.2.1
Distro: Arch Linux
CPU:
Info: quad core model: AMD Ryzen 3 2200G with Radeon Vega Graphics bits: 64
type: MCP arch: Zen rev: 0 cache: L1: 384 KiB L2: 2 MiB L3: 4 MiB
Speed (MHz): avg: 3607 high: 3850 min/max: 1600/3850 boost: disabled
cores: 1: 3377 2: 3367 3: 3834 4: 3850 bogomips: 30815
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
vendor: Sapphire Nitro+ driver: amdgpu v: kernel arch: GCN-4 pcie:
speed: 8 GT/s lanes: 8 ports: active: HDMI-A-1,HDMI-A-2
empty: DP-1,DP-2,DVI-D-1 bus-ID: 01:00.0 chip-ID: 1002:67df temp: 46.0 C
Driver: gpu: amdgpu display-ID: 1
API: EGL v: 1.5 platforms: device: 0 drv: radeonsi device: 1 drv: swrast
gbm: drv: kms_swrast surfaceless: drv: radeonsi wayland: drv: radeonsi x11:
drv: radeonsi
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.1.6-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 570 Series (radeonsi
polaris10 LLVM 18.1.8 DRM 3.57 6.10.6-arch1-1) device-ID: 1002:67df
Describe the issue
Running a simple openCL kernel with Rusticl with 2 nested unsigned int divisions results in wrong results, same issue does not happen when using (signed) int or Clover.
Seems like it's required for div1 to be 512 and div2 513.
Clover
Available platform: Clover
Available platform: rusticl
Using platform: Clover
Using device: AMD Radeon RX 570 Series (radeonsi, polaris10, LLVM 18.1.8, DRM 3.57, 6.10.6-arch1-1)
256 2606 2644 3024 3405 378 469 87 1569 1349
Rusticl
Available platform: rusticl
Available platform: Clover
Using platform: rusticl
Using device: AMD Radeon RX 570 Series (radeonsi, polaris10, LLVM 18.1.8, DRM 3.57, 6.10.6-arch1-1)
8388607 8388602 8388602 8388602 8388601 8388607 8388607 8388607 8388604 8388605
Using printf in Rusticl for additional information results in:
Available platform: rusticl
Available platform: Clover
Using platform: rusticl
Using device: AMD Radeon RX 570 Series (radeonsi, polaris10, LLVM 18.1.8, DRM 3.57, 6.10.6-arch1-1)
Id: 0 | ( 67451429 / 512 = 131741) | ( 131741 / 513 = 8388607)
Id: 1 | ( 684514641 / 512 = 1336942) | ( 1336942 / 513 = 8388602)
Id: 8 | ( 412341412 / 512 = 805354) | ( 805354 / 513 = 8388604)
Id: 9 | ( 354325253 / 512 = 692041) | ( 692041 / 513 = 8388605)
Id: 2 | ( 694514641 / 512 = 1356473) | ( 1356473 / 513 = 8388602)
Id: 3 | ( 794514641 / 512 = 1551786) | ( 1551786 / 513 = 8388602)
Id: 6 | ( 123343442 / 512 = 240905) | ( 240905 / 513 = 8388607)
Id: 7 | ( 23111252 / 512 = 45139) | ( 45139 / 513 = 8388607)
Id: 4 | ( 894514641 / 512 = 1747098) | ( 1747098 / 513 = 8388601)
Id: 5 | ( 99451464 / 512 = 194241) | ( 194241 / 513 = 8388607)
8388607 8388602 8388602 8388602 8388601 8388607 8388607 8388607 8388604 8388605
Example Code
#define CL_HPP_TARGET_OPENCL_VERSION 300
#include <iostream>
#include <CL/opencl.hpp>
std::string kernel_code = R"(
void kernel add(global const unsigned int* A, global unsigned int* B){
unsigned int tmp1 = A[get_global_id(0)];
unsigned int tmp2 = tmp1 / 512;
unsigned int tmp3 = tmp2 / 513;
// printf("Id: %2u | (%10u / %10u = %10u) | (%10u / %10u = %10u)\n", get_global_id(0), tmp1, 512, tmp2, tmp2, 513, tmp3);
B[get_global_id(0)] = tmp3;
}
)";
//
int main() {
std::vector<cl::Platform> all_platforms;
cl::Platform::get(&all_platforms);
for (auto& v : all_platforms) {
std::cout << "Available platform: " << v.getInfo<CL_PLATFORM_NAME>() << std::endl;
}
if (all_platforms.size() == 0) {
std::cout << "No platforms found. Check OpenCL installation!" << std::endl;
exit(1);
}
cl::Platform default_platform = all_platforms[0];
std::cout << "Using platform: " << default_platform.getInfo<CL_PLATFORM_NAME>() << std::endl;
std::vector<cl::Device> all_devices;
default_platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
if (all_devices.size() == 0) {
std::cout << "No devices found. Check OpenCL installation!" << std::endl;
exit(1);
}
cl::Device default_device = all_devices[0];
std::cout << "Using device: " << default_device.getInfo<CL_DEVICE_NAME>() << std::endl;
cl::Context context = cl::Context({default_device});
const unsigned int SIZE = 10;
std::vector<unsigned int> A_h = {67451429, 684514641, 694514641, 794514641, 894514641, 99451464, 123343442, 23111252, 412341412, 354325253};
std::vector<unsigned int> B_h = std::vector<unsigned int>(SIZE);
cl::Buffer A_d = cl::Buffer(context, A_h.begin(), A_h.end(), true);
cl::Buffer B_d = cl::Buffer(context, B_h.begin(), B_h.end(), false);
cl::CommandQueue queue = cl::CommandQueue(context, default_device);
cl::Program::Sources sources;
sources.emplace_back(kernel_code.c_str(), kernel_code.length());
cl::Program program = cl::Program(context, sources);
if (program.build({default_device}) != CL_SUCCESS) {
std::cout << "Error building: " << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(default_device) << std::endl;
exit(1);
}
cl::Kernel ko_add = cl::Kernel(program, "add");
cl::KernelFunctor<cl::Buffer, cl::Buffer>(ko_add)(cl::EnqueueArgs(queue, cl::NDRange(SIZE), cl::NDRange(2)), A_d, B_d).wait();
cl::copy(queue, B_d, B_h.begin(), B_h.end());
for (auto& v : B_h) {
std::cout << v << " ";
}
std::cout << std::endl;
return 0;
}
Edited by Winged Cutter