Performance compared to Intel® proprietary implementation
Submitted by ilia
Assigned to Rong Yang @rongyang
Description
Created attachment 118911 clinfo
Beignet GPU vs Intel® OpenCL CPU vs POCL, secs: 1.351889 vs 1.073667 vs 7.501667
Intel® software is opencl-1.2-5.0.0.57 (CPU only). Beignet is master at 00e207e2 with llvm-3.7.
My task is a naive, unoptimized DCT over a batch of float arrays of size 128. The Intel® variant is faster even at a global size of 65536. It seems to apply some obvious optimizations to the DCT algorithm, which is known to be highly optimizable. Is it possible to pass optimization parameters to LLVM to get faster code? Or is Intel®'s black magic impossible to reproduce?
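For reference, here is a minimal sketch of the kind of computation described above. The actual kernel is not attached to this report, so this assumes an unnormalized DCT-II; the function names are hypothetical. It contrasts the direct form, which recomputes every cosine, with one well-known optimization: precomputing the cosine table once and reusing it for every array. Whether Intel®'s compiler does anything like this is not known from the report.

```python
import math

N = 128  # array size mentioned in the report


def dct_naive(x):
    """Direct, unnormalized DCT-II (assumed variant).

    Recomputes each cosine, so one array costs N*N calls to cos().
    """
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def make_cos_table(n):
    """Precompute the n-by-n cosine coefficients once."""
    return [
        [math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)]
        for k in range(n)
    ]


def dct_table(x, table):
    """Same transform as dct_naive, but using the precomputed table.

    Each output is a plain dot product, with no cos() calls per array.
    """
    return [sum(xi * c for xi, c in zip(x, row)) for row in table]


if __name__ == "__main__":
    table = make_cos_table(N)
    signal = [float(i % 7) for i in range(N)]  # arbitrary test input
    a = dct_naive(signal)
    b = dct_table(signal, table)
    print(max(abs(p - q) for p, q in zip(a, b)))  # difference between the two forms
```

With many arrays of the same size, the table is built once and shared, so the per-array cost drops to multiplications and additions only. A fast O(N log N) DCT would reduce the work further, but that is a change of algorithm rather than a compiler setting.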
Attachment 118911, "clinfo":
clinfo.txt