How best to pass the same clmem into a function twice, on beignet
Submitted by Hugh Perkins
Assigned to Zhigang Gong @gongzg
Description
TensorFlow allocates one massive block of memory, then carves individual tensors out of it. Unary and binary Eigen kernels are then passed these tensors as arguments.
In my OpenCL implementation for TensorFlow, the huge block of memory is a single clmem object. I pass this clmem object into the kernel multiple times, once for each tensor, along with the appropriate offset.
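To make the calling pattern concrete, here is a minimal sketch in plain Python that only emulates it on the host. It does not use OpenCL or beignet at all: `add_kernel`, the offsets, and the 12-element block are hypothetical stand-ins, with an `array` playing the role of the single clmem object and the offsets counted in elements.

```python
from array import array

def add_kernel(a_data, a_offset, b_data, b_offset, out_data, out_offset, n):
    # Emulates a binary elementwise kernel. Each loop iteration stands in
    # for one work-item; every tensor is addressed as (buffer, offset).
    for i in range(n):
        out_data[out_offset + i] = a_data[a_offset + i] + b_data[b_offset + i]

# One large block standing in for the single clmem allocation.
N = 4
block = array('f', [0.0] * (3 * N))

# Hypothetical layout: three tensors carved out of the same block.
A_OFF, B_OFF, OUT_OFF = 0, N, 2 * N
for i in range(N):
    block[A_OFF + i] = float(i + 1)          # a = [1, 2, 3, 4]
    block[B_OFF + i] = 10.0 * float(i + 1)   # b = [10, 20, 30, 40]

# The same buffer object is passed three times, once per tensor,
# each time with that tensor's own offset.
add_kernel(block, A_OFF, block, B_OFF, block, OUT_OFF, N)

result = list(block[OUT_OFF:OUT_OFF + N])
print(result)
```

In this emulation the output region holds the elementwise sums; the failure described in this report is that the equivalent results tensor comes back all zeros on beignet.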
A unit test for this approach fails on beignet: the results tensor comes back all zeros. https://github.com/hughperkins/cuda-on-cl/blob/f240ad6c7d339f3244d8ce6acc4253f7c6a515ad/test/test_singlebuffer.py#L74-L85
I tried working around the problem by passing in each clmem just once and then connecting it to the appropriate tensors inside the kernel, but this crashes the beignet OpenCL compiler at runtime, with an LLVM error inside gbe.
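For comparison, the pass-once variant can be sketched the same way. Again this is only a host-side Python emulation under assumed names (`mul_kernel_single` and the layout are hypothetical, and offsets are in elements); it shows the shape of the idea, not the generated OpenCL that triggers the compiler crash.

```python
from array import array

def mul_kernel_single(data, a_offset, b_offset, out_offset, n):
    # The buffer arrives once; each tensor is derived from it inside the
    # "kernel" as a view at its own offset (no data is copied).
    base = memoryview(data)
    a = base[a_offset:a_offset + n]
    b = base[b_offset:b_offset + n]
    out = base[out_offset:out_offset + n]
    for i in range(n):
        out[i] = a[i] * b[i]

N = 4
# Hypothetical layout: a = [1, 2, 3, 4], b = [2, 2, 2, 2], out = zeros.
block = array('f', [1.0, 2.0, 3.0, 4.0] + [2.0] * N + [0.0] * N)

# One buffer argument, three offsets.
mul_kernel_single(block, 0, N, 2 * N, N)

result = list(block[2 * N:3 * N])
print(result)
```

The only difference from passing the buffer once per tensor is where the offset is applied: by the caller for each argument, or inside the kernel against a single base pointer.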
I can think of various ways to work around the issue, but I'd like to hear your thoughts on which approaches would do so reliably.