radeonsi: persistent, read-only buffer maps are slow to read
Here is a test application that allocates a buffer and maps it persistently:
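```cpp
glGenBuffers(1, &download_pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, download_pbo);
// Immutable storage that the CPU may read and map persistently.
glBufferStorage(GL_PIXEL_PACK_BUFFER, buffer_size, nullptr,
                GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT);
// Map once; with GL_MAP_PERSISTENT_BIT the mapping stays valid while the GL uses the buffer.
mapped_buffer =
    (char *)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, buffer_size,
                             GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT);
```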
It then reads the buffer's contents every frame:
```cpp
memcpy(copy.data(), mapped_buffer, copy.size());
```
Reading this mapped buffer results in ~2.5 FPS (~438ms) on my machine.
In contrast, using `glBufferData` + `glMapBuffer`/`glUnmapBuffer` on each frame results in 580-940 FPS (~1.1ms).
Both paths can be tested by changing the preprocessor definition at the top of the file.
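To compare the two paths independently of rendering, the per-frame copy can be timed on its own. The following is a minimal, hypothetical harness, not part of the original test program: `time_copy_ms` and `ms_to_fps` are illustrative names, and `src` would be `mapped_buffer` in one run and an ordinary heap buffer of the same size in another, as a baseline.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `size` bytes from `src` into `dst` `frames`
// times, mimicking the per-frame memcpy, and returns the average time per
// copy in milliseconds.
double time_copy_ms(char *dst, const char *src, std::size_t size, int frames) {
    assert(frames > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < frames; ++i) {
        std::memcpy(dst, src, size);
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / frames;
}

// Converts an average frame time in milliseconds to frames per second.
// Returns 0 for a non-positive input instead of dividing by zero.
double ms_to_fps(double ms) {
    return ms > 0.0 ? 1000.0 / ms : 0.0;
}
```

For example, `ms_to_fps(1.1)` is about 909, in line with the upper end of the range reported for the `glBufferData` path.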
Removing `GL_MAP_PERSISTENT_BIT` from the `glMapBufferRange` call also makes it run fast, but I don't think that is valid API usage, since `glCallList` is called while the buffer is mapped.
Using a streaming-load `memcpy` makes the read about twice as fast, but it is still slow.
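On x86, a streaming-load copy typically uses the SSE4.1 `MOVNTDQA` instruction (`_mm_stream_load_si128`), which is intended for reading write-combining memory; on ordinarily cached memory it generally behaves like a normal load. The report does not say which implementation was used, so this is only a minimal sketch in that spirit (Mesa has a similar helper, `util_streaming_load_memcpy`). `stream_load_memcpy` and `HAVE_STREAM_LOAD` are hypothetical names, and the sketch assumes GCC or Clang for the `target` attribute, falling back to plain `memcpy` elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_STREAM_LOAD 1  // hypothetical feature macro for this sketch
#endif

#ifdef HAVE_STREAM_LOAD
// Enable SSE4.1 for this function only (GCC/Clang attribute), so the file
// does not need -msse4.1. The CPU running it must still support SSE4.1.
__attribute__((target("sse4.1")))
#endif
void stream_load_memcpy(void *dst, const void *src, std::size_t len) {
    char *d = static_cast<char *>(dst);
    const char *s = static_cast<const char *>(src);
#ifdef HAVE_STREAM_LOAD
    // Copy bytes until the source is 16-byte aligned, as MOVNTDQA requires.
    std::size_t head = (16 - (reinterpret_cast<std::uintptr_t>(s) & 15)) & 15;
    if (head > len) head = len;
    std::memcpy(d, s, head);
    d += head; s += head; len -= head;

    // Non-temporal 16-byte loads from the source; ordinary unaligned
    // stores to the destination.
    while (len >= 16) {
        __m128i v = _mm_stream_load_si128(
            reinterpret_cast<__m128i *>(const_cast<char *>(s)));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(d), v);
        d += 16; s += 16; len -= 16;
    }
#endif
    // Remaining tail, or the whole copy on non-x86 targets.
    std::memcpy(d, s, len);
}
```

Only the loads are non-temporal here: the destination (`copy.data()` in the report) is ordinary cached memory, so regular stores are used for it.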
Derived from #5084 (closed).
I'm not familiar with the memory types used in Mesa, or with how other drivers make reads from this kind of buffer fast. Would it be possible to make reads from a `MAP_READ_BIT | MAP_PERSISTENT_BIT` mapping faster?