radeonsi: persistent, read-only buffer maps are slow to read

Here is a test application that allocates a buffer and maps it persistently:

  glGenBuffers(1, &download_pbo);
  glBindBuffer(GL_PIXEL_PACK_BUFFER, download_pbo);
  glBufferStorage(GL_PIXEL_PACK_BUFFER, buffer_size, nullptr,
                  GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT);
  mapped_buffer =
      (char *)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, buffer_size,
                               GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT);

It then reads the buffer's contents each frame:

  memcpy(copy.data(), mapped_buffer, copy.size());

Reading this mapped buffer results in ~2.5 FPS (~438ms) on my machine. In contrast, using glBufferData + glMapBuffer/glUnmapBuffer on each frame results in 580-940 FPS (~1.1ms).

Both paths can be tested by changing the preprocessor definition at the top of the file.

Removing GL_MAP_PERSISTENT_BIT from glMapBufferRange also makes it run fast, but I don't think that's valid API usage, given that we are calling glCallList while the buffer is mapped.

Using a streaming-based memcpy is twice as fast, but still slow.

Derived from #5084 (closed).

I'm not familiar with the memory types used in Mesa, or with how other drivers manage to make reads from this buffer fast. Would it be possible to make reading from a buffer mapped with GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT faster?
