radeonsi: bad performance on PBO packs
Moving texture contents to the CPU by calling glGetTextureSubImage + glFinish on a persistently mapped buffer is slower than calling glGetTextureSubImage to a CPU pointer with the GL_PIXEL_PACK_BUFFER binding set to zero.
Here is a simple test application that tries to reproduce the issue. There is a #define USE_PBO preprocessor definition at the top of the file. Setting it to 0 reads back directly to a CPU pointer; setting it to 1 reads back into a persistently mapped buffer.
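For readers without the attachment, the two paths look roughly like the sketch below. This is not the attached test application. It assumes an OpenGL 4.5 context is already current, libepoxy as the function loader, and a 2D RGBA8 texture. The names readback_init and readback are hypothetical, as are the buffer storage flags.

```c
#include <epoxy/gl.h> /* assumption: libepoxy; any OpenGL 4.5 loader would do */
#include <stdlib.h>

#define USE_PBO 1 /* 0: read back to a CPU pointer, 1: persistently mapped PBO */

static GLuint pbo;   /* only used when USE_PBO is 1 */
static void *pixels; /* CPU-visible destination in both paths */

/* Hypothetical helper: allocate the destination once, up front. */
void readback_init(GLsizei size)
{
#if USE_PBO
    /* Assumed flags: readable, persistent and coherent mapping. */
    const GLbitfield flags =
        GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    glCreateBuffers(1, &pbo);
    glNamedBufferStorage(pbo, size, NULL, flags);
    pixels = glMapNamedBufferRange(pbo, 0, size, flags);
#else
    pixels = malloc((size_t)size);
#endif
}

/* Hypothetical helper: copy level 0 of a 2D RGBA8 texture to the CPU. */
const void *readback(GLuint tex, GLsizei width, GLsizei height)
{
    const GLsizei size = width * height * 4; /* 4 bytes per RGBA8 texel */
#if USE_PBO
    /* With a pack buffer bound, the last argument is a byte offset into it. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glGetTextureSubImage(tex, 0, 0, 0, 0, width, height, 1,
                         GL_RGBA, GL_UNSIGNED_BYTE, size, (void *)0);
    glFinish(); /* wait for the GPU before reading the mapped pointer */
#else
    /* With no pack buffer bound, the last argument is a client pointer. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    glGetTextureSubImage(tex, 0, 0, 0, 0, width, height, 1,
                         GL_RGBA, GL_UNSIGNED_BYTE, size, pixels);
#endif
    return pixels;
}
```

Timing repeated readback() calls with USE_PBO set to 0 and then to 1 is the comparison this report is about.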
I haven't profiled radeonsi to see where most of the time is going on each path.
This happens in yuzu (a Nintendo Switch emulator) with the recently released Skyward Sword HD, resulting in ~8 FPS on average compared to 60 on other drivers (e.g. iris and AMD's proprietary blob). On the test application, the measured performance difference between the two paths on radeonsi is around 33%.