PutImage for XYPixmap is pathological
I should emphasize, right up front, that nobody should use XYPixmap for anything at all ever. Nonetheless it's a mandatory part of the protocol, and it's almost certainly not accelerated, which means a naughty client can easily abuse it to disrupt interactivity. x11perf -shmputxy500 against Xvfb scores around 100 operations per second on a 2.2GHz Skylake, and you'd really prefer that no single operation could take 10ms like that.
Mostly the slowdown is because fbPutXYImage works a bitplane at a time, so at depth 24 you're doing a read/modify/write cycle for the entire destination 24 or 32 times, and that's going to blow away your data cache quite effectively. A cheap fix might be to walk the image first by span then by bitplane, which should keep more of the working set in dcache. A complicated fix would perform a gather read from each plane and build the final pixels whole before combining them with the destination.
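
To make the three loop orders concrete, here is a rough standalone sketch. It is not the actual fb code: the function names, the plane_bit helper, the 32-bit-per-pixel destination, the most-significant-plane-first layout, and the LSB-first bit order within each source byte are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit x of a 1bpp row, LSB-first within each byte (assumption). */
static inline uint32_t plane_bit(const uint8_t *row, int x)
{
    return (row[x >> 3] >> (x & 7)) & 1u;
}

/* Plane-major order: one full read/modify/write pass over dst per bitplane. */
void put_xy_plane_major(uint32_t *dst, size_t dst_stride, const uint8_t *src,
                        size_t plane_stride, int width, int height, int depth)
{
    assert(depth >= 1 && depth <= 32);
    size_t plane_size = plane_stride * (size_t)height;
    for (int p = 0; p < depth; p++) {
        uint32_t mask = 1u << (depth - 1 - p);   /* first plane is the MSB */
        for (int y = 0; y < height; y++) {
            const uint8_t *row = src + (size_t)p * plane_size + (size_t)y * plane_stride;
            uint32_t *d = dst + (size_t)y * dst_stride;
            for (int x = 0; x < width; x++)
                d[x] = (d[x] & ~mask) | (plane_bit(row, x) ? mask : 0);
        }
    }
}

/* Cheap fix: span first, then bitplane, so each dst span stays hot in cache
 * while all of its planes are applied. */
void put_xy_span_major(uint32_t *dst, size_t dst_stride, const uint8_t *src,
                       size_t plane_stride, int width, int height, int depth)
{
    assert(depth >= 1 && depth <= 32);
    size_t plane_size = plane_stride * (size_t)height;
    for (int y = 0; y < height; y++) {
        uint32_t *d = dst + (size_t)y * dst_stride;
        for (int p = 0; p < depth; p++) {
            uint32_t mask = 1u << (depth - 1 - p);
            const uint8_t *row = src + (size_t)p * plane_size + (size_t)y * plane_stride;
            for (int x = 0; x < width; x++)
                d[x] = (d[x] & ~mask) | (plane_bit(row, x) ? mask : 0);
        }
    }
}

/* Complicated fix: gather one bit from every plane, build the whole pixel,
 * then touch each destination pixel exactly once. */
void put_xy_gather(uint32_t *dst, size_t dst_stride, const uint8_t *src,
                   size_t plane_stride, int width, int height, int depth)
{
    assert(depth >= 1 && depth <= 32);
    size_t plane_size = plane_stride * (size_t)height;
    uint32_t all = (depth == 32) ? 0xFFFFFFFFu : ((1u << depth) - 1u);
    for (int y = 0; y < height; y++) {
        uint32_t *d = dst + (size_t)y * dst_stride;
        for (int x = 0; x < width; x++) {
            uint32_t pixel = 0;
            for (int p = 0; p < depth; p++) {
                const uint8_t *row = src + (size_t)p * plane_size + (size_t)y * plane_stride;
                pixel |= plane_bit(row, x) << (depth - 1 - p);
            }
            d[x] = (d[x] & ~all) | pixel;
        }
    }
}
```

All three produce the same destination contents; only the memory access pattern differs. The gather version as written reads the planes a bit at a time, so a real implementation would presumably want to pull a word from each plane and transpose in registers rather than loop per pixel like this.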