virgl3d is 1/6 as fast as host when running WebGL Aquarium
We know that virgl becomes slow with GPU intensive applications.
A common test app for this is WebGL Aquarium:
https://webglsamples.org/aquarium/aquarium.html
On the host, it reaches 60 FPS with 10,000 fish. On virgl, we only get 6 FPS. We've found that quadrupling VIRGL_MAX_CMDBUF_DWORDS in the guest raises this to 10 FPS, but that is still slow, and increasing the size further has no effect.
We've found that the majority of the time when running WebGL is spent in the DRM_IOCTL_VIRTGPU_EXECBUFFER ioctl (60 to 100 ms).
The following commit adds timing to vrend_decode_block to further examine the situation:
gurchetansingh/virglrenderer@34e0332b
[SET_VERTEX_BUFFERS: 8921 commands, 11 milliseconds], [DRAW_VBO: 8921 commands, 49 milliseconds], [SET_INDEX_BUFFER: 8921 commands, 10 milliseconds], [SET_CONSTANT_BUFFER: 8943 commands, 16 milliseconds], - TOTAL_time: 89 -
[vrend_decode.c: 1560] Took 133 milliseconds -- 0
[SET_VERTEX_BUFFERS: 2806 commands, 3 milliseconds], [DRAW_VBO: 2806 commands, 15 milliseconds], [SET_INDEX_BUFFER: 2804 commands, 3 milliseconds], [SET_CONSTANT_BUFFER: 2830 commands, 4 milliseconds], - TOTAL_time: 27 -
[vrend_decode.c: 1560] Took 42 milliseconds -- 0
[SET_VERTEX_BUFFERS: 4616 commands, 6 milliseconds], [DRAW_VBO: 4616 commands, 25 milliseconds], [SET_INDEX_BUFFER: 4616 commands, 5 milliseconds], [SET_CONSTANT_BUFFER: 4616 commands, 6 milliseconds], - TOTAL_time: 44 -
[vrend_decode.c: 1560] Took 67 milliseconds -- 0
[SET_VERTEX_BUFFERS: 2699 commands, 3 milliseconds], [DRAW_VBO: 2699 commands, 16 milliseconds], [SET_INDEX_BUFFER: 2699 commands, 3 milliseconds], [SET_CONSTANT_BUFFER: 2721 commands, 4 milliseconds], - TOTAL_time: 28 -
[vrend_decode.c: 1560] Took 43 milliseconds -- 0
[SET_VERTEX_BUFFERS: 604 commands, 1 milliseconds], [DRAW_VBO: 604 commands, 4 milliseconds], [SET_CONSTANT_BUFFER: 628 commands, 1 milliseconds], - TOTAL_time: 7 -
[vrend_decode.c: 1560] Took 11 milliseconds -- 0
[SET_VERTEX_BUFFERS: 9517 commands, 12 milliseconds], [DRAW_VBO: 9517 commands, 51 milliseconds], [SET_INDEX_BUFFER: 9517 commands, 11 milliseconds], [SET_CONSTANT_BUFFER: 9539 commands, 13 milliseconds], - TOTAL_time: 89 -
[vrend_decode.c: 1560] Took 135 milliseconds -- 0
The timing itself might add some overhead (sometimes there are 30 to 40 ms in vrend_decode that I can't account for).
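To make the unaccounted time and the per-command cost easier to read off the logs above, here is a minimal Python sketch that parses this log format. The helper names (summarize, us_per_command) are hypothetical and exist only for this illustration. Only the first two decode blocks are pasted in, and the regular expressions assume the exact line layout shown above.

```python
import re

# First two decode blocks, copied verbatim from the log output above.
LOG = """\
[SET_VERTEX_BUFFERS: 8921 commands, 11 milliseconds], [DRAW_VBO: 8921 commands, 49 milliseconds], [SET_INDEX_BUFFER: 8921 commands, 10 milliseconds], [SET_CONSTANT_BUFFER: 8943 commands, 16 milliseconds], - TOTAL_time: 89 -
[vrend_decode.c: 1560] Took 133 milliseconds -- 0
[SET_VERTEX_BUFFERS: 2806 commands, 3 milliseconds], [DRAW_VBO: 2806 commands, 15 milliseconds], [SET_INDEX_BUFFER: 2804 commands, 3 milliseconds], [SET_CONSTANT_BUFFER: 2830 commands, 4 milliseconds], - TOTAL_time: 27 -
[vrend_decode.c: 1560] Took 42 milliseconds -- 0
"""

CMD_RE = re.compile(r"\[(\w+): (\d+) commands, (\d+) milliseconds\]")
TOTAL_RE = re.compile(r"TOTAL_time: (\d+)")
TOOK_RE = re.compile(r"Took (\d+) milliseconds")


def us_per_command(count, ms):
    """Average cost of one command in microseconds."""
    return ms * 1000.0 / count


def summarize(log):
    """Pair each per-command line with the 'Took' line that follows it."""
    blocks = []
    current = None
    for line in log.splitlines():
        cmds = CMD_RE.findall(line)
        total = TOTAL_RE.search(line)
        if cmds and total:
            current = {
                "commands": {name: (int(n), int(ms)) for name, n, ms in cmds},
                "accounted_ms": int(total.group(1)),
            }
            continue
        took = TOOK_RE.search(line)
        if took and current is not None:
            current["took_ms"] = int(took.group(1))
            # Time spent in the decode block that no command timer covers.
            current["unaccounted_ms"] = current["took_ms"] - current["accounted_ms"]
            blocks.append(current)
            current = None
    return blocks


if __name__ == "__main__":
    for block in summarize(LOG):
        print("took", block["took_ms"], "ms, unaccounted", block["unaccounted_ms"], "ms")
        for name, (count, ms) in block["commands"].items():
            print("  %s: %.2f us per command" % (name, us_per_command(count, ms)))
```

Pasting the remaining blocks into LOG should show whether the unaccounted share stays roughly constant relative to the block time, which would hint at per-command overhead rather than a fixed cost.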
WebGL Aquarium sends uniforms and vertex data for every fish (~10,000), so the command counts above make sense. I suspect all draw-intensive apps on virgl will exhibit this behavior. Two theories:
- virgl command stream transformations (GL commands --> Gallium commands --> GL commands) lead to pipeline stalls
- decoding overhead.
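To put the per-fish cost in perspective, the following rough Python sketch computes the per-draw time budget at each of the reported frame rates. It assumes one draw call per fish and exactly 10,000 fish, which is a simplification; the function names are made up for this illustration.

```python
FISH = 10_000  # approximate fish count from the report (assumed: one draw per fish)


def frame_time_ms(fps):
    """Time available for one frame, in milliseconds."""
    return 1000.0 / fps


def per_draw_budget_us(fps, draws):
    """Time available per draw call, in microseconds, at the given frame rate."""
    return frame_time_ms(fps) * 1000.0 / draws


if __name__ == "__main__":
    for label, fps in (("host", 60), ("virgl", 6), ("virgl, 4x cmdbuf", 10)):
        print("%-18s %6.1f ms/frame  %5.2f us/draw"
              % (label, frame_time_ms(fps), per_draw_budget_us(fps, FISH)))
```

For comparison, DRAW_VBO took 49 ms for 8921 commands in the first log block, so under these assumptions decoding the draw calls alone already costs several times the per-draw budget needed for 60 FPS.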
I'm investigating, but I have no solutions yet. Just putting this out there in case someone knows what's going on.