lavapipe: hoo boy.
In the course of testing pipelines for !9547, I hit https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/285955
On the surface, this looks like the usual sort of regression, but upon digging to a Mariana Trench-level depth of lavapipe/llvmpipe, I discovered something truly grimace-inducing: there are race conditions in lvp_FreeMemory.
The repro case is simple: use zmike/zink-monotonic_queue and run CTS with a caselist of:
dEQP-GLES3.functional.transform_feedback.array_element.interleaved.points.highp_int
dEQP-GLES3.functional.transform_feedback.array_element.interleaved.points.mediump_float
Upon starting the second test, lvp crashes.
The reason for this is the following sequence of calls:
- set fragment shader UBO 0 (resource A)
- do a bunch of draws
- wait for draws to complete
- begin xfb query
- unset fragment shader UBO 0 (resource A)
- destroy resource A because it is no longer in use
- free backing mem of resource A
- boom
boom is a technical term that I'm using to describe reaching this approximate stack trace:
#0 0x00007ffff78954d7 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1 0x00007fffeed29204 in try_update_scene_state (setup=0x3f02650) at ../src/gallium/drivers/llvmpipe/lp_setup.c:1223
#2 0x00007fffeed26a37 in begin_binning (setup=0x3f02650) at ../src/gallium/drivers/llvmpipe/lp_setup.c:209
#3 0x00007fffeed26df9 in set_scene_state (setup=0x3f02650, new_state=SETUP_ACTIVE, reason=0x7fffeef3f8b0 "begin_query") at ../src/gallium/drivers/llvmpipe/lp_setup.c:323
#4 0x00007fffeed29dda in lp_setup_begin_query (setup=0x3f02650, pq=0x7fff78014cc0) at ../src/gallium/drivers/llvmpipe/lp_setup.c:1519
#5 0x00007fffeed0c8c9 in llvmpipe_begin_query (pipe=0x3ec5fc0, q=0x7fff78014cc0) at ../src/gallium/drivers/llvmpipe/lp_query.c:380
#6 0x00007fffeeaaf2f2 in handle_begin_query (cmd=0x4b36e10, state=0x7fff957f4790) at ../src/gallium/frontends/lavapipe/lvp_execute.c:2275
#7 0x00007fffeeab16ba in lvp_execute_cmd_buffer (cmd_buffer=0x3b1d7a0, state=0x7fff957f4790) at ../src/gallium/frontends/lavapipe/lvp_execute.c:2966
#8 0x00007fffeeab1a0f in lvp_execute_cmds (device=0x3ec5370, queue=0x3ec5eb0, fence=0x3b34a60, cmd_buffer=0x3b1d7a0) at ../src/gallium/frontends/lavapipe/lvp_execute.c:3070
#9 0x00007fffeea9c593 in queue_thread (data=0x3ec5eb0) at ../src/gallium/frontends/lavapipe/lvp_device.c:908
#10 0x00007fffeea9961f in impl_thrd_routine (p=0x3f165e0) at ../include/c11/threads_posix.h:87
#11 0x00007ffff7cae432 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff7831913 in clone () from /lib64/libc.so.6
wherein llvmpipe does a scene update in order to start the query, still using resource A (which was destroyed) as UBO 0, even though it will be unset before any draw occurs, and so it tries to read resource A's backing memory. That memory was already freed synchronously, bypassing the multi-tiered deferred destruction that both lavapipe and llvmpipe do.
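To make the hazard concrete, here's the whole sequence above distilled into a standalone toy program. To be clear, every name in it is a made-up stand-in (queue_thread here is just a pthread, and bound_ubo/backing are hypothetical), not mesa code; it only exhibits the same shape: one thread consumes queued work that still references a binding while the owning thread frees the backing storage out from under it.

/* toy reconstruction of the race, NOT mesa code: the thread plays the
 * role of lavapipe's queue thread doing a scene update, and main()
 * plays the app-side thread calling vkFreeMemory */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct resource {
   void *backing;                  /* what lvp_FreeMemory releases */
};

static struct resource *bound_ubo; /* UBO 0 as the queue thread still sees it */

static void *queue_thread(void *arg)
{
   (void)arg;
   char scratch[64];
   /* begin_query -> set_scene_state -> try_update_scene_state:
    * the scene update copies from the stale binding */
   if (bound_ubo)
      memcpy(scratch, bound_ubo->backing, sizeof(scratch)); /* boom */
   return NULL;
}

int main(void)
{
   struct resource res = { .backing = calloc(1, 4096) };
   bound_ubo = &res;               /* set fragment shader UBO 0 */

   pthread_t t;                    /* lavapipe's queue thread */
   pthread_create(&t, NULL, queue_thread, NULL);

   /* the API thread decides resource A is idle and frees the backing
    * store synchronously, with no deferral and no fence: */
   free(res.backing);

   pthread_join(t, NULL);
   return 0;
}

Run that in a loop or under AddressSanitizer and you get the same class of use-after-free memmove that the trace above shows, on whichever iteration the scheduler feels spicy.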
And when I found this out, I said to myself, "hoo boy, that's a bug and a half."
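For completeness, here's the shape a fix has to take, as a napkin sketch with hypothetical names rather than the actual patch: the free has to wait for (or be handed off to) the queue thread instead of happening synchronously on the API thread.

/* napkin sketch of the fix shape, hypothetical names throughout:
 * the free now drains in-flight work before the storage goes away */
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t idle_cond = PTHREAD_COND_INITIALIZER;
static unsigned jobs_in_flight = 1; /* one command buffer queued */

static void *queue_thread(void *backing)
{
   /* execute the queued work while the binding is still live */
   volatile char sink = *(char *)backing;
   (void)sink;
   pthread_mutex_lock(&lock);
   if (--jobs_in_flight == 0)
      pthread_cond_broadcast(&idle_cond);
   pthread_mutex_unlock(&lock);
   return NULL;
}

/* what a non-racy lvp_FreeMemory-alike must do: wait out anything
 * that might still read the allocation before releasing it */
static void free_memory_deferred(void *backing)
{
   pthread_mutex_lock(&lock);
   while (jobs_in_flight)
      pthread_cond_wait(&idle_cond, &lock);
   pthread_mutex_unlock(&lock);
   free(backing);
}

int main(void)
{
   void *backing = calloc(1, 4096);
   pthread_t t;
   pthread_create(&t, NULL, queue_thread, backing);
   free_memory_deferred(backing);   /* no more boom */
   pthread_join(t, NULL);
   return 0;
}

In real lavapipe terms the "wait" would be fencing against the queue thread rather than a toy condition variable, but the ordering requirement is the same: any read of the backing memory has to happen-before the free.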