Alien Isolation segfaulting in radeonsi_dri.so starting with Mesa 22.1.0
The Linux version of the game "Alien Isolation" (Steam release) crashes after a couple of minutes of gameplay starting with Mesa version 22.1.0. My distro is Artix Linux, and the crash also happens on Mesa 22.1.3, which I am currently running. My GPU is an AMD Radeon 5700.
Possibly related:
I've already analysed the crash a bit in an issue in a different repo, so I've copied the text from my comment there: https://github.com/ValveSoftware/steam-for-linux/issues/6800#issuecomment-1179616726
After digging around the issue a bit it appears that the crash happens in /usr/lib/dri/radeonsi_dri.so of the mesa package, in source file mesa-22.1.3/src/gallium/drivers/radeonsi/si_buffer.c, function si_buffer_do_flush_region(), when it tries to dereference buf in that function, but buf is zero.
To debug this I've added gdb in the ~/.local/share/Steam/steamapps/common/Alien Isolation/AlienIsolation.sh script, in front of the line where Alien Isolation gets executed. When Steam is running in a terminal, I can start Alien Isolation from my library and then interact with GDB in the terminal to get a backtrace of the crash. I've used https://wiki.archlinux.org/title/Debugging/Getting_traces as a guideline.
GDB backtrace:
Thread 42 "AlienIso:gdrv0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffef5f7a640 (LWP 14232)]
si_buffer_do_flush_region () at ../mesa-22.1.3/src/gallium/drivers/radeonsi/si_buffer.c:495
Downloading -0.00 MB source file /usr/src/debug/build/../mesa-22.1.3/src/gallium/drivers/radeonsi/si_buffer.c
495 util_range_add(&buf->b.b, &buf->valid_buffer_range, box->x, box->x + box->width);
(gdb) bt full
#0 si_buffer_do_flush_region () at ../mesa-22.1.3/src/gallium/drivers/radeonsi/si_buffer.c:495
No locals.
#1 0x00007fffd6e4aea9 in si_buffer_flush_region () at ../mesa-22.1.3/src/gallium/drivers/radeonsi/si_buffer.c:507
No locals.
#2 si_buffer_flush_region () at ../mesa-22.1.3/src/gallium/drivers/radeonsi/si_buffer.c:498
No locals.
#3 0x00007fffd6bc04d6 in tc_call_transfer_flush_region ()
at ../mesa-22.1.3/src/gallium/auxiliary/util/u_threaded_context.c:2300
No locals.
#4 0x00007fffd6ba3c29 in tc_batch_execute () at ../mesa-22.1.3/src/gallium/auxiliary/util/u_threaded_context.c:211
No locals.
#5 0x00007fffd66bc897 in util_queue_thread_func () at ../mesa-22.1.3/src/util/u_queue.c:313
No locals.
#6 0x00007fffd66b5ecc in impl_thrd_routine () at ../mesa-22.1.3/include/c11/threads_posix.h:87
No locals.
#7 0x00007ffff6db554d in start_thread (arg=<optimized out>) at pthread_create.c:442
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733025068608, 329563977292428580, 140737488332750, 0,
140737488332751, 140733016678400, -328987761102572252, -329548896436112092}, mask_was_saved = 0}},
priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#8 0x00007ffff6e3a874 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
No locals.
(gdb) quit
A debugging session is active.
Inferior 1 [process 14009] will be killed.
Quit anyway? (y or n) y
I've also tracked down the version at which it stopped working.
The last version provided by Arch that does not crash Alien Isolation:
mesa 22.0.4-1
The first version provided by Arch that does crash Alien Isolation:
mesa 22.1.0-1
Downgrading to these mesa versions also requires downgrading to an older llvm version; llvm 13.0.1-1-x86_64 worked for me. Other mesa-related packages with the same mesa version number needed downgrades too.
So as a quick and dirty solution I've added a check against buf in the mesa source code tree, so that the function returns before buf gets dereferenced:
// Edited mesa-22.1.0/src/gallium/drivers/radeonsi/si_buffer.c
// Can be applied to mesa-22.1.3 too
static void si_buffer_do_flush_region(struct pipe_context *ctx, struct pipe_transfer *transfer,
                                      const struct pipe_box *box)
{
   struct si_context *sctx = (struct si_context *)ctx;
   struct si_transfer *stransfer = (struct si_transfer *)transfer;
   // si_resource() basically casts transfer->resource to a struct si_resource * and returns it unchanged.
   // si_resource() is defined in si_pipe.h, in the same directory as si_buffer.c.
   struct si_resource *buf = si_resource(transfer->resource);
   if (!buf) {
      // The following fputs() line is optional; it is only there for debugging purposes.
      fputs("si_buffer_do_flush_region: transfer->resource is null!\n", stderr);
      return;
   }
   if (stransfer->staging) {
      unsigned src_offset =
         stransfer->b.b.offset + transfer->box.x % SI_MAP_BUFFER_ALIGNMENT + (box->x - transfer->box.x);
      /* Copy the staging buffer into the original one. */
      // transfer->resource might also be in danger here
      si_copy_buffer(sctx, transfer->resource, &stransfer->staging->b.b, box->x, src_offset,
                     box->width, SI_OP_SYNC_BEFORE_AFTER);
   }
   // Would crash here when buf is zero
   util_range_add(&buf->b.b, &buf->valid_buffer_range, box->x, box->x + box->width);
}
I then built mesa roughly following the README.rst instructions (mkdir build && cd build && meson .. && ninja) and replaced /usr/lib/dri/radeonsi_dri.so with src/gallium/targets/dri/radeonsi_dri.so from the build directory.
With that I was able to play Alien Isolation for tens of minutes, and thanks to the fputs() line I also saw, in the terminal output, the moments where Alien Isolation would otherwise have crashed.
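For anyone who wants to look at the guard in isolation, here is a minimal sketch of the same early-return pattern outside of Mesa. All mock_* names and structs are hypothetical stand-ins I made up for illustration; they are not the real Mesa types, and mock_range_add() is only a simplified approximation of what util_range_add() does (no locking, and the "empty range" convention of start = ~0u, end = 0 is an assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Mesa structs; NOT the real definitions. */
struct mock_range {
   unsigned start;
   unsigned end;
};

struct mock_resource {
   struct mock_range valid_buffer_range;
};

struct mock_transfer {
   struct mock_resource *resource; /* may be NULL, as observed in the crash */
};

/* Grow the range so that it covers [start, end).
 * Simplified approximation of util_range_add(); no locking. */
void mock_range_add(struct mock_range *range, unsigned start, unsigned end)
{
   if (start < range->start)
      range->start = start;
   if (end > range->end)
      range->end = end;
}

/* Returns true if the valid range was updated, false if the update was
 * skipped because transfer->resource is NULL. */
bool mock_flush_region(struct mock_transfer *transfer, unsigned x, unsigned width)
{
   assert(transfer != NULL);

   struct mock_resource *buf = transfer->resource;

   if (!buf) {
      /* Same debugging aid as in the patch: report and bail out early. */
      fputs("mock_flush_region: transfer->resource is null!\n", stderr);
      return false;
   }

   /* Without the guard above, this would read through a NULL pointer. */
   mock_range_add(&buf->valid_buffer_range, x, x + width);
   return true;
}
```

Calling mock_flush_region() with a transfer whose resource is NULL prints the message to stderr and returns false instead of segfaulting; with a valid resource it extends the range as usual. Like the patch itself, this only hides the symptom. It does not explain why transfer->resource ends up being NULL in the first place.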