SIGBUS with amdgpu on multi-GPU system on X server with DRI3/GBM
I first reported this bug against the kernel and was directed here: https://bugzilla.kernel.org/show_bug.cgi?id=218993
I ran into a SIGBUS when using multiple GPUs and DRI with an X server that has GPU acceleration (TigerVNC's Xvnc). This happened on a machine with:
OS: Fedora 40 running 6.9.5-200.fc40.x86_64
iGPU: Ryzen 5 7600
dGPU: RTX 4060 | Arc A380 | RX 7600
The issue occurs when the X server is configured to use an AMD rendernode, and an application wants to use a non-AMD rendernode.
When opening the AMD rendernode using gbm_create_device(), a SIGBUS will occur when gbm_bo_map() is called, if the application wants to use another rendernode that is not an AMD GPU.
In my setup, /dev/dri/renderD128 is the AMD iGPU, and /dev/dri/renderD129 is an RTX 4060.
If I run the X server with
$ Xvnc :50 -rendernode /dev/dri/renderD128
and then run vkcube on renderD129 against that X server:
$ DISPLAY=:50 vkcube --gpu_number 1
I get the SIGBUS:
(EE)
(EE) Backtrace:
(EE) 0: Xvnc (xorg_backtrace+0x82) [0x560c52b47d42]
(EE) 1: Xvnc (0x560c52991000+0x1b7f4c) [0x560c52b48f4c]
(EE) 2: /lib64/libc.so.6 (0x7f0c99613000+0x40710) [0x7f0c99653710]
(EE) 3: /lib64/libpixman-1.so.0 (0x7f0c99ed0000+0x8a2d0) [0x7f0c99f5a2d0]
(EE) 4: /lib64/libpixman-1.so.0 (pixman_blt+0x81) [0x7f0c99ede8d1]
(EE) 5: Xvnc (vncDRI3SyncPixmapFromGPU+0x10e) [0x560c529f303e]
(EE) 6: Xvnc (0x560c52991000+0x622c3) [0x560c529f32c3]
(EE) 7: Xvnc (dri3_pixmap_from_fds+0xcf) [0x560c52a7fdaf]
(EE) 8: Xvnc (0x560c52991000+0xf1309) [0x560c52a82309]
(EE) 9: Xvnc (Dispatch+0x426) [0x560c52ae3f56]
(EE) 10: Xvnc (dix_main+0x46a) [0x560c52af2d4a]
(EE) 11: /lib64/libc.so.6 (0x7f0c99613000+0x2a088) [0x7f0c9963d088]
(EE) 12: /lib64/libc.so.6 (__libc_start_main+0x8b) [0x7f0c9963d14b]
(EE) 13: Xvnc (_start+0x25) [0x560c529eed75]
(EE)
(EE) Bus error at address 0x7f0c8e211000
(EE)
Fatal server error:
(EE) Caught signal 7 (Bus error). Server aborting
(EE)
Aborted (core dumped)
The same crash occurs when running vkcube on an Arc GPU (A380).
However, running the X server on an Arc or Nvidia GPU, and vkcube on the AMD GPU, does not cause a crash. Neither does running the X server on AMD, and vkcube on a different AMD GPU (iGPU & RX 7600 for example).
To clarify, the crash does not come from gbm_bo_map() itself, but from the incorrectly mapped memory, which causes a crash later in the program.
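The general pattern, where mmap() returns successfully and the SIGBUS only shows up when the memory is first touched, can be reproduced without any GPU. The sketch below is only an analogy and not the amdgpu code path: it maps a temporary file that has no backing pages and then reads from the mapping. The helper name mmap_then_touch() is hypothetical, and the flags are chosen to match prot=3 (PROT_READ | PROT_WRITE) and flags=1 (MAP_SHARED) as seen on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* Jump back to the caller instead of letting SIGBUS kill the process. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

/*
 * Hypothetical helper: map `map_bytes` of a temp file that is only
 * `backing_bytes` long, then read the first byte of the mapping.
 * Returns 1 if the read raised SIGBUS, 0 if it worked, -1 on setup failure.
 */
int mmap_then_touch(size_t backing_bytes, size_t map_bytes)
{
    assert(map_bytes > 0);

    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)backing_bytes) != 0) {
        close(fd);
        return -1;
    }

    /* mmap() itself succeeds even when no page backs the range. */
    volatile unsigned char *p = mmap(NULL, map_bytes, PROT_READ | PROT_WRITE,
                                     MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int faulted;
    if (sigsetjmp(fault_jmp, 1) == 0) {
        unsigned char v = p[0];   /* the fault, if any, happens here */
        (void)v;
        faulted = 0;
    } else {
        faulted = 1;
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)p, map_bytes);
    return faulted;
}
```

Calling mmap_then_touch(0, 4096) should report a fault on Linux, while mmap_then_touch(4096, 4096) should not. This mirrors what the traces show: the mapping call returns normally and the fault is raised in whichever code reads the memory first, here pixman_blt().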
I've attached a stacktrace with the last call to mmap() before the crash:
stacktrace
Thread 1 "Xvnc" hit Breakpoint 1.1, __GI___mmap64 (addr=addr@entry=0x0, len=1024000, prot=prot@entry=3, flags=flags@entry=1, fd=9, offset=4301266944) at ../sysdeps/unix/sysv/linux/mmap64.c:47
47 {
#0 __GI___mmap64 (addr=addr@entry=0x0, len=1024000, prot=prot@entry=3, flags=flags@entry=1, fd=9, offset=4301266944) at ../sysdeps/unix/sysv/linux/mmap64.c:47
#1 0x00007f4c3c131af0 in amdgpu_bo_cpu_map (bo=0x560732eb4a70, cpu=cpu@entry=0x7ffc013676e0) at ../amdgpu/amdgpu_bo.c:458
#2 0x00007f4c39f41506 in amdgpu_bo_do_map (rws=rws@entry=0x560732b41d20, bo=bo@entry=0x560732eb4b00, cpu=cpu@entry=0x7ffc013676e0) at ../src/gallium/winsys/amdgpu/drm/amdgpu_bo.c:261
#3 0x00007f4c39f4303b in amdgpu_bo_map (rws=0x560732b41d20, buf=<optimized out>, rcs=<optimized out>, usage=<optimized out>) at ../src/gallium/winsys/amdgpu/drm/amdgpu_bo.c:400
#4 0x00007f4c39edb160 in si_texture_transfer_map (ctx=0x560732eb5310, texture=0x560732eb4b80, level=<optimized out>, usage=1, box=<optimized out>, ptransfer=<optimized out>) at ../src/gallium/drivers/radeonsi/si_texture.c:1996
#5 0x00007f4c39637258 in pipe_texture_map (context=<optimized out>, resource=<optimized out>, level=0, layer=0, access=1, x=0, y=0, w=500, h=500, transfer=0x7ffc01367848) at ../src/gallium/auxiliary/util/u_inlines.h:555
#6 dri2_map_image (context=0x560732eb5170, image=0x560732eb4a10, x0=0, y0=0, width=500, height=500, flags=1, stride=0x7ffc0136796c, data=0x7ffc01367970) at ../src/gallium/frontends/dri/dri2.c:1922
#7 0x00007f4c3d429e43 in gbm_dri_bo_map (_bo=0x560732eb5120, x=0, y=0, width=500, height=500, flags=1, stride=0x7ffc0136796c, map_data=0x7ffc01367970) at ../src/gbm/backends/dri/gbm_dri.c:1035
#8 0x000056072a416fc5 in vncDRI3SyncPixmapFromGPU (pixmap=pixmap@entry=0x7f4c3190b010) at /usr/src/debug/tigervnc-dri-1.fc40.x86_64/unix/xserver/hw/vnc/vncDRI3.c:358
#9 0x000056072a4172c3 in vncPixmapFromFd (screen=0x560732ad7cb0, fd=<optimized out>, width=<optimized out>, height=<optimized out>, stride=<optimized out>, depth=24 '\030', bpp=32 ' ') at /usr/src/debug/tigervnc-dri-1.fc40.x86_64/unix/xserver/hw/vnc/vncDRI3.c:157
#10 0x000056072a4a3daf in dri3_pixmap_from_fds (ppixmap=0x7ffc01367a90, screen=<optimized out>, num_fds=<optimized out>, fds=<optimized out>, width=<optimized out>, height=<optimized out>, strides=0x7ffc01367a88, offsets=0x7ffc01367a84, depth=24 '\030', bpp=32 ' ', modifier=72057594037927935) at ../../dri3/dri3_screen.c:66
#11 0x000056072a4a1be1 in proc_dri3_pixmap_from_buffer (client=0x560732c7c180) at ../../dri3/dri3_request.c:204
#12 0x000056072a507f56 in Dispatch () at ../../dix/dispatch.c:478
#13 0x000056072a516d4a in dix_main (argc=4, argv=0x7ffc01367cf8, envp=<optimized out>) at ../../dix/main.c:276
#14 0x00007f4c3cc3d088 in __libc_start_call_main (main=main@entry=0x56072a4118f0 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffc01367cf8) at ../sysdeps/nptl/libc_start_call_main.h:58
#15 0x00007f4c3cc3d14b in __libc_start_main_impl (main=0x56072a4118f0 <main>, argc=4, argv=0x7ffc01367cf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc01367ce8) at ../csu/libc-start.c:360
#16 0x000056072a412d75 in _start ()
Continuing.
Thread 1 "Xvnc" received signal SIGBUS, Bus error.
sse2_blt.part.0.lto_priv.0 (src_bits=<optimized out>, dst_bits=<optimized out>, src_stride=<optimized out>, dst_stride=<optimized out>, src_bpp=<optimized out>, src_x=<optimized out>, src_y=0, dest_x=0, dest_y=0, width=500, height=499, dst_bpp=<optimized out>, imp=<optimized out>) at ../pixman/pixman-sse2.c:4752
4752 xmm0 = load_128_unaligned ((__m128i*)(s));
#0 sse2_blt.part.0.lto_priv.0 (src_bits=<optimized out>, dst_bits=<optimized out>, src_stride=<optimized out>, dst_stride=<optimized out>, src_bpp=<optimized out>, src_x=<optimized out>, src_y=0, dest_x=0, dest_y=0, width=500, height=499, dst_bpp=<optimized out>, imp=<optimized out>) at ../pixman/pixman-sse2.c:4752
#1 0x00007f4c3d4fa8d1 in _pixman_implementation_blt (imp=0x560732aab8b0, src_bits=0x7f4c31811000, dst_bits=0x7f4c3190b070, src_stride=512, dst_stride=500, src_bpp=32, dst_bpp=32, src_x=0, src_y=0, dest_x=0, dest_y=0, width=500, height=500) at ../pixman/pixman-implementation.c:250
#2 pixman_blt (src_bits=0x7f4c31811000, dst_bits=0x7f4c3190b070, src_stride=512, dst_stride=500, src_bpp=32, dst_bpp=32, src_x=0, src_y=0, dest_x=0, dest_y=0, width=500, height=500) at ../pixman/pixman.c:741
#3 0x000056072a41703e in vncDRI3SyncPixmapFromGPU (pixmap=pixmap@entry=0x7f4c3190b010) at /usr/src/debug/tigervnc-dri-1.fc40.x86_64/unix/xserver/hw/vnc/vncDRI3.c:377
#4 0x000056072a4172c3 in vncPixmapFromFd (screen=0x560732ad7cb0, fd=<optimized out>, width=<optimized out>, height=<optimized out>, stride=<optimized out>, depth=24 '\030', bpp=32 ' ') at /usr/src/debug/tigervnc-dri-1.fc40.x86_64/unix/xserver/hw/vnc/vncDRI3.c:157
#5 0x000056072a4a3daf in dri3_pixmap_from_fds (ppixmap=0x7ffc01367a90, screen=<optimized out>, num_fds=<optimized out>, fds=<optimized out>, width=<optimized out>, height=<optimized out>, strides=0x7ffc01367a88, offsets=0x7ffc01367a84, depth=24 '\030', bpp=32 ' ', modifier=72057594037927935) at ../../dri3/dri3_screen.c:66
#6 0x000056072a4a1be1 in proc_dri3_pixmap_from_buffer (client=0x560732c7c180) at ../../dri3/dri3_request.c:204
#7 0x000056072a507f56 in Dispatch () at ../../dix/dispatch.c:478
#8 0x000056072a516d4a in dix_main (argc=4, argv=0x7ffc01367cf8, envp=<optimized out>) at ../../dix/main.c:276
#9 0x00007f4c3cc3d088 in __libc_start_call_main (main=main@entry=0x56072a4118f0 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffc01367cf8) at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x00007f4c3cc3d14b in __libc_start_main_impl (main=0x56072a4118f0 <main>, argc=4, argv=0x7ffc01367cf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc01367ce8) at ../csu/libc-start.c:360
#11 0x000056072a412d75 in _start ()
Continuing.
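The numbers in the two traces above can be cross-checked. This is a minimal sketch, assuming pixman_blt() counts strides in 32-bit words, so that src_stride=512 means 2048 bytes per row. The helper names blit_span_bytes() and blit_bytes_touched() are hypothetical and exist only for this calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes spanned by `rows` full rows when the stride is in 32-bit words. */
size_t blit_span_bytes(int stride_words, int rows)
{
    assert(stride_words >= 0 && rows >= 0);
    return (size_t)stride_words * sizeof(uint32_t) * (size_t)rows;
}

/* Offset one past the last source byte a blit of this size would read. */
size_t blit_bytes_touched(int stride_words, int width_px, int rows,
                          int bytes_per_px)
{
    assert(stride_words >= 0 && width_px >= 0 && rows > 0 && bytes_per_px > 0);
    return (size_t)stride_words * sizeof(uint32_t) * (size_t)(rows - 1)
         + (size_t)width_px * (size_t)bytes_per_px;
}
```

With the values from the trace, blit_span_bytes(512, 500) gives 1024000, which matches the len passed to mmap(), and blit_bytes_touched(512, 500, 500, 4) gives 1023952. Under the stated assumption, the copy stays inside the mapped length, so the SIGBUS does not look like a simple read past the end of the mapping.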