PRIME render offloading broken
System information
System: Host: toaster Kernel: 5.10.6-110-tkg-cfs x86_64 bits: 64 compiler: gcc v: 10.2.1
Desktop: MATE 1.24.1 wm: marco dm: GDM Distro: Artix Linux base: Arch Linux
CPU: Info: Quad Core model: AMD A10-7800 Radeon R7 12 Compute Cores 4C+8G bits: 64 type: MCP
arch: Steamroller rev: 1 L1 cache: 256 KiB L2 cache: 2 MiB
flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 27978
Speed: 3133 MHz min/max: 1400/3500 MHz boost: enabled Core speeds (MHz): 1: 3133 2: 3133
3: 3106 4: 3105
Graphics: Device-1: AMD Kaveri [Radeon R7 Graphics] vendor: Gigabyte driver: amdgpu v: kernel
bus ID: 00:01.0 chip ID: 1002:130f
Device-2: AMD Oland XT [Radeon HD 8670 / R7 250/350] vendor: Dell driver: amdgpu v: kernel
bus ID: 01:00.0 chip ID: 1002:6610
Display: server: X.Org 1.20.10 compositor: marco driver: loaded: amdgpu,ati
unloaded: fbdev,modesetting,vesa resolution: 1920x1080~60Hz s-dpi: 96
OpenGL: renderer: AMD KAVERI (DRM 3.40.0 5.10.6-110-tkg-cfs LLVM 11.0.1)
v: 4.6 Mesa 21.1.0-devel (git-b60dfa2c09) direct render: Yes
Describe the issue
When launching any graphical program with DRI_PRIME=1 (including simple programs like glxgears), the second GPU fails and then resets itself, causing graphical glitches that later result in the application crashing.
Simple programs like glxinfo and clinfo do not cause any GPU crashes. Only OpenGL/EGL is affected, as vkcube works perfectly fine.
Applications launched without DRI_PRIME, or on the primary GPU, work without any issues.
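For anyone trying to reproduce this, the comparison described above (the same program run with and without DRI_PRIME=1) could be scripted roughly as follows. This is only a sketch: prime_env and run_case are hypothetical helper names, not part of Mesa, and the default timeout is an arbitrary choice.

```python
import os
import subprocess


def prime_env(offload, base=None):
    """Return a copy of the environment with DRI_PRIME set to 1 or removed."""
    env = dict(os.environ if base is None else base)
    if offload:
        env["DRI_PRIME"] = "1"
    else:
        env.pop("DRI_PRIME", None)
    return env


def run_case(cmd, offload, timeout=30):
    """Run cmd and return its exit code.

    Returns None if the program was still running when the timeout
    expired (subprocess.run kills the child in that case).
    """
    try:
        result = subprocess.run(cmd, env=prime_env(offload), timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return result.returncode
```

Calling run_case(["glxgears"], offload=False, timeout=10) and then run_case(["glxgears"], offload=True, timeout=10) gives one result per configuration. Since glxgears normally runs until closed, a None result means it survived the whole timeout, while an exit code means it terminated early.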
Regression
The last known working build of Mesa was commit 205e737f, with MR !8794 (merged) and newer being affected.
Log files as attachment
- Output of dmesg: see attachment.
- Backtrace
#0 0x00007ffff6b824f6 in si_emit_draw_packets<(chip_class)8, (si_has_ngg)0, (si_has_prim_discard_cs)0> (
sctx=sctx@entry=0x55555557e120, info=info@entry=0x555555718618, indirect=<optimized out>, indirect@entry=0x0,
draws=draws@entry=0x555555718638, num_draws=num_draws@entry=1, indexbuf=<optimized out>, index_size=4, index_offset=0,
instance_count=1, original_index_size=4, dispatch_prim_discard_cs=false)
at ../mesa/src/gallium/drivers/radeonsi/si_state_draw.cpp:1210
#1 0x00007ffff6bc863f in si_draw_vbo<(chip_class)8, (si_has_tess)0, (si_has_gs)0, (si_has_ngg)0, (si_has_prim_discard_cs)0> (
ctx=<optimized out>, info=<optimized out>, indirect=0x0, draws=0x555555718638, num_draws=1)
at ../mesa/src/gallium/drivers/radeonsi/si_state_draw.cpp:2167
#2 0x00007ffff68634c6 in tc_call_draw_single (pipe=<optimized out>, payload=0x555555718618)
at ../mesa/src/gallium/auxiliary/util/u_threaded_context.c:2341
#3 0x00007ffff6865aab in tc_batch_execute (job=job@entry=0x555555718400, thread_index=thread_index@entry=0)
at ../mesa/src/gallium/auxiliary/util/u_threaded_context.c:209
#4 0x00007ffff6866951 in _tc_sync (tc=tc@entry=0x555555717e50, func=<optimized out>, info=<optimized out>)
at ../mesa/src/gallium/auxiliary/util/u_threaded_context.c:325
#5 0x00007ffff6866ac0 in tc_flush (_pipe=0x555555717e50, fence=0x7fffffffdc68, flags=1)
at ../mesa/src/gallium/auxiliary/util/u_threaded_context.c:2317
#6 0x00007ffff5fe6666 in st_context_flush (stctxi=0x555555780c10, flags=3, fence=0x7fffffffdc68,
before_flush_cb=0x7ffff5f621f0 <notify_before_flush_cb>, args=0x7fffffffdc70)
at ../mesa/src/mesa/state_tracker/st_manager.c:674
#7 0x00007ffff5f63280 in dri_flush (cPriv=<optimized out>, dPriv=<optimized out>, flags=<optimized out>, reason=<optimized out>)
at ../mesa/src/gallium/frontends/dri/dri_drawable.c:526
#8 0x00007ffff794f501 in loader_dri3_swap_buffers_msc (draw=0x55555579a6b8, target_msc=0, divisor=0, remainder=0,
flush_flags=<optimized out>, rects=rects@entry=0x0, n_rects=0, force_copy=false)
at ../mesa/src/loader/loader_dri3_helper.c:959
#9 0x00007ffff79418a2 in dri3_swap_buffers (pdraw=<optimized out>, target_msc=<optimized out>, divisor=<optimized out>,
remainder=<optimized out>, flush=<optimized out>) at ../mesa/src/glx/dri3_glx.c:589
#10 0x0000555555556640 in ?? ()
#11 0x00007ffff7ac3152 in __libc_start_main () from /usr/lib/libc.so.6
#12 0x0000555555556b7e in ?? ()
- Gpu hang details
[ 432.596178] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=4, emitted seq=8
[ 432.596388] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glxgears pid 3674 thread glxgears:cs0 pid 3675
[ 432.596395] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
[ 433.079118] amdgpu 0000:01:00.0: amdgpu: PCI CONFIG reset
[ 433.079122] amdgpu 0000:01:00.0: amdgpu: GPU pci config reset
[ 433.085405] amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 433.085493] [drm] PCIE gen 3 link speeds already enabled
(amdgpu.gpu_recovery and amdgpu.noretry are set in order to allow GPU recovery)
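The hang details above can be pulled out of a saved dmesg log with a few regular expressions, which may help when comparing logs across Mesa builds. The following is a minimal sketch that assumes the message formats shown in this report; summarize_hang is a hypothetical helper, and other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the amdgpu messages quoted in this report.
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (\w+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)
PROCESS_RE = re.compile(r"Process information: process (\S+) pid (\d+)")
RESET_RE = re.compile(r"amdgpu (\S+): amdgpu: GPU reset begin!")

SAMPLE = """\
[ 432.596178] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=4, emitted seq=8
[ 432.596388] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glxgears pid 3674 thread glxgears:cs0 pid 3675
[ 432.596395] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
"""


def summarize_hang(log_text):
    """Collect ring, sequence numbers, process and device from a dmesg excerpt."""
    summary = {}
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            summary["ring"] = m.group(1)
            summary["signaled_seq"] = int(m.group(2))
            summary["emitted_seq"] = int(m.group(3))
            continue
        m = PROCESS_RE.search(line)
        if m:
            summary["process"] = m.group(1)
            summary["pid"] = int(m.group(2))
            continue
        m = RESET_RE.search(line)
        if m:
            summary["device"] = m.group(1)
    return summary
```

Running summarize_hang(SAMPLE) on the excerpt from this report yields the ring name, the signaled and emitted sequence numbers, the offending process, and the PCI address of the GPU that was reset. If several hangs appear in one log, later matches overwrite earlier ones, so the summary describes the last hang only.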
Screenshots/video files (if applicable)
See attachments.
These show glxgears and eglgears after the system unfreezes from its dGPU reset, before the applications crash.
Any extra information would be greatly appreciated
During the GPU crash, the message "amdgpu: amdgpu_cs_query_fence_status failed." is output multiple times. The kernel driver (amdgpu) claims that the GFX ring has timed out.
[dmesg-glxgears.txt]