radeonsi: Consistent GPU hangs in Teardown
System information
System: Host: lilypad Kernel: 5.9.1-7-tkg-cfs x86_64 bits: 64 compiler: N/A
Desktop: Budgie 10.5.1 wm: budgie-wm dm: GDM Distro: Arch Linux
CPU: Info: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP arch: Zen 2
L2 cache: 6144 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
bogomips: 182249
Speed: 3404 MHz min/max: 2200/3800 MHz Core speeds (MHz): 1: 2379 2: 1863
3: 4176 4: 2103 5: 2739 6: 1863 7: 2795 8: 3720 9: 2072 10: 3478 11: 2195
12: 2196 13: 2794 14: 2195 15: 3250 16: 4240 17: 2104 18: 3591 19: 2053
20: 3562 21: 2196 22: 2195 23: 2196 24: 2220
Graphics: Device-1: AMD Vega 20 [Radeon VII] driver: amdgpu v: kernel bus ID: 0a:00.0
chip ID: 1002:66af
Device-2: NVIDIA TU106 [GeForce RTX 2060 SUPER] driver: vfio-pci v: 0.2
bus ID: 0b:00.0 chip ID: 10de:1f06
Display: x11 server: X.Org 1.20.9 compositor: budgie-wm driver: amdgpu
unloaded: modesetting alternate: ati,fbdev,vesa resolution: 1: 3840x2160~60Hz
2: 1920x1080~60Hz s-dpi: 96
OpenGL:
renderer: AMD Radeon VII (VEGA20 DRM 3.39.0 5.9.1-7-tkg-cfs LLVM 12.0.0)
v: 4.6 Mesa 20.3.0-devel (git-e07c546763) direct render: Yes
If applicable
- Wine/Proton version: 5.13
Describe the issue
Playing Teardown for extended periods results in GPU hangs at unpredictable times. The game uses OpenGL and is being played through Proton.
To reproduce:
- Go to the Marina level (Level 3)
- Try to complete the mission and side objectives
- Eventually, a GPU hang will happen on this level
The game is available here: https://store.steampowered.com/app/1167630/Teardown/
AMDGPU Kernel Output:
Oct 30 02:00:28.763939 lilypad kernel: [drm] amdgpu kernel modesetting enabled.
Oct 30 02:00:28.763970 lilypad kernel: amdgpu: Ignoring ACPI CRAT on non-APU system
Oct 30 02:00:28.764015 lilypad kernel: amdgpu: Topology: Add CPU node
Oct 30 02:00:28.764062 lilypad kernel: fb0: switching to amdgpudrmfb from EFI VGA
Oct 30 02:00:28.765353 lilypad kernel: amdgpu 0000:0a:00.0: vgaarb: deactivate vga console
Oct 30 02:00:28.765552 lilypad kernel: amdgpu 0000:0a:00.0: enabling device (0006 -> 0007)
Oct 30 02:00:28.765736 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
Oct 30 02:00:28.766115 lilypad kernel: amdgpu: ATOM BIOS: 113-D3600200-106
Oct 30 02:00:28.766217 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: HBM ECC is not presented.
Oct 30 02:00:28.766390 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: SRAM ECC is not presented.
Oct 30 02:00:28.766578 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
Oct 30 02:00:28.766747 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Oct 30 02:00:28.766901 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
Oct 30 02:00:28.767175 lilypad kernel: [drm] amdgpu: 16368M of VRAM memory ready
Oct 30 02:00:28.767192 lilypad kernel: [drm] amdgpu: 16368M of GTT memory ready.
Oct 30 02:00:28.772135 lilypad kernel: amdgpu: hwmgr_sw_init smu backed is vega20_smu
Oct 30 02:00:29.494592 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available
Oct 30 02:00:29.494807 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: DTM: optional dtm ta ucode is not available
Oct 30 02:00:29.511926 lilypad kernel: snd_hda_intel 0000:0a:00.1: bound 0000:0a:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Oct 30 02:00:29.962802 lilypad kernel: amdgpu: Topology: Add dGPU node [0x66af:0x1002]
Oct 30 02:00:29.962933 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 16, active_cu_number 60
Oct 30 02:00:29.968060 lilypad kernel: fbcon: amdgpudrmfb (fb0) is primary device
Oct 30 02:00:30.135950 lilypad kernel: amdgpu 0000:0a:00.0: [drm] fb0: amdgpudrmfb frame buffer device
Oct 30 02:00:30.149714 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Oct 30 02:00:30.149949 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Oct 30 02:00:30.150125 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Oct 30 02:00:30.150296 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Oct 30 02:00:30.150474 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Oct 30 02:00:30.150655 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Oct 30 02:00:30.150845 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Oct 30 02:00:30.150998 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Oct 30 02:00:30.151150 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Oct 30 02:00:30.151299 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Oct 30 02:00:30.151421 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Oct 30 02:00:30.151543 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 1
Oct 30 02:00:30.151662 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 1
Oct 30 02:00:30.151782 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 1
Oct 30 02:00:30.151902 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 1
Oct 30 02:00:30.152026 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
Oct 30 02:00:30.152142 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
Oct 30 02:00:30.152370 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring uvd_1 uses VM inv eng 9 on hub 1
Oct 30 02:00:30.152725 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
Oct 30 02:00:30.153023 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
Oct 30 02:00:30.153241 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring vce0 uses VM inv eng 12 on hub 1
Oct 30 02:00:30.153392 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring vce1 uses VM inv eng 13 on hub 1
Oct 30 02:00:30.153530 lilypad kernel: amdgpu 0000:0a:00.0: amdgpu: ring vce2 uses VM inv eng 14 on hub 1
Oct 30 02:00:30.153658 lilypad kernel: amdgpu: Detected AMDGPU DF Counters. # of Counters = 4.
Oct 30 02:00:30.153674 lilypad kernel: [drm] Initialized amdgpu 3.39.0 20150101 for 0000:0a:00.0 on minor 0
Oct 30 02:10:23.190457 lilypad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Oct 30 02:20:45.782300 lilypad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
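The two `amdgpu_job_timedout` lines at the end of the log above are the relevant part. A minimal sketch for pulling such lines out of a longer kernel log and measuring the time between them follows. It assumes the log has the same layout as the lines above (which looks like `journalctl -k -o short-precise` output). The helper name `find_timeouts` is hypothetical, and the default year is an assumption because the log lines carry no year.

```python
import re
from datetime import datetime

# Matches lines shaped like the ones in the log above, e.g.:
# Oct 30 02:10:23.190457 lilypad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d+) \S+ kernel: "
    r"\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* ring (?P<ring>\S+) timeout"
)


def find_timeouts(lines, year=2020):
    """Return (timestamp, ring) pairs for every ring timeout line.

    The log lines have no year, so `year` is an assumed value that is
    only used to build a complete datetime.
    """
    events = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S.%f"
            )
            events.append((stamp, match["ring"]))
    return events


if __name__ == "__main__":
    sample = [
        "Oct 30 02:00:30.153674 lilypad kernel: [drm] Initialized amdgpu 3.39.0 20150101 for 0000:0a:00.0 on minor 0",
        "Oct 30 02:10:23.190457 lilypad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered",
        "Oct 30 02:20:45.782300 lilypad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered",
    ]
    events = find_timeouts(sample)
    for stamp, ring in events:
        print(stamp.isoformat(), ring)
    # Gap between consecutive timeouts, in seconds.
    for (earlier, _), (later, _) in zip(events, events[1:]):
        print("gap:", (later - earlier).total_seconds())
```

Month abbreviations are parsed with `%b`, so this sketch also assumes an English (C) locale.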
Let me know if I can provide more information. I am not very familiar with debugging OpenGL, so apologies for the lack of useful detail in this initial bug report.