AMD 7840U/780M, Linux 6.10.7: GPU + Xorg crash (but system continues to work): `*ERROR* ring gfx_0.0.0 timeout, signaled seq=3907213, emitted seq=3907215`, `[drm:gfx11_kiq_unmap_queues [amdgpu]] *ERROR* failed to unmap legacy queue`, and more
Hello,
while the system was under generally high load (compiling stuff), I got
- a screen freeze, then
- a black screen, then
- the screen re-appeared, but showing an image from shortly before (I was typing text and a few letters were missing), and
- a non-responsive GUI.
Background jobs kept running, and I could still connect to the machine via SSH.
- The kernel messages are:
[...]
[57957.223424] [T12200] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3907213, emitted seq=3907215
[57957.223802] [T12200] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 11747 thread firefox:cs0 pid 11831
[57957.224164] [T12200] amdgpu 0000:65:00.0: amdgpu: GPU reset begin!
[57959.302788] [T12200] amdgpu 0000:65:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[57959.302799] [T12200] [drm:gfx11_kiq_unmap_queues [amdgpu]] *ERROR* failed to unmap legacy queue
[57959.516874] [T12200] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[57959.518486] [T12200] amdgpu 0000:65:00.0: amdgpu: MODE2 reset
[57959.555997] [T12200] amdgpu 0000:65:00.0: amdgpu: GPU reset succeeded, trying to resume
[57959.556524] [T12200] [drm] PCIE GART of 512M enabled (table at 0x0000008000900000).
[57959.556829] [T12200] [drm] VRAM is lost due to GPU reset!
[57959.556836] [T12200] amdgpu 0000:65:00.0: amdgpu: SMU is resuming...
[57959.560174] [T12200] amdgpu 0000:65:00.0: amdgpu: SMU is resumed successfully!
[57959.562112] [T12200] [drm] DMUB hardware initialized: version=0x08004000
[57960.417359] [T12200] [drm] kiq ring mec 3 pipe 1 q 0
[57960.420667] [T12200] amdgpu 0000:65:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[57960.421580] [T12200] amdgpu 0000:65:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[57960.421588] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[57960.421594] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[57960.421599] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[57960.421604] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[57960.421608] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[57960.421612] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[57960.421616] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[57960.421622] [T12200] amdgpu 0000:65:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[57960.421626] [T12200] amdgpu 0000:65:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[57960.421631] [T12200] amdgpu 0000:65:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[57960.421636] [T12200] amdgpu 0000:65:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[57960.421641] [T12200] amdgpu 0000:65:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[57960.423829] [T12200] amdgpu 0000:65:00.0: amdgpu: recover vram bo from shadow start
[57960.423836] [T12200] amdgpu 0000:65:00.0: amdgpu: recover vram bo from shadow done
[57960.424069] [T12200] amdgpu 0000:65:00.0: amdgpu: GPU reset(2) succeeded!
[57960.426812] [T11831] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
dmesg.log
- In `/var/log/Xorg.0.log` I find:
[...]
[ 57957.124] (EE)
[ 57957.124] (EE) Backtrace:
[ 57957.125] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.125] (EE) 0: /usr/lib/Xorg (?+0x0) [0x5570e1c9f11d]
[ 57957.128] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.128] (EE) 1: /usr/lib/libc.so.6 (?+0x0) [0x7f5f7f552b80]
[ 57957.130] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.131] (EE) 2: /usr/lib/libc.so.6 (?+0x0) [0x7f5f7f5a8d0c]
[ 57957.133] (EE) 3: /usr/lib/libc.so.6 (gsignal+0x18) [0x7f5f7f552ad8]
[ 57957.135] (EE) 4: /usr/lib/libc.so.6 (abort+0xd7) [0x7f5f7f53a4bb]
[ 57957.141] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.141] (EE) 5: /usr/lib/libgallium-24.2.2-arch1.1.so (?+0x0) [0x7f5f7cb2d5f7]
[ 57957.142] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.143] (EE) 6: /usr/lib/libgallium-24.2.2-arch1.1.so (?+0x0) [0x7f5f7cb309b3]
[ 57957.144] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.144] (EE) 7: /usr/lib/libgallium-24.2.2-arch1.1.so (?+0x0) [0x7f5f7c281d8c]
[ 57957.146] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.146] (EE) 8: /usr/lib/libgallium-24.2.2-arch1.1.so (?+0x0) [0x7f5f7c2a49fc]
[ 57957.148] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.148] (EE) 9: /usr/lib/libc.so.6 (?+0x0) [0x7f5f7f5a6eaa]
[ 57957.150] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 57957.151] (EE) 10: /usr/lib/libc.so.6 (?+0x0) [0x7f5f7f62817c]
[ 57957.151] (EE)
[ 57957.151] (EE) Fatal server error:
[ 57957.151] (EE) Caught signal 6 (Aborted). Server aborting
[ 57957.151] (EE)
[ 57957.151] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
[ 57957.151] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 57957.151] (EE)
[ 57957.151] (II) AIGLX: Suspending AIGLX clients for VT switch
Xorg.0.log
- My system is:
- GPD Win Max 2 (2023) with AMD 7840U SoC (GPU: AMD Radeon 780M)
- `/sys/devices/virtual/dmi/id/board_name`: G1619-04
- `/sys/devices/virtual/dmi/id/bios_version`: 0.41
- `/sys/devices/virtual/dmi/id/modalias`: dmi:bvnAmericanMegatrendsInternational,LLC.:bvr0.41:bd07/26/2024:br0.41:efr0.15:svnGPD:pnG1619-04:pvrVer.1.0:rvnGPD:rnG1619-04:rvrVer.1.0:cvnSU:ct10:cvrDefaultstring:sku7896559068635:
- `/sys/devices/pci0000:00/0000:00:08.1/0000:65:00.0/vbios_version`: 113-PHXGENERIC-001
- `/sys/devices/pci0000:00/0000:00:08.1/0000:65:00.0/fw_version/...`:
  - `.../asd_fw_version`: 0x210000df
  - `.../imu_fw_version`: 0x0b012d00
  - `.../mec_fw_version`: 0x00000027
  - `.../me_fw_version`: 0x00000027
  - `.../mes_fw_version`: 0x00000062
  - `.../mes_kiq_fw_version`: 0x00000073
  - `.../pfp_fw_version`: 0x00000030
  - `.../rlc_fw_version`: 0x00000080
  - `.../sdma_fw_version`: 0x00000015
  - `.../smc_fw_version`: 0x004c5400
  - `.../vcn_fw_version`: 0x08116003
- `hwinfo` output: hwinfo.txt
- Kernel: 6.10.7-xanmod-customconfig-clang, kernel configuration: kernel-config.txt
- `/sys/module/amdgpu/parameters/...`:
  - `.../abmlevel`: 0
  - `.../agp`: -1
  - `.../aspm`: 1
  - `.../async_gfx_ring`: 1
  - `.../audio`: 1
  - `.../backlight`: -1
  - `.../bad_page_threshold`: -1
  - `.../bapm`: 1
  - `.../cg_mask`: 18446744073709551615
  - `.../cik_support`: 0
  - `.../compute_multipipe`: -1
  - `.../cwsr_enable`: 1
  - `.../damageclips`: -1
  - `.../dc`: -1
  - `.../dcdebugmask`: 0
  - `.../dcfeaturemask`: 2
  - `.../debug_evictions`: N
  - `.../debug_mask`: 0
  - `.../deep_color`: 0
  - `.../disable_cu`: (null)
  - `.../discovery`: -1
  - `.../disp_priority`: 0
  - `.../dpm`: 1
  - `.../emu_mode`: 0
  - `.../enforce_isolation`: N
  - `.../exp_hw_support`: 0
  - `.../force_asic_type`: -1
  - `.../forcelongtraining`: 0
  - `.../freesync_video`: 1
  - `.../fw_load_type`: -1
  - `.../gartsize`: 4294967295
  - `.../gpu_recovery`: -1
  - `.../gttsize`: -1
  - `.../halt_if_hws_hang`: 0
  - `.../hw_i2c`: 1
  - `.../hws_gws_support`: N
  - `.../hws_max_conc_proc`: -1
  - `.../ignore_min_pcap`: 1
  - `.../ip_block_mask`: 4294967295
  - `.../lbpw`: 1
  - `.../lockup_timeout`: (empty)
  - `.../max_num_of_queues_per_device`: 4096
  - `.../mcbp`: -1
  - `.../mes`: 1
  - `.../mes_kiq`: 1
  - `.../mes_log_enable`: 0
  - `.../moverate`: -1
  - `.../msi`: 1
  - `.../mtype_local`: 0
  - `.../no_queue_eviction_on_vm_fault`: 0
  - `.../noretry`: -1
  - `.../no_system_mem_limit`: N
  - `.../num_kcq`: -1
  - `.../pcie_gen2`: -1
  - `.../pcie_gen_cap`: 0
  - `.../pcie_lane_cap`: 0
  - `.../pg_mask`: 4294967295
  - `.../ppfeaturemask`: 0xffffffff
  - `.../queue_preemption_timeout_ms`: 9000
  - `.../ras_enable`: 1
  - `.../ras_mask`: 4294967295
  - `.../reset_method`: -1
  - `.../runpm`: -1
  - `.../sched_hw_submission`: 2
  - `.../sched_jobs`: 32
  - `.../sched_policy`: 0
  - `.../sdma_phase_quantum`: 32
  - `.../seamless`: 1
  - `.../send_sigterm`: 0
  - `.../sg_display`: -1
  - `.../si_support`: 0
  - `.../smu_memory_pool_size`: 0
  - `.../smu_pptable_id`: -1
  - `.../timeout_fatal_disable`: N
  - `.../timeout_period`: 0
  - `.../tmz`: -1
  - `.../umsch_mm`: 0
  - `.../user_partt_mode`: 4294967294
  - `.../use_xgmi_p2p`: 1
  - `.../vcnfw_log`: 0
  - `.../virtual_display`: (null)
  - `.../visualconfirm`: 0
  - `.../vis_vramlimit`: 0
  - `.../vm_block_size`: -1
  - `.../vm_fault_stop`: 0
  - `.../vm_fragment_size`: -1
  - `.../vm_size`: -1
  - `.../vm_update_mode`: -1
  - `.../vramlimit`: -1
  - `.../wbrf`: -1
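
In case someone wants to compare their own settings against the list above, here is a minimal Python sketch of how such a parameter listing could be collected. The helper name `dump_params` is made up for this illustration and is not part of any existing tool; it only assumes that each parameter is a small text file in a directory such as `/sys/module/amdgpu/parameters`.

```python
from pathlib import Path


def dump_params(param_dir="/sys/module/amdgpu/parameters"):
    """Return {parameter name: value} for every regular file in param_dir.

    dump_params is a hypothetical helper written for this report.
    An empty dict is returned if the directory does not exist
    (for example when the amdgpu module is not loaded).
    """
    base = Path(param_dir)
    if not base.is_dir():
        return {}

    params = {}
    for entry in sorted(base.iterdir()):
        if not entry.is_file():
            continue
        try:
            # sysfs values end with a newline; strip it for compact output.
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some attributes may not be readable by the current user.
            params[entry.name] = "<unreadable>"
    return params
```

Looping over `dump_params().items()` and printing `.../{name}: {value}` per entry gives output in the same shape as the list above; a parameter whose file is empty simply shows up with an empty value.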
Regards!