amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault after logging into GNOME desktop
I'm on Manjaro using the unstable channel. About two weeks ago, I switched from the base mesa packages to mesa-git for a number of different reasons. About a month before that, I had issues similar to what I describe below, but they only occurred after about 15 minutes of playtime in certain games, such as Halo Infinite. This morning, however, my system began experiencing the same problem about 30 seconds to a minute after logging into my GNOME desktop in either Wayland or X11 mode.
Once the issue happens, all of my monitors go black and lose signal, and the system becomes unresponsive to varying degrees. Caps Lock usually still works and about 50% of the time I can SSH in, but either I can't switch TTYs or switching TTYs doesn't restore signal to the monitors. If I SSH in and attempt to reboot, the system freezes completely at some point during the reboot process and never actually reboots, leaving a hard reset as my only option. RADV_DEBUG=hang is enabled, but it doesn't seem to have any effect and produces no output in either my $HOME directory or /root.
As the mesa-git package had been updated the night before, I figured the problem might be related to that upgrade, so I downgraded all the way to 23.0-rc3, but this made no difference. After changing a number of settings, including disabling RADV_PERFTEST=gpl and enabling RADV_DEBUG=nofastclears, without any improvement, I rolled all the way back to the stable mesa packages, which are at version 22.3.4. So far this seems to have fixed it, though I have not yet tried the aforementioned games.
One other detail of potential importance: Steam opened when I logged in, as it usually does, and asked about the Steam Hardware Survey. I clicked through to the point where it collects system data and the crash happened as it collected the data. This happened several times with the same timing. However, the crash also happened if I did not do this or even if I closed Steam immediately after logging in, so this might have been coincidental.
Also, if I switched to a TTY instead of logging in, the crash did not happen despite GDM continuing to run in the background.
System Details
GNOME: 43.2
Kernel: 6.1.8
GPU: RX 6700 XT
Logs
Feb 01 12:15:36 kernel: [drm] amdgpu kernel modesetting enabled.
Feb 01 12:15:36 kernel: amdgpu: Ignoring ACPI CRAT on non-APU system
Feb 01 12:15:36 kernel: amdgpu: Virtual CRAT table created for CPU
Feb 01 12:15:36 kernel: amdgpu: Topology: Add CPU node
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: enabling device (0006 -> 0007)
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: Fetched VBIOS from VFCT
Feb 01 12:15:36 kernel: amdgpu: ATOM BIOS: 113-D5121000_100
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: vgaarb: deactivate vga console
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
Feb 01 12:15:36 kernel: [drm] amdgpu: 12272M of VRAM memory ready
Feb 01 12:15:36 kernel: [drm] amdgpu: 16005M of GTT memory ready.
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: PSP runtime database doesn't exist
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: PSP runtime database doesn't exist
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: STB initialized to 2048 entries
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: Will use PSP to load VCN firmware
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: RAS: optional ras ta ucode is not available
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: smu driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw program = 0, version = 0x00413900 (65.57.0)
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: SMU driver if version not matched
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: use vbios provided pptable
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: SMU is initialized successfully!
Feb 01 12:15:36 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Feb 01 12:15:36 kernel: amdgpu: sdma_bitmap: ffff
Feb 01 12:15:36 kernel: amdgpu: HMM registered 12272MB device memory
Feb 01 12:15:36 kernel: amdgpu: SRAT table not found
Feb 01 12:15:36 kernel: amdgpu: Virtual CRAT table created for GPU
Feb 01 12:15:36 kernel: amdgpu: Topology: Add dGPU node [0x73df:0x1002]
Feb 01 12:15:36 kernel: kfd kfd: amdgpu: added device 1002:73df
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number 40
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: amdgpu: Using BACO for runtime pm
Feb 01 12:15:36 kernel: [drm] Initialized amdgpu 3.49.0 20150101 for 0000:0b:00.0 on minor 0
Feb 01 12:15:36 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Feb 01 12:15:36 kernel: amdgpu 0000:0b:00.0: [drm] fb0: amdgpudrmfb frame buffer device
Feb 01 12:15:36 kernel: snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800100769000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00141651
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN0 (0xb)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x1
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x1
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080010076a000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800100768000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800100769000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080010076a000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800100768000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800100769000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080010076a000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800100768000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32799, for process i386-linux-gnu- pid 5934 thread i386-linux:cs0 pid 5939)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800100769000 from client 0x12 (VMC)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 01 12:16:04 kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Feb 01 12:16:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=5, emitted seq=8
Feb 01 12:16:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Feb 01 12:16:13 kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: amdgpu: free PSP TMR buffer
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x467300 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x467340 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x467380 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x4673c0 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x467400 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x467440 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x467480 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x4674c0 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x467500 flags=0x0000]
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: amdgpu: MODE1 reset
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: amdgpu: GPU mode1 reset
Feb 01 12:16:14 kernel: amdgpu 0000:0b:00.0: amdgpu: GPU smu mode1 reset
Feb 01 12:16:26 kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset succeeded, trying to resume
Feb 01 12:16:33 kernel: [drm:psp_v11_0_memory_training [amdgpu]] *ERROR* send training msg failed.
Feb 01 12:16:33 kernel: [drm:psp_resume [amdgpu]] *ERROR* Failed to process memory training!
Feb 01 12:16:33 kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
Feb 01 12:16:33 kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset(1) failed
Feb 01 12:16:33 kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset end with ret = -62
Feb 01 12:16:33 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62
Feb 01 12:16:43 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=1609, emitted seq=1611
Feb 01 12:16:43 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Feb 01 12:16:43 kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
Feb 01 12:16:43 kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to disallow df cstate
Feb 01 12:19:33 kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Feb 01 12:19:33 kernel: dm_suspend+0xbe/0x1c0 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_device_ip_suspend_phase1+0x73/0xd0 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_device_ip_suspend+0x1f/0x70 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_device_pre_asic_reset+0xd3/0x290 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_job_timedout+0x1dc/0x220 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: handle_cursor_update+0x1cd/0x360 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_vm_fini+0xfb/0x510 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:19:33 kernel: amdgpu_driver_postclose_kms+0x1e9/0x2d0 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Feb 01 12:21:36 kernel: dm_suspend+0xbe/0x1c0 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_device_ip_suspend_phase1+0x73/0xd0 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_device_ip_suspend+0x1f/0x70 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_device_pre_asic_reset+0xd3/0x290 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_job_timedout+0x1dc/0x220 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: handle_cursor_update+0x1cd/0x360 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_vm_fini+0xfb/0x510 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
Feb 01 12:21:36 kernel: amdgpu_driver_postclose_kms+0x1e9/0x2d0 [amdgpu 7eaeb6c5ff6b212721f0b800337ffc8ad6a0deac]
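
As a rough aid for reading the first fault entry in the log above, here is a minimal Python sketch that splits the raw MMVM_L2_PROTECTION_FAULT_STATUS value (0x00141651) into the fields the kernel prints next to it. The bit positions in FIELDS are an assumption inferred from matching the raw value against the decoded lines in this log (MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, client ID, RW); they are not taken from AMD documentation and may differ on other hardware generations. The name decode_fault_status is made up for this example.

```python
# Assumed bit layout (shift, width), inferred from the decoded values that
# the kernel printed alongside the raw register in the log above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # the "Faulty UTCL2 client ID" in the log
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw fault status value into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value from the first page fault entry in the log.
    for name, value in decode_fault_status(0x00141651).items():
        print(f"{name}: {value:#x}")
```

Under that assumed layout, the first fault decodes to client ID 0xb (reported as VCN0 in the log), which is at least consistent with the later vcn_dec_0 ring timeout. The subsequent entries carry a status of 0x00000000, so they decode to all zeros and add no further detail.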