amdgpu lockup and GPU reset "retry page fault" on Ubuntu 23.10, ThinkPad T14 Gen 1
System information
System:
Host: 1-1-1-2-35a Kernel: 6.5.0-13-generic arch: x86_64 bits: 64
compiler: N/A Desktop: GNOME v: 45.1 tk: GTK v: 3.24.38 wm: gnome-shell
dm: GDM3 Distro: Ubuntu 23.10 (Mantic Minotaur)
CPU:
Info: 8-core model: AMD Ryzen 7 PRO 4750U with Radeon Graphics bits: 64
type: MT MCP arch: Zen 2 rev: 1 cache: L1: 512 KiB L2: 4 MiB L3: 8 MiB
Speed (MHz): avg: 1840 high: 3886 min/max: 1400/1700 boost: enabled cores:
1: 3872 2: 3886 3: 1734 4: 2042 5: 1400 6: 1728 7: 1681 8: 1400 9: 1462
10: 1400 11: 1397 12: 1400 13: 1453 14: 1456 15: 1400 16: 1729
bogomips: 54298
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Renoir vendor: Lenovo driver: amdgpu v: kernel arch: GCN-5
pcie: speed: 8 GT/s lanes: 16 ports: active: HDMI-A-1 off: eDP-1
empty: DP-1,DP-2 bus-ID: 07:00.0 chip-ID: 1002:1636 temp: 55.0 C
Device-2: IMC Networks Integrated Camera driver: uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 bus-ID: 2-2:2 chip-ID: 13d3:5406
Device-3: Chicony USB2.0 FHD UVC WebCam driver: uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 bus-ID: 6-2.4:8 chip-ID: 04f2:b612
Display: wayland server: X.org v: 1.21.1.7 with: Xwayland v: 23.2.0
compositor: gnome-shell driver: X: loaded: amdgpu
unloaded: fbdev,modesetting,radeon,vesa dri: radeonsi gpu: amdgpu
display-ID: 0
Monitor-1: HDMI-A-1 model: Philips PHL 346P1C res: 3440x1440 dpi: 110
diag: 864mm (34")
Monitor-2: eDP-1 model: AU Optronics 0x573d res: 1920x1080 dpi: 158
diag: 355mm (14")
API: OpenGL v: 4.6 Mesa 23.2.1-1ubuntu3 renderer: AMD Radeon Graphics
(renoir LLVM 15.0.7 DRM 3.54 6.5.0-13-generic) direct-render: Yes
- OS: Ubuntu 23.10
- GPU: 07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev d1)
*-display
description: VGA compatible controller
product: Renoir [1002:1636]
vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
physical id: 0
bus info: pci@0000:07:00.0
logical name: /dev/fb0
version: d1
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi msix vga_controller bus_master cap_list fb
configuration: depth=32 driver=amdgpu latency=0 mode=1920x1080 resolution=3440,1440 visual=truecolor xres=1920 yres=1080
resources: iomemory:80-7f iomemory:80-7f irq:62 memory:860000000-86fffffff memory:870000000-8701fffff ioport:1000(size=256) memory:fd300000-fd37ffff
- Kernel version: Linux 1-1-1-2-35a 6.5.0-13-generic #13-Ubuntu SMP PREEMPT_DYNAMIC Fri Nov 3 12:16:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Mesa version: OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.2.1-1ubuntu3
- Xserver version (if applicable): using Wayland (default), but here is the output:
X.Org X Server 1.21.1.7
X Protocol Version 11, Revision 0
Current Operating System: Linux 1-1-1-2-35a 6.5.0-13-generic #13-Ubuntu SMP PREEMPT_DYNAMIC Fri Nov 3 12:16:05 UTC 2023 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-6.5.0-13-generic root=/dev/mapper/vgroot-lvroot ro quiet splash amdgpu.noretry=0 vt.handoff=7
xorg-server 2:21.1.7-3ubuntu2.1 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.42.2
Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.
A thread I found online suggested trying the kernel parameter
amdgpu.noretry=0
(visible in the kernel command line above), but it does not change anything.
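To rule out that the parameter was simply not picked up, it can help to compare what was requested on the kernel command line with what the loaded module reports. The following is a minimal Python sketch, not part of the original report. It assumes the module parameter is readable under /sys/module/amdgpu/parameters/, which is the usual sysfs location for module parameters; the helper names are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def amdgpu_cmdline_params(cmdline: str) -> dict[str, str]:
    """Return the amdgpu.* options found in a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


def read_module_param(name: str,
                      base: str = "/sys/module/amdgpu/parameters") -> str | None:
    """Return the value reported by the loaded module, or None if unreadable.

    The sysfs location is an assumption; it may be missing or unreadable.
    """
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("requested on command line:", amdgpu_cmdline_params(cmdline))
    print("reported by loaded module:", read_module_param("noretry"))
```

If both lines show the same value for noretry, the option was applied and the lack of effect is genuine rather than a configuration mistake.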
Laptop model:
System Information
Manufacturer: LENOVO
Product Name: 20UD0013GE
Version: ThinkPad T14 Gen 1
Description
The freeze occurs randomly and takes different forms. Most of the time the screen displays random artefacts from the currently visible windows, and everything on screen "jumps around" whenever something on the display moves. Sometimes everything returns to normal on its own after 1-2 minutes; sometimes it stays like this or locks up entirely. To get back to a working state more quickly, I usually press the power button to trigger suspend; after waking from suspend, the graphics are normal again. When there is a complete lockup, I can only reset the PC. Sometimes the screen goes black and stops responding; sometimes it just freezes and stops responding.
Most of the time this happens when switching between maximized windows via Alt+Tab.
Regression
During normal use there were more or less no such problems on Ubuntu 22.04. However, waking up from suspend did not work on 22.04 because of graphics-related problems: the screen was frozen after wake-up. I was hoping to get that fixed by updating to 23.10, but the new problem described in this issue is even worse.
Log files as attachment
dmesg:
[ 3486.183400] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.183442] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.183466] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.183479] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.183492] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.183507] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.183524] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.183538] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.183549] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.183606] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.183643] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.183666] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.183679] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.183696] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.183713] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.183727] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.183744] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.183755] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.183983] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184032] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184054] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184069] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184085] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184091] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184095] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184099] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184105] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.184118] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184127] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184135] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184146] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184150] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184154] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184158] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184162] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184166] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.184176] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184185] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184199] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184204] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184207] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184211] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184215] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184219] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184223] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.184247] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184255] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184272] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184275] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184280] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184284] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184288] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184292] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184296] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.184306] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184314] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184329] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184334] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184337] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184342] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184345] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184348] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184353] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.184362] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184370] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184384] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184387] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184392] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184396] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184400] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184403] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184407] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.184416] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184424] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184439] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184443] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184446] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184451] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184454] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184457] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184462] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
[ 3486.184471] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 thread Xwayland:cs0 pid 3170)
[ 3486.184480] amdgpu 0000:07:00.0: amdgpu: in page starting at address 0x00001aa6e3050000 from IH client 0x1b (UTCL2)
[ 3486.184488] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400431
[ 3486.184496] amdgpu 0000:07:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[ 3486.184501] amdgpu 0000:07:00.0: amdgpu: MORE_FAULTS: 0x1
[ 3486.184504] amdgpu 0000:07:00.0: amdgpu: WALKER_ERROR: 0x0
[ 3486.184508] amdgpu 0000:07:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 3486.184512] amdgpu 0000:07:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 3486.184516] amdgpu 0000:07:00.0: amdgpu: RW: 0x0
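Because the same fault block repeats many times, a short script can condense the log. The Python sketch below is not part of the original report. It counts retry page faults per process and page address, and decodes the VM_L2_PROTECTION_FAULT_STATUS value into fields. The bit layout in FIELDS is an assumption; it was only cross-checked against the values the driver itself printed above (MORE_FAULTS 0x1, PERMISSION_FAULTS 0x3, client ID 0x2, vmid 4).

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed bit layout of VM_L2_PROTECTION_FAULT_STATUS: (name, shift, width).
# Not taken from the report; it merely reproduces the fields printed in the log.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),
    ("RW", 18, 1),
    ("VMID", 20, 4),
]

FAULT_RE = re.compile(r"retry page fault \(.*?for process (\S+) pid (\d+)")
ADDR_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")


def decode_fault_status(status: int) -> dict[str, int]:
    """Split the raw status register value into named bit fields."""
    return {name: (status >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}


def summarize_faults(log_text: str) -> Counter:
    """Count retry page faults per (process, pid, page address)."""
    counts: Counter = Counter()
    current = None
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            current = (match.group(1), int(match.group(2)))
            continue
        match = ADDR_RE.search(line)
        if match and current is not None:
            counts[(*current, match.group(1))] += 1
            current = None
    return counts


if __name__ == "__main__":
    sample = (
        "[ 3486.183400] amdgpu 0000:07:00.0: amdgpu: [gfxhub0] retry page fault "
        "(src_id:0 ring:0 vmid:4 pasid:32774, for process Xwayland pid 3073 "
        "thread Xwayland:cs0 pid 3170)\n"
        "[ 3486.183442] amdgpu 0000:07:00.0: amdgpu: in page starting at address "
        "0x00001aa6e3050000 from IH client 0x1b (UTCL2)\n"
    )
    print(summarize_faults(sample))
    print(decode_fault_status(0x00400431))
```

Fed the full dmesg excerpt above, this collapses the ten fault blocks into a single entry: every fault comes from Xwayland (pid 3073) and hits the same page at 0x00001aa6e3050000.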