amdgpu crash on external USB-C monitor
Brief summary of the problem
I have a strange problem: a few times per day, my external monitor connected through USB-C crashes. The laptop screen and the external monitor connected through HDMI keep working, but the USB-C monitor starts flickering and becomes unusable. The mouse cursor is still visible and I can move it on that monitor, but otherwise the image is frozen on the last working output, with flickering rectangles all over it. Only logging out or rebooting solves the problem, and having to close all my work 6-10 times per day gets annoying. Sometimes, when I leave it in that crashed state to finish my work, the system eventually freezes completely and only a hard reset helps. I have no indicator of when it happens. It seems random, but it only seems to happen while I'm working: when I leave the machine on for four hours, it doesn't crash. I'm not gaming; I'm doing fairly standard developer work with almost no GPU load.
Because of this, I tested amdgpu.bapm=0 as a GRUB kernel parameter (https://www.mikejonesey.co.uk/linux/amdgpu-linux-driver-parameters), but the crash still happened. I also tested amdgpu.dpm=0, but the kernel doesn't even boot with this parameter. Finally, I tested everything with kernel 5.13.0-19, and the same problem still occurs.
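When testing parameters like these, it helps to confirm that they actually reached the running kernel after editing the GRUB configuration. The following is a minimal Python sketch, assuming a Linux system where `/proc/cmdline` holds the boot command line as one space-separated string; the helper name `amdgpu_params` is hypothetical.

```python
"""List the amdgpu.* parameters the running kernel was booted with.

Assumption: Linux, where /proc/cmdline contains the kernel command line
as a single space-separated string.
"""
from pathlib import Path


def amdgpu_params(cmdline: str) -> dict:
    """Return {name: value} for every amdgpu.<name>=<value> token."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, _, value = token.partition("=")
            params[key[len("amdgpu."):]] = value
    return params


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        # An empty dict means no amdgpu.* parameter was passed at boot.
        print(amdgpu_params(cmdline_path.read_text()))
```

If the dictionary printed after a reboot does not contain `bapm`, the GRUB change was not applied (for example, because the GRUB configuration was not regenerated), and the test result says nothing about the parameter itself.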
I tried updating the GPU BIOS and the machine's UEFI; everything is on the newest version. Also, the problem doesn't seem to appear in Windows 10 and 11. With older kernels, the USB-C monitor doesn't work at all.
Some people reported similar problems in 2021, but all of those problems seem to have been solved with newer kernels, and none of the solutions helps here. I've read about downgrading Mesa to 21.1.x, but I'm not sure how to do that.
Hardware description
- System:
- Host: morty Kernel: 5.13.0-22-generic x86_64 bits: 64 Desktop: GNOME 40.5
- Distro: Ubuntu 21.10 (Impish Indri)
- Machine:
- Type: Laptop System: ASUSTeK product: ROG Strix G713QY_G713QY v: 1.0
- serial:
- Mobo: ASUSTeK model: G713QY v: 1.0 serial:
- UEFI: American Megatrends LLC. v: G713QY.316 date: 11/29/2021
- Battery:
- ID-1: BAT0 charge: 89.8 Wh (100.0%) condition: 89.8/90.0 Wh (99.7%)
- CPU:
- Info: 8-Core model: AMD Ryzen 9 5900HX with Radeon Graphics bits: 64
- type: MT MCP cache: L2: 4 MiB
- Speed: 2795 MHz min/max: 1200/3300 MHz Core speeds (MHz): 1: 2795 2: 1373
- 3: 1200 4: 1996 5: 1741 6: 1197 7: 2091 8: 1677 9: 2001 10: 1787 11: 2137
- 12: 2208 13: 1655 14: 2189 15: 1910 16: 1916
- Graphics:
- Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT / 6800M] driver: amdgpu
- v: kernel
- Device-2: AMD Cezanne driver: amdgpu v: kernel
- Device-3: Dell Dell Webcam WB7022 type: USB driver: uvcvideo
- Display: x11 server: X.Org 1.20.13 driver: loaded: amdgpu,ati
- unloaded: fbdev,modesetting,radeon,vesa resolution: 1: 1920x1080~165Hz
- 2: 1920x1080~60Hz 3: 1920x1080~60Hz
- OpenGL: renderer: AMD RENOIR (DRM 3.41.0 5.13.0-22-generic LLVM 13.0.0)
- v: 4.6 Mesa 21.3.3 - kisak-mesa PPA
- Audio:
- Device-1: AMD Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]
- driver: snd_hda_intel
- Device-2: AMD Renoir Radeon High Definition Audio driver: snd_hda_intel
- Device-3: AMD Raven/Raven2/FireFlight/Renoir Audio Processor driver: N/A
- Device-4: AMD Family 17h HD Audio driver: snd_hda_intel
- Device-5: GN Netcom Jabra Evolve 75 type: USB
- driver: jabra,snd-usb-audio,usbhid
- Device-6: Trust USB microphone type: USB
- driver: hid-generic,snd-usb-audio,usbhid
- Sound Server-1: ALSA v: k5.13.0-22-generic running: yes
- Sound Server-2: PulseAudio v: 15.0 running: yes
- Sound Server-3: PipeWire v: 0.3.32 running: yes
- Network:
- Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
- driver: r8169
- IF: enp4s0 state: up speed: 100 Mbps duplex: full mac: 7c:10:c9:27:2f:34
- Device-2: MEDIATEK driver: mt7921e
- IF: wlp5s0 state: up mac: 48:e7:da:42:3e:f3
- Bluetooth:
- Device-1: IMC Networks Wireless_Device type: USB driver: btusb
- Report: hciconfig ID: hci0 rfk-id: 0 state: down
- bt-service: enabled,running rfk-block: hardware: no software: no
- address: 00:00:00:00:00:00
- Drives:
- Local Storage: total: 953.87 GiB used: 28.89 GiB (3.0%)
- ID-1: /dev/nvme0n1 vendor: Samsung model: MZVLQ1T0HBLB-00B00
- size: 953.87 GiB
- Partition:
- ID-1: / size: 274.45 GiB used: 28.86 GiB (10.5%) fs: ext4
- dev: /dev/nvme0n1p8
- ID-2: /boot/efi size: 256 MiB used: 33 MiB (12.9%) fs: vfat
- dev: /dev/nvme0n1p1
- Swap:
- ID-1: swap-1 type: file size: 2 GiB used: 0 KiB (0.0%) file: /swapfile
- Sensors:
- System Temperatures: cpu: 55.0 C mobo: N/A
- Fan Speeds (RPM): cpu: 2100
- GPU: device: amdgpu temp: 42.0 C fan: 0 device: amdgpu temp: N/A
- Info:
- Processes: 415 Uptime: 16m Memory: 15.04 GiB used: 3.7 GiB (24.6%)
- Shell: Bash inxi: 3.3.06
How to reproduce the issue:
As mentioned, it happens randomly, but ONLY on the USB-C monitor. It doesn't happen on Windows 10/11, and the BIOS is up to date. It might be related to this specific hardware; I just hope the logs help in some way.
Log files and Video (for system lockups / game freezes / crashes)
- Various logs, as recommended by the amdgpu bug tracker: log20210401.zip
- Video showing how it looks: https://drive.google.com/file/d/1N9QND4i_a_4Vy0auV-eGDn-GCfbXrLM9/view?usp=sharing
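Because the system sometimes locks up completely and needs a hard reset, the kernel log from the crashed boot is usually the most relevant part of such a report. Below is a minimal Python sketch for pulling the amdgpu/DRM lines out of a saved log. It assumes the log was exported beforehand as plain text, for example with `journalctl -k -b -1 > kernel.log` after rebooting (this only works if the journal is persistent). The keyword list and the helper name `gpu_lines` are hypothetical choices, not something the amdgpu bug tracker prescribes.

```python
"""Extract amdgpu/DRM lines from a saved kernel log.

Assumption: the log is a plain-text file, e.g. exported with
`journalctl -k -b -1 > kernel.log`; the keywords are illustrative only.
"""
import sys
from pathlib import Path

# Hypothetical keyword list: a line is kept if it contains any of these.
KEYWORDS = ("amdgpu", "[drm")


def gpu_lines(lines):
    """Yield lines mentioning amdgpu or DRM (case-insensitive match)."""
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    text = Path(sys.argv[1]).read_text(errors="replace")
    for line in gpu_lines(text.splitlines()):
        print(line)
```

Run as `python3 gpu_lines.py kernel.log` (the script name is also just an example). Filtering only narrows the log for reading; the full, unfiltered log should still be attached to the report.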