[Bisected] Booting with kernel version 5.1.0 or higher on RX 580 hangs
Submitted by Gobinda Joy
Assigned to Default DRI bug account
Link to original bug (#110822)
Description
Created attachment 144420
Linux version 5.1.6-350.vanilla.knurd.1.fc30.x86_64
My hardware is as follows:
CPU: i7 3770 at stock clock
Motherboard: Gigabyte G1.Sniper 3 latest BIOS available
RAM: 24 GB DDR3 at 1600 MHz
GPU: RX 580 8GB (Sapphire) latest VBIOS
The problem occurs with kernel 5.1.0 or higher (currently 5.1.6): the display hangs when the amdgpu driver loads. I'm unable to determine whether booting continues or hangs as well. Disk activity stops after a couple of seconds and it is not possible to switch TTYs.
Ctrl+Alt+Del is unresponsive as well.
This problem goes away when amdgpu.dpm=0 is used, but in that case dynamic power scaling is not available and the GPU is stuck at a low clock, so graphics performance is abysmal. GPU temperature and fan speed utilities also don't work.
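For anyone reproducing this, a minimal Python sketch for confirming that the workaround is actually in effect on a given boot. It assumes the option is passed on the kernel command line as amdgpu.dpm=0; the helper names parse_cmdline and dpm_disabled are made up for illustration and are not part of any tool.

```python
import shlex


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(text):
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def dpm_disabled(text):
    """Return True when amdgpu.dpm=0 is present on the given command line."""
    return parse_cmdline(text).get("amdgpu.dpm") == "0"
```

On a running system the text to pass in would be the contents of /proc/cmdline, for example dpm_disabled(open("/proc/cmdline").read()).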
Here is the excerpt of the problematic log lines:
Jun 02 09:54:05 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:06 kernel: amdgpu: [powerplay]
failed to send message 15b ret is 65535
Jun 02 09:54:06 kernel: hrtimer: interrupt took 287743313 ns
Jun 02 09:54:06 kernel: clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
Jun 02 09:54:06 kernel: clocksource: 'hpet' wd_now: 628dd7b wd_last: 5fef431 mask: ffffffff
Jun 02 09:54:06 kernel: clocksource: 'tsc' cs_now: 254aa24747 cs_last: 25104a5bfd mask: ffffffffffffffff
Jun 02 09:54:06 kernel: tsc: Marking TSC unstable due to clocksource watchdog
Jun 02 09:54:07 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay]
failed to send message 148 ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay]
failed to send message 145 ret is 65535
Jun 02 09:54:08 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:08 kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Jun 02 09:54:08 kernel: sched_clock: Marking unstable (8791691311, 362291)<-(8817904668, -25851212)
Jun 02 09:54:08 kernel: amdgpu: [powerplay]
failed to send message 146 ret is 65535
Jun 02 09:54:08 kernel: hid-generic 0003:09DA:FC7C.0003: input,hidraw2: USB HID v1.11 Mouse [COMPANY USB Device] on usb-0000:00:1a.0-1.5.3/input0
Jun 02 09:54:09 kernel: hid-generic 0003:09DA:FC7C.0004: hiddev97,hidraw3: USB HID v1.11 Device [COMPANY USB Device] on usb-0000:00:1a.0-1.5.3/input1
Jun 02 09:54:11 kernel: clocksource: Switched to clocksource hpet
Jun 02 09:54:13 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:13 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:15 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:15 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:15 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:15 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay]
failed to send message 260 ret is 65535
Jun 02 09:54:17 kernel: [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0
Jun 02 09:54:17 kernel: EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
Jun 02 09:54:20 kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] ERROR IB test failed on gfx (-110).
Jun 02 09:54:21 kernel: [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] ERROR ib ring test failed (-110).
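Since the excerpt is long and repetitive, here is a rough Python sketch for tallying which powerplay message IDs failed in a saved log such as the attached dmesg.txt. It assumes the lines have the same "failed to send message <id> ret is <code>" shape as the excerpt above; the function name count_failed_messages is made up for illustration.

```python
import re
from collections import Counter

# Matches the "failed to send message" half of each powerplay error pair.
# Message IDs in the excerpt look hexadecimal (e.g. 15b), so match word characters.
FAILED_SEND = re.compile(r"failed to send message (\w+) ret is (\d+)")


def count_failed_messages(log_text):
    """Count 'failed to send message' lines per message ID."""
    return Counter(match.group(1) for match in FAILED_SEND.finditer(log_text))
```

Calling count_failed_messages(open("dmesg.txt").read()).most_common() would list the message IDs that failed most often first, which may make it easier to compare a failing boot against a working one.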
Any help is appreciated. Also let me know if I can help in any way.
Attachment 144420, "Linux version 5.1.6-350.vanilla.knurd.1.fc30.x86_64":
dmesg.txt