4.19 Regression - Hawaii (R9 390) boot failure - Invalid PCC GPIO / invalid powerlevel state / Fatal error during GPU init
@jamespharvey20
Submitted by James Harvey Assigned to Default DRI bug account
Link to original bug (#108781)
Description
Created attachment 142499
dmesg (journalctl) of failure on 4.19.2.arch1-1
Arch Linux kernel 4.18.16.arch1-1 works with these kernel parameters:
radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1
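For anyone reproducing this, a minimal sketch for confirming which module parameters a given boot actually carried. It only parses a command-line string. The helper name `parse_module_params` is made up for illustration, and the string below is copied from this report; on a live system it would presumably be read from /proc/cmdline.

```python
def parse_module_params(cmdline):
    """Collect module.param=value tokens from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if "=" not in token:
            continue  # bare flags such as "quiet" carry no value
        key, value = token.split("=", 1)
        if "." in key:  # keep only module-scoped parameters, e.g. amdgpu.dc
            params[key] = value
    return params


# Parameters copied from this report (hypothetical stand-in for /proc/cmdline).
reported = "radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1"
params = parse_module_params(reported)
for name in sorted(params):
    print(name, "=", params[name])
```

Comparing this output between the working and failing boots rules out a parameter difference as the cause.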
Upgraded to 4.19.2.arch1-1, and started getting this failure. Going back to 4.19.arch1-1 still gives this failure.
Full dmesg (journalctl) output is attached for 4.19.2.arch1-1 (failing), 4.19.arch1-1 (failing), and 4.18.16.arch1-1 (working). The pertinent part of the failure is included below so that it is searchable.
This failure occurs when booting to a tty, so no X logs are involved. (On 4.18.16.arch1-1 you may notice a [drm:generic_reg_wait [amdgpu]] error and backtrace. That has always happened, but the system works and it causes no noticeable problem.)
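A rough sketch for pulling the relevant lines out of a saved journalctl or dmesg dump when comparing a working and a failing boot. The signature strings are taken from the excerpt in this report. The list itself and the helper name `find_failure_lines` are my own choices for illustration, not anything the driver defines, and the sample lines are copied from the log below.

```python
# Substrings seen in the failing boot; this list is an assumption about
# which messages matter, not an exhaustive set.
SIGNATURES = (
    "Invalid PCC GPIO",
    "invalid powerlevel state",
    "test failed",
    "hw_init of IP block",
    "Fatal error during GPU init",
)


def find_failure_lines(lines):
    """Return the log lines that contain any known failure signature."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(sig in line for sig in SIGNATURES)
    ]


# Sample lines copied from this report; a real run would iterate over
# an opened log file instead.
sample = [
    "[drm] Invalid PCC GPIO: 13!",
    "[drm] amdgpu: dpm initialized",
    "[drm:dm_pp_get_static_clocks [amdgpu]] ERROR DM_PPLIB: invalid powerlevel state: 0!",
    "[drm] UVD initialized successfully.",
    "[drm:amdgpu_vce_ring_test_ring [amdgpu]] ERROR amdgpu: ring 12 test failed",
    "amdgpu 0000:03:00.0: Fatal error during GPU init",
]
hits = find_failure_lines(sample)
for line in hits:
    print(line)
```

Running the same filter over the 4.18.16 log should show which of these messages are new in 4.19.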
-----
...
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] (rev 80) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Hawaii PRO [Radeon R9 290/390]
Flags: bus master, fast devsel, latency 0, IRQ 75, NUMA node 0
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=8M]
I/O ports at 8000 [size=256]
Memory at dfe00000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 >
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 >
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Resizable BAR >
Capabilities: [270] Secondary PCI Express >
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Kernel driver in use: amdgpu
Kernel modules: radeon, amdgpu
-----
[drm] Invalid PCC GPIO: 13!
ui class: none
internal class: boot
caps:
uvd vclk: 0 dclk: 0
power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
status: c r b
ui class: performance
internal class: none
caps:
uvd vclk: 0 dclk: 0
power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
power level 1 sclk: 105000 mclk: 150000 pcie gen: 3 pcie lanes: 16
status:
[drm] amdgpu: dpm initialized
[drm] Found UVD firmware Version: 1.64 Family ID: 9
[drm] Found VCE firmware Version: 50.10 Binary ID: 2
[drm] PCIE gen 3 link speeds already enabled
[drm:dm_pp_get_static_clocks [amdgpu]] ERROR DM_PPLIB: invalid powerlevel state: 0!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] Display Core initialized with v3.1.59!
[drm] DM_MST: Differing MST start on aconnector: 00000000d3bd29d7 [id: 55]
[drm] DM_MST: Differing MST start on aconnector: 000000004b0d56b6 [id: 57]
[drm] DM_MST: Differing MST start on aconnector: 0000000058d5a853 [id: 59]
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] UVD initialized successfully.
[drm:amdgpu_vce_ring_test_ring [amdgpu]] ERROR amdgpu: ring 12 test failed
[drm:amdgpu_device_init.cold.14 [amdgpu]] ERROR hw_init of IP block <vce_v2_0> failed -110
amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: Fatal error during GPU init
[drm] amdgpu: finishing device.
------------[ cut here ]------------
Memory manager not clean during takedown.
WARNING: CPU: 0 PID: 670 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x1f/0x30 [drm]
Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i>
x_tables sr_mod cdrom btrfs xor sd_mod dm_thin_pool dm_persistent_data raid6_pq dm_bio_prison dm_bufio libcrc32c crc32c_gener>
CPU: 0 PID: 670 Comm: kworker/0:4 Not tainted 4.19.0-arch1-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C602, BIOS P1.90 04/12/2018
Workqueue: events work_for_cpu_fn
RIP: 0010:drm_mm_takedown+0x1f/0x30 [drm]
Code: 0d d0 cb 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 38 48 83 c7 38 48 39 c7 75 01 c3 48 c7 c7 08 b1 1b c1 e8 5b 10 >
RSP: 0018:ffff91764827bd08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8e5a1b613200 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8de9d696 RDI: 00000000ffffffff
RBP: ffff8e5a0ca729a0 R08: 0000000000000001 R09: 00000000000005aa
R10: 0000000000000004 R11: 0000000000000000 R12: ffff8e5a1b6132e8
R13: 0000000000000000 R14: 0000000000000170 R15: ffff8e5a0c69e650
FS: 0000000000000000(0000) GS:ffff8e5a1f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f26530480 CR3: 00000001f0a0a006 CR4: 00000000000606f0
Call Trace:
amdgpu_vram_mgr_fini+0x27/0x50 [amdgpu]
ttm_bo_clean_mm+0xa9/0xb0 [ttm]
amdgpu_ttm_fini+0x71/0x100 [amdgpu]
amdgpu_bo_fini+0xe/0x30 [amdgpu]
gmc_v7_0_sw_fini+0x32/0x60 [amdgpu]
amdgpu_device_fini+0x2cc/0x4aa [amdgpu]
amdgpu_driver_unload_kms+0x42/0x90 [amdgpu]
amdgpu_driver_load_kms+0x168/0x2c0 [amdgpu]
drm_dev_register+0x109/0x140 [drm]
amdgpu_pci_probe+0x13c/0x1c0 [amdgpu]
? _raw_spin_unlock_irqrestore+0x20/0x40
local_pci_probe+0x41/0x90
work_for_cpu_fn+0x16/0x20
process_one_work+0x1eb/0x410
worker_thread+0x218/0x3d0
? process_one_work+0x410/0x410
kthread+0x112/0x130
? kthread_park+0x80/0x80
ret_from_fork+0x35/0x40
---[ end trace 3cf1bcf02bf4fe1a ]---
Attachment 142499, "dmesg (journalctl) of failure on 4.19.2.arch1-1":
failure-4.19.2.arch1-1e6672ccc86f408abe88eac751c19406