Call trace on each boot in both 'dc_link_set_backlight_level' and 'rn_clk_mgr_construct'
Brief summary of the problem:
- Trace 1: appears in dmesg at each boot, triggered by systemd-backlight (booting with only amdgpu.runpm=0)
- Trace 2: appears in dmesg at each boot when systemd-backlight is disabled (with systemd.restore_state=0)
Hardware description:
- CPU: AMD Ryzen 7 4800H
- GPU: Integrated AMD Vega 7 & discrete AMD Navi 14 Radeon RX 5500M
- System Memory: 2x 8GB DDR4-3200
- Display(s): 1920x1080 @ 144Hz
- Type of Display Connection: eDP
- Laptop model: MSI Bravo 17 - A4DDR-035NL
System information:
- Distro name and Version: Arch Linux
- Kernel version: 5.8.7, 5.8.8, 5.9.0-rc4
- Kernel parameters:
  - amdgpu.runpm=0 (without it, the kernel crashes immediately)
  - systemd.restore_state=0 (to prevent the systemd-backlight call on boot)
- Using Plymouth
How to reproduce the issue:
- Boot system
- Login
- Open terminal
- Run dmesg
Attached files:
- Dmesg 1: dmesg-with-systemd-backlight.txt
- Dmesg 2: dmesg-no-systemd-backlight.txt
Trace 1: appears in dmesg at each boot, triggered by systemd-backlight
[ 4.621337] [drm] Initialized amdgpu 3.38.0 20150101 for 0000:07:00.0 on minor 1
[ 4.640077] ------------[ cut here ]------------
[ 4.640156] WARNING: CPU: 0 PID: 953 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2545 dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[ 4.640156] Modules linked in: snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_rn_pci_acp3x snd_pci_acp3x btusb btrtl btbcm btintel bluetooth ecdh_generic ecc iwlmvm amdgpu mac80211 joydev snd_hda_codec_realtek mousedev libarc4 snd_hda_codec_generic ledtrig_audio iwlwifi snd_hda_codec_hdmi hid_multitouch snd_hda_intel hid_generic snd_intel_dspcfg snd_hda_codec gpu_sched msi_wmi i2c_algo_bit sparse_keymap edac_mce_amd ttm snd_hda_core cfg80211 kvm_amd snd_hwdep snd_pcm drm_kms_helper kvm r8169 snd_timer cec realtek rc_core syscopyarea snd sp5100_tco irqbypass sysfillrect psmouse sysimgblt rapl input_leds fb_sys_fops pcspkr k10temp soundcore libphy i2c_piix4 rfkill tpm_crb wmi battery i2c_hid ac tpm_tis tpm_tis_core hid tpm pinctrl_amd uvcvideo acpi_cpufreq videobuf2_vmalloc videobuf2_memops evdev videobuf2_v4l2 soc_button_array mac_hid videobuf2_common videodev mc vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) dm_mod drm sg crypto_user agpgart
[ 4.640182] ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas ccp xhci_hcd rng_core i8042 serio
[ 4.640190] CPU: 0 PID: 953 Comm: systemd-backlig Tainted: G OE 5.8.7-arch1-1 #1
[ 4.640191] Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.116 07/10/2020
[ 4.640240] RIP: 0010:dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[ 4.640241] Code: 30 03 00 00 31 c0 48 8d 96 c0 01 00 00 48 8b 0a 48 85 c9 74 06 48 3b 59 08 74 20 83 c0 01 48 81 c2 c8 04 00 00 83 f8 06 75 e3 <0f> 0b 45 31 e4 5b 44 89 e0 5d 41 5c 41 5d 41 5e c3 48 98 48 69 c0
[ 4.640242] RSP: 0018:ffffadc080807df0 EFLAGS: 00010246
[ 4.640243] RAX: 0000000000000006 RBX: ffff9cd5438c9800 RCX: 0000000000000000
[ 4.640243] RDX: ffff9cd53f801e70 RSI: ffff9cd53f800000 RDI: 0000000000000000
[ 4.640244] RBP: ffff9cd543900000 R08: 00000000000000ff R09: 000000000000000a
[ 4.640244] R10: 000000000000000a R11: f000000000000000 R12: 000000000000ff01
[ 4.640245] R13: 0000000000000000 R14: 000000000000ffff R15: ffff9cd549120260
[ 4.640246] FS: 00007f5b331a8000(0000) GS:ffff9cd55f600000(0000) knlGS:0000000000000000
[ 4.640246] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.640247] CR2: 00005598681fe978 CR3: 0000000408e64000 CR4: 0000000000340ef0
[ 4.640247] Call Trace:
[ 4.640302] amdgpu_dm_backlight_update_status+0xb4/0xc0 [amdgpu]
[ 4.640321] backlight_device_set_brightness+0x7e/0x130
[ 4.640323] brightness_store+0x63/0x80
[ 4.640326] kernfs_fop_write+0xce/0x1b0
[ 4.640329] vfs_write+0xc7/0x1f0
[ 4.640331] ksys_write+0x67/0xe0
[ 4.640335] do_syscall_64+0x44/0x70
[ 4.640337] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4.640339] RIP: 0033:0x7f5b33fc0f67
[ 4.640341] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 4.640342] RSP: 002b:00007ffc766ac878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4.640343] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f5b33fc0f67
[ 4.640343] RDX: 0000000000000004 RSI: 00007ffc766ac960 RDI: 0000000000000004
[ 4.640344] RBP: 00007ffc766ac960 R08: 0000000000000000 R09: 0000000000000000
[ 4.640344] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 4.640344] R13: 00005598681e83c0 R14: 0000000000000004 R15: 00007f5b34093720
[ 4.640346] ---[ end trace de75e01f35cca025 ]---
Trace 2: appears in dmesg at each boot when systemd-backlight is disabled by the kernel parameter systemd.restore_state=0
[ 4.694971] amdgpu 0000:07:00.0: amdgpu: SMU is initialized successfully!
[ 4.696369] [drm] kiq ring mec 2 pipe 1 q 0
[ 4.697106] ------------[ cut here ]------------
[ 4.697329] WARNING: CPU: 10 PID: 398 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr.c:716 rn_clk_mgr_construct+0x142/0x3f0 [amdgpu]
[ 4.697330] Modules linked in: snd_pci_acp3x btusb btrtl btbcm btintel bluetooth ecdh_generic iwlmvm ecc snd_hda_codec_realtek joydev amdgpu(+) mousedev snd_hda_codec_generic mac80211 ledtrig_audio snd_hda_codec_hdmi libarc4 snd_hda_intel edac_mce_amd gpu_sched snd_intel_dspcfg i2c_algo_bit kvm_amd snd_hda_codec ttm hid_multitouch snd_hda_core r8169 hid_generic msi_wmi iwlwifi sparse_keymap kvm drm_kms_helper snd_hwdep realtek snd_pcm cec irqbypass mdio_devres rc_core of_mdio cfg80211 psmouse rapl snd_timer input_leds fixed_phy syscopyarea pcspkr sp5100_tco snd sysfillrect libphy tpm_crb sysimgblt rfkill k10temp i2c_piix4 uvcvideo ac wmi battery tpm_tis soundcore fb_sys_fops tpm_tis_core videobuf2_vmalloc i2c_hid videobuf2_memops tpm videobuf2_v4l2 hid pinctrl_amd videobuf2_common videodev soc_button_array evdev mc acpi_cpufreq mac_hid drm sg dm_mod crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul
[ 4.697386] crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_hcd ccp rng_core i8042 serio
[ 4.697398] CPU: 10 PID: 398 Comm: systemd-udevd Not tainted 5.9.0-rc4-1-git-00038-g581cb3a26baf #1
[ 4.697400] Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.116 07/10/2020
[ 4.697608] RIP: 0010:rn_clk_mgr_construct+0x142/0x3f0 [amdgpu]
[ 4.697613] Code: 00 00 00 41 8b 8c c4 80 00 00 00 41 89 c1 89 c7 85 c9 74 10 41 8b 94 c4 84 00 00 00 85 d2 0f 85 aa 01 00 00 48 83 e8 01 73 d9 <0f> 0b 83 7b 20 01 74 0c 81 bd e8 00 00 00 ff 14 37 00 7f 27 48 8b
[ 4.697615] RSP: 0018:ffffa92c4286b6c0 EFLAGS: 00010297
[ 4.697617] RAX: ffffffffffffffff RBX: ffff973a7e874180 RCX: 0000000000000000
[ 4.697619] RDX: ffff973a92ac1e80 RSI: ffffa92c4286b6e8 RDI: 0000000000000000
[ 4.697620] RBP: ffff973a98469e00 R08: 0000000000000000 R09: 0000000000000000
[ 4.697622] R10: 7fc9117fffffffff R11: ffff973a7e88f400 R12: ffffa92c4286b6e8
[ 4.697623] R13: ffff973a7e874e40 R14: ffff973a8a190000 R15: ffff973a7e874180
[ 4.697625] FS: 00007f705a9e9440(0000) GS:ffff973a9f680000(0000) knlGS:0000000000000000
[ 4.697627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.697628] CR2: 000055de9eb7a774 CR3: 0000000419700000 CR4: 0000000000350ee0
[ 4.697630] Call Trace:
[ 4.697842] dc_clk_mgr_create+0x172/0x1b0 [amdgpu]
[ 4.698042] dc_create+0x24a/0x7a0 [amdgpu]
[ 4.698050] ? kmem_cache_alloc_trace+0x106/0x240
[ 4.698255] amdgpu_dm_init.isra.0+0x17f/0x1e0 [amdgpu]
[ 4.698460] dm_hw_init+0xe/0x20 [amdgpu]
[ 4.698666] amdgpu_device_init.cold+0x171a/0x19d8 [amdgpu]
[ 4.698827] amdgpu_driver_load_kms+0x5c/0x230 [amdgpu]
[ 4.698984] amdgpu_pci_probe+0xf4/0x180 [amdgpu]
[ 4.698991] local_pci_probe+0x42/0x80
[ 4.698995] ? pci_match_device+0xd7/0x100
[ 4.698998] pci_device_probe+0xfa/0x1b0
[ 4.699002] really_probe+0x205/0x460
[ 4.699005] driver_probe_device+0xe1/0x150
[ 4.699008] device_driver_attach+0xa1/0xb0
[ 4.699011] __driver_attach+0x8a/0x150
[ 4.699012] ? device_driver_attach+0xb0/0xb0
[ 4.699014] ? device_driver_attach+0xb0/0xb0
[ 4.699017] bus_for_each_dev+0x89/0xd0
[ 4.699021] bus_add_driver+0x12b/0x1e0
[ 4.699024] driver_register+0x8b/0xe0
[ 4.699026] ? 0xffffffffc0fb6000
[ 4.699030] do_one_initcall+0x59/0x234
[ 4.699036] do_init_module+0x5c/0x260
[ 4.699039] load_module+0x21a7/0x2450
[ 4.699046] __do_sys_init_module+0x12d/0x180
[ 4.699053] do_syscall_64+0x33/0x40
[ 4.699057] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4.699059] RIP: 0033:0x7f705b79ae4e
[ 4.699063] Code: 48 8b 0d 25 10 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f2 0f 0c 00 f7 d8 64 89 01 48
[ 4.699064] RSP: 002b:00007ffee9ca2528 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 4.699067] RAX: ffffffffffffffda RBX: 00005624b5c38250 RCX: 00007f705b79ae4e
[ 4.699068] RDX: 00005624b5c38640 RSI: 0000000000a5a2a1 RDI: 00005624b65430e0
[ 4.699069] RBP: 00005624b65430e0 R08: ffffffffffffffe0 R09: 00007ffee9ca0671
[ 4.699070] R10: 00005624b5a34010 R11: 0000000000000246 R12: 00005624b5c38640
[ 4.699071] R13: 0000000000000008 R14: 00005624b5c2df00 R15: 00005624b5c38250
[ 4.699076] ---[ end trace 3e1ef6f5f1a6a9c8 ]---
[ 4.699158] [drm] Display Core initialized with v3.2.95!
I found a patch in the AMD Display Core v3.2.102 patchset that I hoped would resolve Trace 2.
https://lists.freedesktop.org/archives/amd-gfx/2020-September/053625.html
The specific patch that I hoped would prevent the call trace is the following:
https://lists.freedesktop.org/archives/amd-gfx/2020-September/053633.html
I have applied and tested the patch. Unfortunately it did not resolve the problem.
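The ASSERT(0) in the following code appears to be what triggers Trace 2.
File: drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
Function: rn_clk_mgr_helper_populate_bw_params (presumably inlined into rn_clk_mgr_construct, which is where the trace reports the warning)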
	/* Find lowest DPM, FCLK is filled in reverse order*/
	for (i = PP_SMU_NUM_FCLK_DPM_LEVELS - 1; i >= 0; i--) {
		if (clock_table->FClocks[i].Freq != 0 && clock_table->FClocks[i].Vol != 0) {
			j = i;
			break;
		}
	}
	if (j == -1) {
		/* clock table is all 0s, just use our own hardcode */
		ASSERT(0);
		return;
	}
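To make the failing condition easier to reason about, here is a minimal user-space sketch of the search loop quoted above. It is not the driver code: the struct layout, the function name find_populated_fclk_index, and the value of NUM_FCLK_DPM_LEVELS are assumptions made only for illustration. It shows that the loop leaves j at -1 only when no entry has both a nonzero Freq and a nonzero Vol, which is the case that would reach ASSERT(0).

```c
#include <stdio.h>

/* Assumed value for illustration; the driver uses PP_SMU_NUM_FCLK_DPM_LEVELS. */
#define NUM_FCLK_DPM_LEVELS 4

/* Hypothetical stand-ins for the driver's clock table types. */
struct fclk_entry {
	unsigned int Freq;
	unsigned int Vol;
};

struct clock_table {
	struct fclk_entry FClocks[NUM_FCLK_DPM_LEVELS];
};

/*
 * Mirrors the quoted loop: scan from the last entry down to the first and
 * return the first index whose Freq and Vol are both nonzero.
 * Returns -1 when no such entry exists (the branch that asserts in the driver).
 */
int find_populated_fclk_index(const struct clock_table *table)
{
	int j = -1;
	int i;

	for (i = NUM_FCLK_DPM_LEVELS - 1; i >= 0; i--) {
		if (table->FClocks[i].Freq != 0 && table->FClocks[i].Vol != 0) {
			j = i;
			break;
		}
	}
	return j;
}

/* Prints whether a table would take the ASSERT(0) branch in the quoted code. */
void report_table(const char *label, const struct clock_table *table)
{
	int j = find_populated_fclk_index(table);

	if (j == -1)
		printf("%s: no populated FCLK entry, would hit ASSERT(0)\n", label);
	else
		printf("%s: populated FCLK entry found at index %d\n", label, j);
}
```

Called on a zero-initialized table, find_populated_fclk_index returns -1. If the same holds in the driver, Trace 2 would mean that the clock table handed to rn_clk_mgr_helper_populate_bw_params contained no usable FCLK entries on this machine at that point during boot.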