Rembrandt: Backlight turns off automatically and LCD is still working
Hello!
I'm not able to comment on #2935 (closed), probably due to a false positive in the spam protection:
Your comment could not be submitted because your comment has been recognized as spam. please, change the content to proceed..
Any attempt to bypass it failed. I'm sorry if I'm creating even more noise.
Brief summary of the problem:
When launching a game like Counter-Strike 2 or OpenRA, there is a 10-20% chance that the backlight turns off. The only workaround I'm aware of to get it back on is a suspend and resume, i.e. Fn+4 and then Fn.
Hardware description:
Laptop: ThinkPad X13 Gen3
CPU: AMD Ryzen™ 7 PRO 6850U
GPU: Rembrandt [Radeon 680M]
System Memory: 32 GB (previously 16 GB; edit: new mainboard fresh from Lenovo, 2023-12-27)
Display(s): AU Optronics, 0xa79d (HiDPI 2560x1600)
Type of Display Connection: eDP
System information:
Distro name and Version: Archlinux
Kernel version: 6.5.9 (# edit: 6.6.8, still happening)
How to reproduce the issue:
Launch a game which causes some kind of workload on the GPU
Repeat until the backlight goes off (probably requires ten or more tries; once it has occurred, it is less likely to occur again)
Further affected users -> my original intent was adding that list to #2935 (closed)
Please note that the time seems to jump here. The man page of dmesg mentions the limitation regarding suspend and resume:
Be aware that the timestamp could be inaccurate! The time source used for the logs is not updated after system SUSPEND/RESUME. Timestamps are adjusted according to current delta between boottime and monotonic clocks, this works only for messages printed after last resume.
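The delta the man page refers to is between two kernel clocks: CLOCK_BOOTTIME keeps counting while the machine is suspended, CLOCK_MONOTONIC does not. A minimal sketch (Linux only; the helper names are made up for illustration) that prints how long the machine has been suspended since boot, which gives a rough idea of how large the error on messages printed before the last resume could be:

```python
import time


def clock_delta(boottime_s: float, monotonic_s: float) -> float:
    """Seconds spent suspended: BOOTTIME includes suspend time, MONOTONIC does not."""
    return max(0.0, boottime_s - monotonic_s)


def suspended_seconds() -> float:
    # Read MONOTONIC first so the difference cannot come out negative
    # merely because time passed between the two reads.
    mono = time.clock_gettime(time.CLOCK_MONOTONIC)
    boot = time.clock_gettime(time.CLOCK_BOOTTIME)
    return clock_delta(boot, mono)


if __name__ == "__main__":
    print(f"time spent suspended since boot: {suspended_seconds():.0f} s")
```

If this prints a large number, human-readable timestamps for messages logged before the last resume should be treated with suspicion.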
If this issue is specific to Rembrandt and you can still reproduce it, please upgrade to the latest DMCUB if you have an older one. The current version is 0x4000044. You'll see it in your logs.
Please also upgrade to the latest stable kernel; there have been various changes to the PSR-SU policy (for example d16df040c8dad25c962b4404d2d534bfea327c6a). If this can still be reproduced, I think we'll need a DMCUB trace.
Thank you Mario. I'm testing it now and will report back in a few days.
[ 2.567649] [drm] Loading DMUB firmware via PSP: version=0x04000044
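For anyone else checking their firmware, here is a small sketch that pulls the DMUB version out of a kernel log line like the one above and compares it with the 0x4000044 mentioned earlier. The function name and the constant are illustrative only, and newer firmware versions may exist by the time you read this:

```python
import re

# Version quoted in this thread; treated here as the minimum to test with.
REFERENCE_VERSION = 0x4000044

_DMUB_RE = re.compile(r"Loading DMUB firmware via PSP: version=(0x[0-9a-fA-F]+)")


def dmub_version(log_text: str):
    """Return the DMUB firmware version found in log_text, or None."""
    match = _DMUB_RE.search(log_text)
    return int(match.group(1), 16) if match else None


line = "[ 2.567649] [drm] Loading DMUB firmware via PSP: version=0x04000044"
version = dmub_version(line)
if version is None:
    print("no DMUB version line found")
elif version >= REFERENCE_VERSION:
    print(hex(version), "is at least 0x4000044")
else:
    print(hex(version), "is older than 0x4000044")
```

To check a running system, feed it the output of dmesg or journalctl -k instead of the sample line.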
Running with Linux 6.9.3 which should contain the commit mentioned above for some time now. Maybe I'm a little special because my HiDPI-Panel isn't often sold by Lenovo?
But everything kept working. At 18:50 I executed fwupdmgr get-devices (nothing special?) and the backlight turned off a few seconds later when I switched between windows with Alt+Tab. I then closed and reopened the lid, and the backlight turned on again.
Please note
I've turned off suspend on LID close.
/etc/systemd/logind.conf
# don't suspend on lid close
HandleLidSwitch=ignore
@hoschi_nullptr are you sure that it's tied to running fwupdmgr get-devices? That's really interesting if so. There are fwupd plugins that will try to use an aux channel to communicate with connected DP devices.
Can you reproduce again with the same symptom? What version of fwupd?
If you are sure fwupd caused it, can you run sudo systemctl mask fwupd.service, then reboot, and see if it comes back again like that?
No. I don’t think fwupd is the culprit. It happened this time right after switching the windows from the terminal to the web-browser.
It happened previously when playing Counter-Strike 2 for some time. Or when starting Counter-Strike 2. Or when browsing the web or doing some other seemingly random task.
I would suspect some state change by the GPU or similar? But I’ve no clue. The only hint is the error message from above which is printed at some random time. This time after boot, more often hours after.
Just to make sure it's not fwupd, can you mask the service? It can start up on its own in the background too, for refreshing metadata once a day. If you collect a journal (specifically; not just a kernel log), you can also check whether anything else was activated at the same time it happens.
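One way to look at the full journal around the moment the backlight goes off is to query a short time window. A minimal sketch, assuming you noted the wall-clock time of the blackout; the function name, the example time, and the 60-second margin are arbitrary choices for illustration:

```python
from datetime import datetime, timedelta


def journal_window_cmd(when: datetime, margin_s: int = 60) -> list[str]:
    """Build a journalctl command covering margin_s seconds around 'when'."""
    fmt = "%Y-%m-%d %H:%M:%S"
    since = (when - timedelta(seconds=margin_s)).strftime(fmt)
    until = (when + timedelta(seconds=margin_s)).strftime(fmt)
    return ["journalctl", "--no-pager", "-o", "short-iso",
            "--since", since, "--until", until]


# Hypothetical blackout time; replace with the time you observed.
cmd = journal_window_cmd(datetime(2024, 6, 1, 18, 50, 0))
print(" ".join(cmd))
```

The list can be passed to subprocess.run as-is. Unlike dmesg, the journal records a wall-clock time when each message is received, so it should be less affected by the suspend/resume caveat.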
I masked the service during the weekend and suspended the laptop.
systemctl status fwupd.service
○ fwupd.service
Loaded: masked (Reason: Unit fwupd.service is masked.)
Active: inactive (dead)
Today I opened the display and the laptop resumed. I opened Evolution, opened Epiphany, started to log in on a website, and seconds later the backlight turned off. I suspended and resumed again to get the backlight back on.
This time the kernel logged a trace right at the time when the backlight turned off:
[Wed Oct 16 12:37:09 2024] ------------[ cut here ]------------
[Wed Oct 16 12:37:09 2024] WARNING: CPU: 5 PID: 264 at drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:221 dmub_psr_enable+0xfd/0x110 [amdgpu]
[Wed Oct 16 12:37:09 2024] Modules linked in: ccm michael_mic usbhid snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device qrtr_mhi uhid cmac algif_hash algif_skcipher af_alg bnep vfat fat snd_soc_dmic snd_acp6x_pdm_dma snd_soc_acp6x_mach snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp qrtr snd_sof ath11k_pci snd_sof_utils snd_pci_ps snd_ctl_led snd_amd_sdw_acpi ath11k soundwire_amd amd_atl intel_rapl_msr snd_hda_codec_realtek soundwire_generic_allocation intel_rapl_common soundwire_bus snd_hda_codec_generic qmi_helpers joydev mousedev snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core snd_compress mac80211 ac97_bus snd_hda_intel uvcvideo snd_intel_dspcfg snd_pcm_dmaengine snd_intel_sdw_acpi snd_rpl_pci_acp6x videobuf2_vmalloc crct10dif_pclmul snd_acp_pci libarc4 uvc crc32_pclmul snd_hda_codec btusb spd5118 videobuf2_memops snd_acp_legacy_common polyval_clmulni sp5100_tco snd_pci_acp6x videobuf2_v4l2 polyval_generic btrtl snd_hda_core
[Wed Oct 16 12:37:09 2024] hid_multitouch snd_pci_acp5x cfg80211 ghash_clmulni_intel btintel snd_hwdep videodev snd_rn_pci_acp3x hid_generic sha512_ssse3 ucsi_acpi snd_acp_config btbcm sha256_ssse3 snd_pcm typec_ucsi i2c_piix4 videobuf2_common btmtk think_lmi sha1_ssse3 snd_soc_acpi aesni_intel gf128mul crypto_simd cryptd bluetooth mc psmouse rapl pcspkr firmware_attributes_class wmi_bmof typec thunderbolt mhi snd_pci_acp3x snd_timer ccp k10temp i2c_smbus roles i2c_hid_acpi i2c_hid amd_pmc acpi_tad mac_hid sg crypto_user dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2 amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec serio_raw thinkpad_acpi gpu_sched atkbd sparse_keymap drm_suballoc_helper libps2 platform_profile vivaldi_fmap drm_buddy snd nvme soundcore drm_display_helper rfkill nvme_core crc32c_intel cec xhci_pci video i8042 crc16 xhci_pci_renesas nvme_auth serio wmi
[Wed Oct 16 12:37:09 2024] CPU: 5 UID: 0 PID: 264 Comm: kworker/5:1H Not tainted 6.11.3-arch1-1 #1 1400000003000000474e55000681d53aa6c7b79b
[Wed Oct 16 12:37:09 2024] Hardware name: LENOVO 21CMCTO1WW/21CMCTO1WW, BIOS R22ET70W (1.40 ) 03/21/2024
[Wed Oct 16 12:37:09 2024] Workqueue: events_highpri dm_irq_work_func [amdgpu]
[Wed Oct 16 12:37:09 2024] RIP: 0010:dmub_psr_enable+0xfd/0x110 [amdgpu]
[Wed Oct 16 12:37:09 2024] Code: d1 81 fb e8 03 00 00 74 21 48 8b 44 24 48 65 48 2b 04 25 28 00 00 00 75 15 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e e9 2e 5f 8b ca <0f> 0b eb db e8 aa 8c 61 ca 66 2e 0f 1f 84 00 00 00 00 00 90 90 90
[Wed Oct 16 12:37:09 2024] RSP: 0018:ffffaeba843d7cc8 EFLAGS: 00010246
[Wed Oct 16 12:37:09 2024] RAX: 000002f35afbe42c RBX: 00000000000003e9 RCX: 0000000000000005
[Wed Oct 16 12:37:09 2024] RDX: 000000000014962b RSI: 0000000000148f27 RDI: 000002f35ae74e01
[Wed Oct 16 12:37:09 2024] RBP: 0000000000000000 R08: 0000000000000002 R09: ffff9735847a6980
[Wed Oct 16 12:37:09 2024] R10: 0000000000000007 R11: 0000000000000001 R12: ffff97359544c4d0
[Wed Oct 16 12:37:09 2024] R13: 0000000000000000 R14: ffffaeba843d7ccc R15: 0000000000000000
[Wed Oct 16 12:37:09 2024] FS: 0000000000000000(0000) GS:ffff973c9ec80000(0000) knlGS:0000000000000000
[Wed Oct 16 12:37:09 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Oct 16 12:37:09 2024] CR2: 000076f15798dfd0 CR3: 00000002cd022000 CR4: 0000000000f50ef0
[Wed Oct 16 12:37:09 2024] PKRU: 55555554
[Wed Oct 16 12:37:09 2024] Call Trace:
[Wed Oct 16 12:37:09 2024] <TASK>
[Wed Oct 16 12:37:09 2024] ? dmub_psr_enable+0xfd/0x110 [amdgpu 1400000003000000474e5500a38fdd2dd8475a01]
[Wed Oct 16 12:37:09 2024] ? __warn.cold+0x8e/0xe8
[Wed Oct 16 12:37:09 2024] ? dmub_psr_enable+0xfd/0x110 [amdgpu 1400000003000000474e5500a38fdd2dd8475a01]
[Wed Oct 16 12:37:09 2024] ? report_bug+0xff/0x140
[Wed Oct 16 12:37:09 2024] ? handle_bug+0x3c/0x80
[Wed Oct 16 12:37:09 2024] ? exc_invalid_op+0x17/0x70
[Wed Oct 16 12:37:09 2024] ? asm_exc_invalid_op+0x1a/0x20
[Wed Oct 16 12:37:09 2024] ? dmub_psr_enable+0xfd/0x110 [amdgpu 1400000003000000474e5500a38fdd2dd8475a01]
[Wed Oct 16 12:37:09 2024] ? dmub_psr_enable+0xb2/0x110 [amdgpu 1400000003000000474e5500a38fdd2dd8475a01]
[Wed Oct 16 12:37:09 2024] edp_set_psr_allow_active+0x280/0x3b0 [amdgpu 1400000003000000474e5500a38fdd2dd8475a01]
[Wed Oct 16 12:37:09 2024] dp_handle_hpd_rx_irq+0x4dd/0x510 [amdgpu 1400000003000000474e5500a38fdd2dd8475a01]
[Wed Oct 16 12:37:09 2024] handle_hpd_rx_irq+0xd9/0x2e0 [amdgpu 1400000003000000474e5500a38fdd2dd8475a01]
[Wed Oct 16 12:37:09 2024] process_one_work+0x17e/0x330
[Wed Oct 16 12:37:09 2024] worker_thread+0x2ce/0x3f0
[Wed Oct 16 12:37:09 2024] ? __pfx_worker_thread+0x10/0x10
[Wed Oct 16 12:37:09 2024] kthread+0xd2/0x100
[Wed Oct 16 12:37:09 2024] ? __pfx_kthread+0x10/0x10
[Wed Oct 16 12:37:09 2024] ret_from_fork+0x34/0x50
[Wed Oct 16 12:37:09 2024] ? __pfx_kthread+0x10/0x10
[Wed Oct 16 12:37:09 2024] ret_from_fork_asm+0x1a/0x30
[Wed Oct 16 12:37:09 2024] </TASK>
[Wed Oct 16 12:37:09 2024] ---[ end trace 0000000000000000 ]---
We cannot trust dmesg -T according to the man page (doesn't support suspend/resume). This message was printed a day before (Tuesday) but my display backlight went off today (Wednesday).
So this message is actually some kind of precursor?
Can you confirm using amdgpu.dcdebugmask=0x10 can help your issue? There will be some power consumption impact, but maybe we do need to quirk your machine if PSR is still leading to problems.
May I dare to add another question?
This is the output of psr.py on my machine without amdgpu.dcdebugmask=0x10:
./psr.py
DRI device 1 DMCUB F/W version: 0x04000044
○ PSR 2 with Y coordinates (eDP 1.4a) [3]
○ Sink OUI: Parade
○ resv_40f: 01
○ ID String: 06-96
○ PSR Status: 00-00-02
I guess there is something special about my AU Optronics, 0xa79d (HiDPI 2560x1600). Then I reread your comment about d16df040c8dad25c962b4404d2d534bfea327c6a, which says:
We have observed that there are quite a number of PSR-SU panels on the market that are unable to keep up with what user space throws at them, resulting in hangs and random black screens. So, make damage clips support configurable and disable it by default for PSR-SU displays.
How can I read the content of the variable is_psr_su? Does it maybe contain an unexpected value?
You could try to instead use amdgpu.dcdebugmask=0x200 which will explicitly turn off PSR selective update instead of all of PSR but I'm not sure that will really change much if you haven't changed damage clips.
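To double-check which of the two masks is actually in effect after a reboot, here is a minimal sketch that reads the kernel command line and decodes only the two bits discussed in this thread (0x10 and 0x200). The driver defines more bits than these, the function names are made up for illustration, and a mask set via modprobe options instead of the kernel command line would not show up here:

```python
import re

# Only the two bits named in this thread; amdgpu defines additional ones.
KNOWN_BITS = {
    0x10: "PSR disabled",
    0x200: "PSR-SU (selective update) disabled",
}


def dcdebugmask_from_cmdline(cmdline: str) -> int:
    """Return the amdgpu.dcdebugmask value from a kernel command line, or 0."""
    match = re.search(r"\bamdgpu\.dcdebugmask=(\S+)", cmdline)
    return int(match.group(1), 0) if match else 0


def describe(mask: int) -> list[str]:
    """List the known PSR-related effects of the given mask."""
    return [text for bit, text in KNOWN_BITS.items() if mask & bit]


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        mask = dcdebugmask_from_cmdline(f.read())
    print(hex(mask), describe(mask) or "no PSR-related bits set")
```

With amdgpu.dcdebugmask=0x10 this reports that PSR is disabled entirely; with 0x200 only selective update is reported as disabled.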
Somehow, I would be fine with PSR-SU turned off by default. I need my own Unicode plane to express my emotions about that. But I would be fine
I can bisect, but my time is limited to weekends. Considering that I need four to eight hours for one test, I can test one or two patches per week. Would it be possible to limit the number of patches to some interesting patches in gpu/drm/amd/${SOME}?
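For planning purposes: a bisect needs roughly log2(N) test rounds for N candidate commits, and git bisect start accepts a pathspec after "--" so that only commits touching that path are considered. A quick back-of-the-envelope sketch; the commit counts below are made-up placeholders, not numbers taken from the kernel tree:

```python
import math


def bisect_steps(n_commits: int) -> int:
    """Approximate number of test rounds a bisect over n_commits needs."""
    return math.ceil(math.log2(n_commits)) if n_commits > 1 else 0


def weeks_needed(n_commits: int, tests_per_week: int) -> int:
    """Calendar weeks needed at the given testing pace."""
    return math.ceil(bisect_steps(n_commits) / tests_per_week)


# Hypothetical range sizes: whole kernel range vs. a path-limited range.
for label, count in [("full range", 10000), ("path-limited", 500)]:
    print(label, bisect_steps(count), "steps,",
          weeks_needed(count, 2), "weeks at 2 tests/week")
```

Because the step count grows only logarithmically, limiting the path saves a handful of rounds rather than most of them, but at one or two tests per week that can still be several weeks.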
This is printed with 6.13.5 to dmesg when PSR-SU fails and turns the backlight off. Or it is printed to dmesg some time after, because I need to suspend and resume to read the screen.
The message is printed between the suspend and resume messages. But as far as I know timestamps around suspend and resume aren't reliable.
@superm1 Would be nice if you close this issue when PSR-SU is actually turned off by default.