Following up on #25 (comment 2156044), I'm opening a new bug to track remaining / new issues.
On Linux 6.9.2, suspending and resuming a lazor sc7180 while an external display is connected over USB-C DP yields [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22) (similar to the -107 error discussed at #25 (closed)). The system recovers relatively quickly from the error and the external display comes up normally (the issue is mainly perceptible by looking at the kernel logs or by noticing the slight delay in recovery).
dmesg snippet:
% grep drm dmesg-6.9.2
[ 0.342052] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[ 3.914356] aux_bridge.aux_bridge aux_bridge.aux_bridge.0: error -ENODEV: failed to acquire drm_bridge
[ 4.248733] [drm:dpu_kms_hw_init:1053] dpu hardware revision:0x60020000
[ 4.257815] [drm] Initialized msm 1.12.0 20130625 for ae01000.display-controller on minor 1
[ 4.267800] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_sqe.fw from new location
[ 4.278415] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_gmu.bin from new location
[ 4.468890] msm_dpu ae01000.display-controller: [drm] fb0: msmdrmfb frame buffer device
[ 73.514009] systemd[1]: Starting modprobe@drm.service - Load Kernel Module drm...
[ 73.921409] systemd[1]: modprobe@drm.service: Deactivated successfully.
[ 73.937414] systemd[1]: Finished modprobe@drm.service - Load Kernel Module drm.
[ 79.697997] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[ 90.966379] [drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0
[ 90.975354] [drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers
[ 90.983592] [drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
[ 1185.831984] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)
[ 1190.140545] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[ 1185.831970] [dpu error]connector not connected 3
[ 1185.831984] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)
[ 1186.025863] OOM killer enabled.
[ 1186.029095] Restarting tasks ... done.
[ 1186.042834] random: crng reseeded on system resumption
[ 1186.048738] PM: suspend exit
[ 1187.633591] ath10k_snoc 18800000.wifi: chan info: invalid frequency 0 (idx 41 out of bounds)
[ 1188.194232] systemd-journald[682]: Under memory pressure, flushing caches.
[ 1190.140545] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
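For anyone skimming the logs above, the numeric codes are negated Linux errno values. The following is a minimal sketch, not part of the original report, that pulls out log lines ending in a negative code and names the errno. The helper name `decode_errors` and the small lookup table are assumptions made for illustration; the table only covers codes that appear in this thread and uses the generic Linux numbering.

```python
import re

# Linux errno values (generic numbering) for the codes seen in this thread.
# This table is an illustrative assumption, not an exhaustive list.
LINUX_ERRNO = {
    19: "ENODEV",
    22: "EINVAL",
    107: "ENOTCONN",
    119: "ENAVAIL",
}

# "[ 1185.831984] message" -> timestamp and message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# A trailing negative code such as "(-22)" or ": -119".
CODE_RE = re.compile(r"(?:\(|:\s)-(\d+)\)?\s*$")


def decode_errors(dmesg_text):
    """Return (timestamp, errno_name, message) for lines ending in a negative code."""
    results = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        code = CODE_RE.search(message)
        if code:
            number = int(code.group(1))
            name = LINUX_ERRNO.get(number, f"errno {number}")
            results.append((timestamp, name, message))
    return results
```

Read this way, the resume failure (-22) is EINVAL, the earlier error from #25 (-107) is ENOTCONN, and the resource reservation failure (-119) is ENAVAIL.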
@abhinavk, for the past few stable releases I have noticed sc7180 lazor randomly hard resetting (i.e. rebooting suddenly). I suspect it's related to drm/msm, as it only happens with the graphical user interface enabled. I need to run a few older kernel versions for a longer period of time to establish which kernel first introduced the issue. If any of the other drm errors in the snippet above look related to you, please let me know.
cc @jesszhan @abhinavk regarding this regression. gitlab.freedesktop.org was unavailable for the past week (?), so I'm not sure the notification was delivered when I opened the issue.
@lumag there could be a confounder: starting with 41c177cf (which includes a merge of 6.8.0-rc3), suspend does not work as expected on the msm-next branch. Specifically, the EC detects a sleep transition timeout and "wakes" the device back up and brings the screens back up after a few seconds. Still, I validated that 41c177cf does not exhibit the error and that d13f638c doesn't either, but starting with 71174f36, [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22) is printed. Please let me know if you have suggestions on how to better narrow down the issue.
@leezu could you please set drm.debug=0x16 and capture the log during suspend? Also could you please post /sys/kernel/debug/dri/0/state before suspending?
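As a side note on the requested value: drm.debug is a bitmask of DRM debug categories. The sketch below decodes a mask into category names, assuming the bit assignments from the kernel's drm_print.h at the time of writing; the function name `decode_drm_debug` is made up for this example.

```python
# DRM debug category bits, assumed from include/drm/drm_print.h.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
    0x100: "DP",
    0x200: "DRMRES",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0x16))
```

Under that assumption, 0x16 enables the DRIVER, KMS and ATOMIC categories. The value can be passed on the kernel command line as drm.debug=0x16, and on most systems it can also be changed at runtime by writing to /sys/module/drm/parameters/debug as root.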
I also validated that merging v6.8-rc4 resolves the "EC detects a sleep transition timeout" issue mentioned above as a potential confounder, i.e. suspend works correctly. Even with v6.8-rc4 merged, [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22) is not printed on d13f638c, but it is printed on 71174f36.
/sys/kernel/debug/dri/0/state does not exist on my system. Do I need a kernel configuration to make it available?
@leezu , sorry for the late response. I was indeed not getting some of the messages from this bug and I was OOO last week. I will go through the logs and update.
@leezu could you please check your issue with these two patches applied? If they don't help, I'd kindly ask for a new debug log (some of the DPU messages were being directed to another facility; the second patch fixes that) and a state from debugfs.
@lumag thank you. 0001-drm-msm-dpu1-don-t-choke-on-disabling-the-writeback-.patch works and avoids [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22). drm.debug=0x16 log: drm_debug-err-22-fixed.txt. But is it the right fix? That is, are there valid situations where this function is called while state->crtc is null? Or does any such situation indicate a bug where CRTC state is not properly preserved? For example, should this case only apply if the system was suspended without an external display connected and then resumed with the display connected?
A few more observations:
1. CRTC state isn't being restored correctly after resume (and this was true for at least kernel 6.8 as well). After resume, the Color Transform Matrix (CTM) property appears to be lost. If the internal panel/screen used CTM for Night Light, it is no longer active after resume. Toggling Night Light on and off after resume leads to a number of the following errors:
[drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0
[drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers
[drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
2. If the device is suspended with no external display connected, pressing any key on the keyboard wakes up the device AND that key is passed along as input to the screen lock (i.e. as the first character of the user's password). If an external display is connected, pressing any key wakes up the device, but the key is not passed on to the screen lock. (In principle, it's possible this is an issue with the screen lock and not the kernel.)
@leezu yes, you need to enable debugfs and to mount it to /sys/kernel/debug for that file to be available
Can you share more details? I do see clients, gem_names and name at /sys/kernel/debug/dri/0/. Just not state
For 1, please start a separate issue. Please don't reuse existing tickets for unrelated problems. Please use linux-next + two patches I have posted earlier to capture the log.
For 2, this doesn't look like a kernel issue unless proven otherwise. I'd guess it might be related to how your display manager handles seats and/or input devices.
Regarding debugfs/dri/0/state. What are the contents of /sys/kernel/debug/dri/0/name? Do you have any other dirs under /sys/kernel/debug/dri? The msm driver is an atomic KMS driver so it should create dri/N/state, dri/N/framebuffer, dri/N/internal_clients and several other files.
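The checks asked for above (is debugfs mounted, which dri/N directories exist, what each name file contains, and which of the expected files are missing) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the list of expected files is taken from the comment above rather than from the driver source, and reading debugfs normally requires root.

```python
from pathlib import Path


def debugfs_mountpoints(mounts_text):
    """Return mount points of type debugfs from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def dri_report(debugfs_root,
               expected=("name", "state", "framebuffer", "internal_clients")):
    """Map each dri/N directory to its name contents and missing expected files."""
    report = {}
    dri = Path(debugfs_root) / "dri"
    if not dri.is_dir():
        return report
    for minor in sorted(p for p in dri.iterdir() if p.is_dir()):
        name_file = minor / "name"
        name = name_file.read_text().strip() if name_file.is_file() else None
        missing = [f for f in expected if not (minor / f).exists()]
        report[minor.name] = {"name": name, "missing": missing}
    return report
```

A typical use would be to pass the contents of /proc/mounts to `debugfs_mountpoints` and then call `dri_report` on each returned mount point, e.g. /sys/kernel/debug. A minor that lists state as missing while name, clients and gem_names are present would match the situation described earlier in this thread.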
> For 1, please start a separate issue. Please don't reuse existing tickets for unrelated problems. Please use linux-next + two patches I have posted earlier to capture the log.
After additional testing, it seems it is not an unrelated issue but triggered by your patch. The issue does not occur on Linux 6.9.6, but does occur on 6.9.8+dpu1-don-t-choke-on-disabling-the-writeback_ctm-resume patch. I think for my "and this was true for at least kernel 6.8 as well" I was referring to the kernel built from 71174f36 with 6.8.0-rc4 merged and your patch applied.
Let me build linux-next with your patches as well and attach the debug logs.
> Regarding debugfs/dri/0/state. What are the contents of /sys/kernel/debug/dri/0/name? Do you have any other dirs under /sys/kernel/debug/dri? The msm driver is an atomic KMS driver so it should create dri/N/state, dri/N/framebuffer, dri/N/internal_clients and several other files.
@lumag, please see the attached dmesg debug logs with linux-next next-20240709, once with your patches and once without. The CTM issue only occurs with your patch. For each file, I start recording with Night Light enabled, then 1/ suspend and resume the system, 2/ toggle "Night Light" (CTM use) off and on again. With your patch, step 2 has no visible effect. With your patch, I also notice slight screen flickering on the internal display when moving the mouse on it.