[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
On both Linux 6.3.3 and 6.4.0-rc4, each with https://patchwork.freedesktop.org/patch/538937/ applied, suspending and resuming a lazor sc7180 while an external display is connected over USB-C DP yields [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107). The system recovers quickly from the error and the external display comes up normally (the issue is only perceptible by looking at the kernel logs). Without the patch, USB-C DP is completely broken on lazor as discussed at https://lore.kernel.org/lkml/ebbcd56ac883d3c3d3024d368fab63d26e02637a@lausen.nl/T/.
On 6.4.0-rc4+, the error is followed by further drm errors:
[ 251.909629] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
[ 252.015151] [drm] Unexpected interrupt: 0x01000000
[...]
[ 252.564394] [drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0
[ 252.573353] [drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers
[ 252.576786] usb 1-1.1: new high-speed USB device number 16 using xhci-hcd
[ 252.581577] [drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
I checked out the latest 6.3.6 kernel:
Linux version 6.3.6-00002-g58930848b708-dirty (khsieh@khsieh-linux1) (Chromium OS 16.0_pre475826_p20230103-r10 clang version 16.0.0 (/var/tmp/portage/sys-devel/llvm-16.0_pre475826_p20230103-r10/work/llvm-16.0_pre475826_p20230103/clang 11897708c0229c92802e747564e7c34b722f045f), LLD 16.0.0) #3 SMP PREEMPT Mon Jun 12 09:40:45 PDT 2023
From the Linux shell, I ran "powerd_dbus_suspend" to test suspend/resume.
I did not see the "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)" error message show up.
Could you try the 6.3.6 kernel to verify whether the same issue still exists?
I further verified that the issue can be reproduced in console mode (tty) only. I booted the lazor device with external monitor connected, stopped the display manager, logged in to the device via ChromeOS Debug Cable (SuzyQ), suspended the device and resumed the device. At that point, both device and external screen would remain black. But via debug cable, I could confirm the error is printed.
Did you get a chance to reproduce this issue? I verified that 6.4.0 still shows the issue:
[ 461.852668] PM: suspend entry (deep)
[ 461.931478] Filesystems sync: 0.075 seconds
[...]
[ 466.153428] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
[ 466.258953] [drm] Unexpected interrupt: 0x01000000
[ 466.408402] OOM killer enabled.
[ 466.411636] Restarting tasks ...
[ 466.411855] usb 1-1.4.3: USB disconnect, device number 6
[ 466.411871] usb 1-1.4.4.1: USB disconnect, device number 10
[ 466.416226] done.
[ 466.416466] r8152-cfgselector 2-1.4.4.2: USB disconnect, device number 7
[ 466.416657] usb 1-1.4.2.1: USB disconnect, device number 12
[ 466.416723] usb 1-1.1: USB disconnect, device number 4
[ 466.442554] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { } 25 jiffies s: 709 root: 0x0/.
[ 466.453202] random: crng reseeded on system resumption
[ 466.471991] PM: suspend exit
[...]
[ 472.610992] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[ 476.955576] usb 1-1.4.2.2: reset high-speed USB device number 21 using xhci-hcd
[ 486.340510] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
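Since the issue is only visible in the kernel log, a small filter can help when comparing kernels. The following is a minimal sketch, assuming the log has been saved with one entry per line in the `[ seconds.micros] message` form that `dmesg` prints. The function name `drm_errors` and the matching rules are illustrative assumptions, not part of any existing tool.

```python
import re

# Matches "[ 466.153428] message" as printed by dmesg (assumed format).
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def drm_errors(log_text):
    """Return (timestamp, message) pairs for drm lines flagged as errors.

    A line counts as an error here if it starts with a "[drm" tag and
    carries either the "*ERROR*" marker or the "[dpu error]" marker.
    """
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "[...]" elisions and anything without a timestamp
        timestamp, message = float(match.group(1)), match.group(2)
        if message.startswith("[drm") and (
            "*ERROR*" in message or "[dpu error]" in message
        ):
            hits.append((timestamp, message))
    return hits
```

Note that this sketch deliberately skips lines such as `[drm] Unexpected interrupt: 0x01000000`, which carry neither marker; extend the condition if those should be reported too.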
Currently, our DP driver does not incorporate the pm_runtime_xxx() mechanism.
At resume, drm_mode_config_helper_resume() calls atomic_check() to check the DP bridge status.
At that point the DP driver is offline, so it returns -ENOTCONN (-107).
This produces the error message ==> [ 49.589243] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
It does not cause any harm to the system.
I am working on incorporating the pm_runtime_xxx() mechanism into the DP driver.
This will fix the issue.
I will add you as a reviewer when I post the patch series later so that you can verify it.
After a more detailed investigation, I collected the kernel logs below while the problem happened during suspend/resume with external DP.
I think the problem is in the bridge framework, which calls atomic_check() before hpd_enable(). The DP driver therefore returns -ENOTCONN (-107), since HPD is not yet enabled.
Once hpd_enable() has been called, all following atomic_check() calls succeed and operation goes back to normal.
Since this error does not cause any harm to system operation, I think we can either close this bug or assign it to the bridge framework to fix.
[ 365.008959] dp_bridge_atomic_check: kuogee: edp=0 is_connect=0 <== first atomic_check() called; DP hpd is not enabled yet, so DP's is_connected is false, hence it returns -107 (-ENOTCONN).
[ 366.457370] dp_bridge_atomic_check: kuogee: edp=0 is_connect=1 <== second atomic_check() called; hpd is enabled, so is_connect is true
[ 366.458579] dp_bridge_atomic_check: kuogee: edp=0 is_connect=1
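The ordering described in this comment can be modelled in a few lines. This is only a toy sketch under stated assumptions: `ToyDpBridge` and `resume_sequence` are hypothetical names, the model is not the msm DP driver or the DRM bridge framework code, and it assumes the plug state only becomes visible after HPD has been enabled.

```python
# Toy model only: names mirror the thread, not the real msm DP driver code.
ENOTCONN = 107  # Linux errno value for "Transport endpoint is not connected"


class ToyDpBridge:
    """Hypothetical stand-in for the DP bridge state discussed above."""

    def __init__(self):
        self.hpd_enabled = False
        # Assumption: the external monitor stays plugged in across suspend.
        self.sink_plugged = True

    def hpd_enable(self):
        self.hpd_enabled = True

    @property
    def is_connected(self):
        # Assumption: the plug state is only seen once HPD has been enabled.
        return self.hpd_enabled and self.sink_plugged

    def atomic_check(self):
        # 0 means the check passed; a negative errno means it failed.
        return 0 if self.is_connected else -ENOTCONN


def resume_sequence(bridge):
    """Replay the ordering from the logs: check, enable HPD, check twice."""
    results = [bridge.atomic_check()]
    bridge.hpd_enable()
    results.append(bridge.atomic_check())
    results.append(bridge.atomic_check())
    return results
```

Calling `resume_sequence(ToyDpBridge())` gives one -107 followed by two zeros, which lines up with the three log lines above (is_connect=0, then is_connect=1 twice).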
Thank you for taking a look @quic_khsieh! Would you mind clarifying how your finding relates to your earlier hypothesis, that the pm_runtime_xxx() mechanism is missing in the DP driver and the DP driver is therefore offline?
What isn't clear to me is why DP hpd state is not preserved across the suspend resume cycle. It seems that if it were preserved, DP's is_connected would correctly return true, avoiding the error and potentially speeding up the external monitor bring-up by a second or a few seconds? Maybe there's a good reason not to preserve the state, and if so it would be great if you can point it out. Thank you!
https://patchwork.freedesktop.org/series/120375/
This is the pm runtime framework patch series I mentioned before.
In this patch series, pm runtime handling within the DP driver has been reworked.
With it applied, I can no longer reproduce the problem.
Can you please apply this patch series and confirm that the problem has been fixed?
Thank you, @quic_khsieh. I tested the series applied to kernel v6.6 and it fixes the *ERROR* Failed to resume (-107) issue. The other errors (*ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy, failed to get dspp on lm 0, unable to find appropriate mixers and failed to reserve hw resources: -119) remain however. Do you have an idea why? Should I open a separate issue to track them?
Below please see the drm specific kernel messages when booting up, suspending and resuming a v6.6 kernel with your series applied on sc7180-trogdor-lazor:
[ 0.312747] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[ 3.981215] [drm:dpu_kms_hw_init:1108] dpu hardware revision:0x60020000
[ 3.990239] [drm] Initialized msm 1.10.0 20130625 for ae01000.display-controller on minor 1
[ 4.000465] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_sqe.fw from new location
[ 4.011074] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_gmu.bin from new location
[ 4.184470] msm_dpu ae01000.display-controller: [drm] fb0: msmdrmfb frame buffer device
[ 39.024238] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[ 51.383232] [drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0
[ 51.392222] [drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers
[ 51.400461] [drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
[...]
[ 89.758997] PM: suspend entry (deep)
[...]
[ 94.367396] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[...]
[ 94.735347] PM: suspend exit
Regarding "failed to get dspp on lm 0, unable to find appropriate mixers and failed to reserve hw resources: -119": I have seen this too, and it has a different root cause. We can track it through another bug and close this one.
Thank you, @jesszhan. I've opened #57 to track the remaining issues. In fact, at least on 6.9.2 the *ERROR* Failed to resume message is back (though with a different error code).