[drm:drm_mode_config_helper_resume] ERROR Failed to resume (-107)

Hi Leonard,

I checked out the latest 6.3.6 kernel. Linux version 6.3.6-00002-g58930848b708-dirty (khsieh@khsieh-linux1) (Chromium OS 16.0_pre475826_p20230103-r10 clang version 16.0.0 (/var/tmp/portage/sys-devel/llvm-16.0_pre475826_p20230103-r10/work/llvm-16.0_pre475826_p20230103/clang 11897708c0229c92802e747564e7c34b722f045f), LLD 16.0.0) #3 SMP PREEMPT Mon Jun 12 09:40:45 PDT 2023

I ran Linux 6.3.6 plus https://patchwork.freedesktop.org/patch/538937/ on my trogdor device.

From a Linux shell, I ran "powerd_dbus_suspend" to test suspend/resume, and I did not see the "[drm:drm_mode_config_helper_resume] ERROR Failed to resume (-107)" error message.

Could you try the 6.3.6 kernel to verify whether the same issue still exists?

Hi Kuogee,

thank you for trying this out. Yes, I just confirmed the issue persists with 6.3.6 + https://patchwork.freedesktop.org/patch/538937/ + https://www.spinics.net/lists/linux-arm-msm/msg148115.html. Can you confirm that you had connected an external display over the USB-C DP port while you suspended and resumed the system? Without having an external display attached, the issue does not occur.

I further verified that the issue can be reproduced in console mode (tty) only. I booted the lazor device with external monitor connected, stopped the display manager, logged in to the device via ChromeOS Debug Cable (SuzyQ), suspended the device and resumed the device. At that point, both device and external screen would remain black. But via debug cable, I could confirm the error is printed.

[   74.518985] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
[   74.624451] [drm] Unexpected interrupt: 0x01000000

I'm also attaching the full log for this simplified reproduction from console tty mode. 2023-06-13-dmesg-6.3.6p.txt

Hi Kuogee @quic_khsieh,

did you get a chance to reproduce this issue? I verified that 6.4.0 still faces the issue:

[  461.852668] PM: suspend entry (deep)
[  461.931478] Filesystems sync: 0.075 seconds
[...]
[  466.153428] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
[  466.258953] [drm] Unexpected interrupt: 0x01000000
[  466.408402] OOM killer enabled.
[  466.411636] Restarting tasks ...
[  466.411855] usb 1-1.4.3: USB disconnect, device number 6
[  466.411871] usb 1-1.4.4.1: USB disconnect, device number 10
[  466.416226] done.
[  466.416466] r8152-cfgselector 2-1.4.4.2: USB disconnect, device number 7
[  466.416657] usb 1-1.4.2.1: USB disconnect, device number 12
[  466.416723] usb 1-1.1: USB disconnect, device number 4
[  466.442554] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { } 25 jiffies s: 709 root: 0x0/.
[  466.453202] random: crng reseeded on system resumption
[  466.471991] PM: suspend exit
[...]
[  472.610992] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[  476.955576] usb 1-1.4.2.2: reset high-speed USB device number 21 using xhci-hcd
[  486.340510] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy

2023-06-26-dmesg-6.4.0.txt
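For anyone comparing logs across kernel versions, here is a minimal sketch that pulls the `[drm:<function>] *ERROR*` lines out of a dmesg dump. The function name `drm_errors` and the regular expression are assumptions made for this illustration; they only cover the line format visible in the excerpts above, not every DRM message.

```python
import re

# Matches lines such as:
# [  466.153428] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
# Assumption: timestamp in brackets, then "[drm:<function>]", then "*ERROR*".
DRM_ERROR_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+)\]\s+\*ERROR\*\s+(?P<msg>.*)$"
)


def drm_errors(dmesg_text):
    """Return (timestamp, function, message) for each DRM *ERROR* line."""
    found = []
    for line in dmesg_text.splitlines():
        match = DRM_ERROR_LINE.match(line.strip())
        if match:
            found.append(
                (float(match.group("ts")), match.group("func"), match.group("msg"))
            )
    return found


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python3 drm_errors.py
    for ts, func, msg in drm_errors(sys.stdin.read()):
        print(f"{ts:12.6f}  {func}: {msg}")
```

Lines like `[drm] Unexpected interrupt: 0x01000000` are deliberately not matched, since they carry no function name or `*ERROR*` tag.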

echo mem > /sys/power/state

Is this the command you use to reproduce the problem? If not, could you please give me the detailed commands to reproduce it?

@quic_khsieh yes. Please make sure cat /sys/power/mem_sleep shows s2idle [deep]
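The check above can be scripted. The following is a small sketch, assuming the usual sysfs convention that the currently selected mode is the one shown in square brackets; the helper names `active_mem_sleep_mode` and `read_mem_sleep` are made up for this example.

```python
import re


def active_mem_sleep_mode(text):
    """Return the mode marked with [brackets], or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read the raw contents of the mem_sleep sysfs file."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        mode = active_mem_sleep_mode(read_mem_sleep())
    except OSError as exc:
        print(f"could not read mem_sleep: {exc}")
    else:
        # The reproduction in this thread expects "deep".
        print(f"active mem_sleep mode: {mode}")
```

With the contents `s2idle [deep]` the function returns `deep`, which is the setting used for the reproduction here.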

Currently, our DP driver does not use the pm_runtime_xxx() mechanism. At resume, drm_mode_config_helper_resume() calls atomic_check() to check the DP bridge status. At that time the DP driver is offline, so it returns -ENOTCONN (-107). This produces the error message ==> [ 49.589243] [drm:drm_mode_config_helper_resume] ERROR Failed to resume (-107). It does not cause any harm to the system.

I am working on incorporating the pm_runtime_xxx() mechanism into the DP driver. This will fix this issue. I will add you as a reviewer when I post the patch series later so that you can verify it.

After a more detailed investigation, I collected the kernel logs below while the problem happens during suspend/resume with an external DP display. I think the problem is in the bridge framework, which calls atomic_check() before hpd_enable(). Therefore the DP driver returns -ENOTCONN (-107), since HPD is not enabled. Once hpd_enable() has been called, all following atomic_check() calls succeed and operation goes back to normal.
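The ordering described above can be illustrated with a toy model. This is only a sketch of the explanation, not the msm DP driver code: the class `ToyDpBridge`, the function `toy_resume`, and the `sink_plugged` flag are hypothetical names, and the constant 107 is the Linux value of ENOTCONN that matches the -107 in the log.

```python
ENOTCONN = 107  # Linux errno value; corresponds to the -107 in the error message


class ToyDpBridge:
    """Toy stand-in for the DP bridge; not the real driver."""

    def __init__(self):
        self.hpd_enabled = False   # HPD is off right after resume
        self.sink_plugged = True   # the external monitor stays attached

    @property
    def is_connected(self):
        # In this model the connection is only visible once HPD is enabled.
        return self.hpd_enabled and self.sink_plugged

    def atomic_check(self):
        return 0 if self.is_connected else -ENOTCONN

    def hpd_enable(self):
        self.hpd_enabled = True


def toy_resume(bridge):
    """Replay the order seen in the log: atomic_check, hpd_enable, atomic_check."""
    first = bridge.atomic_check()    # runs before HPD is enabled
    bridge.hpd_enable()
    second = bridge.atomic_check()   # runs after HPD is enabled
    return first, second


if __name__ == "__main__":
    print(toy_resume(ToyDpBridge()))
```

In this model the first check fails with -107 and the second succeeds, which mirrors the is_connect=0 and is_connect=1 lines in the annotated log.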

Since this error does not cause any harm to system operation, I think we can either close this bug or assign it to the bridge framework to fix.

[ 365.008959] dp_bridge_atomic_check: kuogee: edp=0 is_connect=0 <== first atomic_check() called; DP HPD is not enabled yet, so DP's is_connected is false, hence it returns -107 (-ENOTCONN).

[ 365.009043] [drm:drm_mode_config_helper_resume] ERROR Failed to resume (-107) <== error message

[ 365.009132] dp_bridge_hpd_enable: kuogee: edp=0 power_cnt=1 <== hpd enable bridge called, dp hpd enabled

[ 365.011569] PM: Finishing wakeup.
[ 365.011680] OOM killer enabled.
[ 365.011709] Restarting tasks ...
[ 365.012283] dp_pm_runtime_idle: kuogee: edp=1 power_cnt=0
[ 365.013582] dp_pm_runtime_suspend: kuogee: edp=1 power_cnt=0
[ 365.017067] done.
[ 365.017082] random: crng reseeded on system resumption
[ 365.023577] init: cupsd main process ended, respawning
[ 365.028674] PM: suspend exit
[ 365.329849] dp_pm_runtime_resume: kuogee: edp=1 power_cnt=1

[ 366.457370] dp_bridge_atomic_check: kuogee: edp=0 is_connect=1 <== second atomic_check() called, hpd is enabled so that is_connect is true
[ 366.458579] dp_bridge_atomic_check: kuogee: edp=0 is_connect=1

[ 366.459591] dp_bridge_atomic_enable: kuogee: edp=0 power_cnt=2

[ 366.461550] dp_bridge_atomic_enable: kuogee: edp=1 power_cnt=1

[ 366.509896] [drm:dp_aux_isr] ERROR Unexpected DP AUX IRQ 0x01000000 when not busy
[ 366.615828] dp_bridge_atomic_post_disable: kuogee: edp=1 power_cnt=0
[ 366.616202] dp_bridge_atomic_enable: kuogee: edp=1 power_cnt=1
[ 366.770235] [drm:dp_aux_isr] ERROR Unexpected DP AUX IRQ 0x01000000 when not busy
[ 366.863562] dp_bridge_atomic_check: kuogee: edp=0 is_connect=1

Thank you for taking a look @quic_khsieh! Would you mind clarifying how your finding relates to your earlier hypothesis, that pm_runtime_xxx() mechanism in DP driver is missing and thus DP driver being offline?

What isn't clear to me is why DP hpd state is not preserved across the suspend resume cycle. It seems that if it were preserved, DP's is_connected would correctly return true, avoiding the error and potentially speeding up the external monitor bring-up by a second or a few seconds? Maybe there's a good reason not to preserve the state, and if so it would be great if you can point it out. Thank you!

https://patchwork.freedesktop.org/series/120375/ This is the pm runtime framework patch series I mentioned before. In this patch series, pm runtime handling within the DP driver has been reworked. With it applied, I can no longer reproduce the problem. Can you please apply this patch series and confirm that the problem has been fixed?

Thank you, @quic_khsieh. I tested the series applied to kernel v6.6 and it fixes the *ERROR* Failed to resume (-107) issue. The other errors (*ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy, failed to get dspp on lm 0, unable to find appropriate mixers and failed to reserve hw resources: -119) remain however. Do you have an idea why? Should I open a separate issue to track them?

Below please see the drm specific kernel messages when booting up, suspending and resuming a v6.6 kernel with your series applied on sc7180-trogdor-lazor:

[    0.312747] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[    3.981215] [drm:dpu_kms_hw_init:1108] dpu hardware revision:0x60020000
[    3.990239] [drm] Initialized msm 1.10.0 20130625 for ae01000.display-controller on minor 1
[    4.000465] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_sqe.fw from new location
[    4.011074] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_gmu.bin from new location
[    4.184470] msm_dpu ae01000.display-controller: [drm] fb0: msmdrmfb frame buffer device
[   39.024238] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[   51.383232] [drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0
[   51.392222] [drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers
[   51.400461] [drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
[...]
[   89.758997] PM: suspend entry (deep)
[...]
[   94.367396] [drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
[...]
[   94.735347] PM: suspend exit

Hi @leezu

Thanks for the update.

Regarding "ERROR Unexpected DP AUX IRQ 0x01000000 when not busy", this should be fixed by https://patchwork.freedesktop.org/patch/551847/, which has been merged to msm-fixes 0c1a2e69.

Regarding "failed to get dspp on lm 0, unable to find appropriate mixers and failed to reserve hw resources: -119", I have seen this too, and it has a different root cause. We can track this through another bug and close this one.

Thanks for the validation efforts !

Abhinav

hey @leezu ,

Following up with @abhinavk 's comment and closing this bug.

If you're still seeing the hw resources error, please open another bug.

Thanks,

Jessica Zhang

mentioned in issue #57 (closed)

Thank you, @jesszhan. I've opened #57 (closed) to track remaining issues. In fact, at least on 6.9.2 we got the *ERROR* Failed to resume back (though with different error code).

closed
