Regression: AMDGPU crashes during or after suspend - bisected commit: a4e771729a51168bc36317effaa9962e336d4f5e
Brief summary of the problem:
AMDGPU crashes when I suspend my notebook (HP EliteBook 845 G9, Rembrandt APU) to RAM. The kernel log shows the following warning:
[ 1890.790243] WARNING: CPU: 0 PID: 6932 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 1890.790704] Modules linked in: uinput michael_mic rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat qrtr_mhi nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set snd_ctl_led nf_tables nfnetlink bnep uvcvideo btusb btrtl btbcm uvc btintel videobuf2_vmalloc btmtk videobuf2_memops videobuf2_v4l2 videobuf2_common bluetooth videodev mc sunrpc binfmt_misc vfat fat snd_soc_dmic snd_soc_acp6x_mach snd_acp6x_pdm_dma snd_sof_amd_rembrandt snd_sof_amd_renoir qrtr snd_sof_amd_acp snd_sof_pci ath11k_pci snd_sof_xtensa_dsp ath11k snd_sof snd_hda_codec_realtek qmi_helpers snd_sof_utils snd_hda_codec_generic mac80211 snd_soc_core ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr snd_hda_intel intel_rapl_common snd_compress snd_intel_dspcfg ac97_bus snd_intel_sdw_acpi edac_mce_amd snd_pcm_dmaengine snd_hda_codec snd_pci_ps snd_rpl_pci_acp6x kvm_amd snd_hda_core
[ 1890.790754] snd_hda_scodec_cs35l41_spi snd_hwdep snd_pci_acp6x regmap_spi libarc4 snd_seq kvm snd_seq_device cfg80211 irqbypass snd_pcm snd_hda_scodec_cs35l41_i2c rapl hid_sensor_als snd_hda_scodec_cs35l41 hp_wmi hid_sensor_trigger sparse_keymap snd_hda_cs_dsp_ctls cs_dsp snd_pci_acp5x hid_sensor_iio_common pcspkr snd_rn_pci_acp3x wmi_bmof thunderbolt snd_soc_cs35l41_lib industrialio_triggered_buffer snd_timer rfkill snd_acp_config kfifo_buf snd_soc_acpi snd amd_pmf mhi snd_pci_acp3x iosm k10temp i2c_piix4 soundcore platform_profile industrialio wireless_hotkey joydev serial_multi_instantiate acpi_tad amd_pmc loop zram dm_crypt amdgpu i2c_algo_bit drm_ttm_helper ttm nvme iommu_v2 drm_buddy gpu_sched crct10dif_pclmul crc32_pclmul nvme_core crc32c_intel drm_display_helper polyval_clmulni polyval_generic video hid_multitouch ucsi_acpi ghash_clmulni_intel hid_sensor_hub sha512_ssse3 typec_ucsi ccp sp5100_tco cec amd_sfh typec nvme_common wmi i2c_hid_acpi i2c_hid serio_raw ip6_tables ip_tables fuse
[ 1890.790808] CPU: 0 PID: 6932 Comm: kworker/u32:35 Tainted: G W ------- --- 6.3.0-63.fc39.x86_64 #1
[ 1890.790812] Hardware name: HP HP EliteBook 845 14 inch G9 Notebook PC/8990, BIOS U82 Ver. 01.05.01 03/22/2023
[ 1890.790814] Workqueue: events_unbound async_run_entry_fn
[ 1890.790822] RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 1890.791292] Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 c3 cc cc cc cc e9 5a fd ff ff <0f> 0b b8 ea ff ff ff c3 cc cc cc cc b8 ea ff ff ff c3 cc cc cc cc
[ 1890.791295] RSP: 0018:ffffb0acc7eb7d50 EFLAGS: 00010246
[ 1890.791298] RAX: ffff9e28412cb790 RBX: ffff9e284e000000 RCX: 0000000000000000
[ 1890.791300] RDX: 0000000000000000 RSI: ffff9e284e002510 RDI: ffff9e284e000000
[ 1890.791301] RBP: ffff9e284e000000 R08: fffff3c44c6d8000 R09: fffff3c44c6d4001
[ 1890.791302] R10: fffff3c44c6d4008 R11: ffffb0acc7eb7c48 R12: 0000000000001050
[ 1890.791304] R13: ffff9e284e0189a0 R14: ffffffff84893dd9 R15: ffff9e284cdc0688
[ 1890.791305] FS: 0000000000000000(0000) GS:ffff9e2f7e800000(0000) knlGS:0000000000000000
[ 1890.791307] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1890.791309] CR2: 00007f39dea95000 CR3: 0000000332022000 CR4: 0000000000750ef0
[ 1890.791311] PKRU: 55555554
[ 1890.791312] Call Trace:
[ 1890.791315] <TASK>
[ 1890.791317] gmc_v10_0_hw_fini+0x53/0x90 [amdgpu]
[ 1890.791736] gmc_v10_0_suspend+0xe/0x20 [amdgpu]
[ 1890.792157] amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu]
[ 1890.792540] amdgpu_device_suspend+0x107/0x180 [amdgpu]
[ 1890.792920] pci_pm_suspend+0x7f/0x170
[ 1890.792927] ? __pfx_pci_pm_suspend+0x10/0x10
[ 1890.792931] dpm_run_callback+0x8c/0x1e0
[ 1890.792937] __device_suspend+0x10a/0x560
[ 1890.792941] async_suspend+0x1e/0x70
[ 1890.792944] async_run_entry_fn+0x34/0x130
[ 1890.792949] process_one_work+0x1c7/0x3d0
[ 1890.792954] worker_thread+0x51/0x390
[ 1890.792956] ? __pfx_worker_thread+0x10/0x10
[ 1890.792958] kthread+0xed/0x120
[ 1890.792962] ? __pfx_kthread+0x10/0x10
[ 1890.792966] ret_from_fork+0x2c/0x50
[ 1890.792974] </TASK>
[ 1890.792975] ---[ end trace 0000000000000000 ]---
Hardware description:
- CPU: < AMD Ryzen 7 PRO 6850U >
- GPU: < Rembrandt [Radeon 680M] [1002:1681] >
- System Memory: <32GB DDR5 4800MHz>
- Display(s): <14" HP-Notebook-Display>
- Type of Display Connection:
System information:
- Distro name and Version: <Fedora 38>
- Kernel version: <Linux HP845G9 6.2.0-rc6+ >
- Custom kernel: < self-built for bisection; the error also occurs in 6.3.0 >
- AMD official driver version: <mesa-23.0.3>
How to reproduce the issue:
- Suspend the notebook (suspend to RAM)
- Wake it up again
- Check dmesg for the warning shown above
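
The last step can be scripted. The following is only a minimal sketch, not part of the original report: it assumes `dmesg` is readable by the current user (it may need root on some systems), and the helper names `find_amdgpu_irq_warnings` and `read_kernel_log` are made up for illustration. It matches the WARNING line quoted above, ignoring the source line number since that may differ between kernel versions.

```python
import re
import subprocess

# Pattern for the warning quoted in this report. The source line number
# (599 in the trace above) is not matched literally because it may differ
# between kernel versions.
WARNING_RE = re.compile(r"WARNING: .*amdgpu_irq\.c:\d+ amdgpu_irq_put\+")


def find_amdgpu_irq_warnings(log_text):
    """Return the kernel log lines that contain the amdgpu_irq_put warning."""
    return [line for line in log_text.splitlines() if WARNING_RE.search(line)]


def read_kernel_log():
    """Read the kernel ring buffer via dmesg (assumption: dmesg is accessible)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    hits = find_amdgpu_irq_warnings(read_kernel_log())
    print("\n".join(hits) if hits else "no amdgpu_irq_put warning found")
```

Run it after a suspend/resume cycle. Any printed line means the warning was triggered on the running kernel, which is the good/bad signal needed for each bisection step.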
Result of bisection:
- I bisected the issue overnight and found that commit a4e771729a51168bc36317effaa9962e336d4f5e causes the problem. Output of git bisect:
a4e771729a51168bc36317effaa9962e336d4f5e is the first bad commit
commit a4e771729a51168bc36317effaa9962e336d4f5e
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date: Tue Jan 24 12:45:48 2023 +0200
drm/probe_helper: sort out poll_running vs poll_enabled
There are two flags attemting to guard connector polling:
poll_enabled and poll_running. While poll_enabled semantics is clearly
defined and fully adhered (mark that drm_kms_helper_poll_init() was
called and not finalized by the _fini() call), the poll_running flag
doesn't have such clearliness.
This flag is used only in drm_helper_probe_single_connector_modes() to
guard calling of drm_kms_helper_poll_enable, it doesn't guard the
drm_kms_helper_poll_fini(), etc. Change it to only be set if the polling
is actually running. Tie HPD enablement to this flag.
This fixes the following warning reported after merging the HPD series:
Hot plug detection already enabled
WARNING: CPU: 2 PID: 9 at drivers/gpu/drm/drm_bridge.c:1257 drm_bridge_hpd_enable+0x94/0x9c [drm]
Modules linked in: videobuf2_memops snd_soc_simple_card snd_soc_simple_card_utils fsl_imx8_ddr_perf videobuf2_common snd_soc_imx_spdif adv7511 etnaviv imx8m_ddrc imx_dcss mc cec nwl_dsi gov
CPU: 2 PID: 9 Comm: kworker/u8:0 Not tainted 6.2.0-rc2-15208-g25b283acd578 #6
Hardware name: NXP i.MX8MQ EVK (DT)
Workqueue: events_unbound deferred_probe_work_func
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : drm_bridge_hpd_enable+0x94/0x9c [drm]
lr : drm_bridge_hpd_enable+0x94/0x9c [drm]
sp : ffff800009ef3740
x29: ffff800009ef3740 x28: ffff000009331f00 x27: 0000000000001000
x26: 0000000000000020 x25: ffff800001148ed8 x24: ffff00000a8fe000
x23: 00000000fffffffd x22: ffff000005086348 x21: ffff800001133ee0
x20: ffff00000550d800 x19: ffff000005086288 x18: 0000000000000006
x17: 0000000000000000 x16: ffff8000096ef008 x15: 97ffff2891004260
x14: 2a1403e194000000 x13: 97ffff2891004260 x12: 2a1403e194000000
x11: 7100385f29400801 x10: 0000000000000aa0 x9 : ffff800008112744
x8 : ffff000000250b00 x7 : 0000000000000003 x6 : 0000000000000011
x5 : 0000000000000000 x4 : ffff0000bd986a48 x3 : 0000000000000001
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000000250000
Call trace:
drm_bridge_hpd_enable+0x94/0x9c [drm]
drm_bridge_connector_enable_hpd+0x2c/0x3c [drm_kms_helper]
drm_kms_helper_poll_enable+0x94/0x10c [drm_kms_helper]
drm_helper_probe_single_connector_modes+0x1a8/0x510 [drm_kms_helper]
drm_client_modeset_probe+0x204/0x1190 [drm]
__drm_fb_helper_initial_config_and_unlock+0x5c/0x4a4 [drm_kms_helper]
drm_fb_helper_initial_config+0x54/0x6c [drm_kms_helper]
drm_fbdev_client_hotplug+0xd0/0x140 [drm_kms_helper]
drm_fbdev_generic_setup+0x90/0x154 [drm_kms_helper]
dcss_kms_attach+0x1c8/0x254 [imx_dcss]
dcss_drv_platform_probe+0x90/0xfc [imx_dcss]
platform_probe+0x70/0xcc
really_probe+0xc4/0x2e0
__driver_probe_device+0x80/0xf0
driver_probe_device+0xe0/0x164
__device_attach_driver+0xc0/0x13c
bus_for_each_drv+0x84/0xe0
__device_attach+0xa4/0x1a0
device_initial_probe+0x1c/0x30
bus_probe_device+0xa4/0xb0
deferred_probe_work_func+0x90/0xd0
process_one_work+0x200/0x474
worker_thread+0x74/0x43c
kthread+0xfc/0x110
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
Reported-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com>
Fixes: c8268795c9a9 ("drm/probe-helper: enable and disable HPD on connectors")
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Acked-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com>
Tested-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com>
Tested-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230124104548.3234554-2-dmitry.baryshkov@linaro.org
(cherry picked from commit d33a54e3991dfce88b4fc6d9c3360951c2c5660d)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
drivers/gpu/drm/drm_probe_helper.c | 42 +++++++++++++++++++-------------------
1 file changed, 21 insertions(+), 21 deletions(-)