drm_kms_helper_poll_disable is called without drm_kms_helper_poll_init having been called for gfx9 and gfx10
In linux-next since commit a4e771729a51168bc36317effaa9962e336d4f5e
commit a4e771729a51168bc36317effaa9962e336d4f5e (HEAD)
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date: Tue Jan 24 12:45:48 2023 +0200
drm/probe_helper: sort out poll_running vs poll_enabled
There are two flags attemting to guard connector polling:
poll_enabled and poll_running. While poll_enabled semantics is clearly
defined and fully adhered (mark that drm_kms_helper_poll_init() was
called and not finalized by the _fini() call), the poll_running flag
doesn't have such clearliness.
This flag is used only in drm_helper_probe_single_connector_modes() to
guard calling of drm_kms_helper_poll_enable, it doesn't guard the
drm_kms_helper_poll_fini(), etc. Change it to only be set if the polling
is actually running. Tie HPD enablement to this flag.
[...]
Since that commit, the following warning appears for gfx9 and gfx10:
[ 90.655957] ------------[ cut here ]------------
[ 90.655957] WARNING: CPU: 5 PID: 41 at kernel/workqueue.c:3066 __flush_work.isra.0+0x259/0x270
[ 90.655963] Modules linked in: ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac bnep cpufreq_conservative cpufreq_powersave cpufreq_userspace nls_ascii nls_cp437 vfat fat snd_ctl_led btusb btrtl btbcm btintel btmtk bluetooth snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio jitterentropy_rng snd_hda_codec_hdmi sha512_generic snd_hda_intel snd_intel_dspcfg snd_acp3x_pdm_dma snd_soc_dmic snd_acp3x_rn snd_hda_codec ctr snd_soc_core snd_hwdep uvcvideo snd_hda_core videobuf2_vmalloc snd_pcm_oss snd_acp_pci videobuf2_memops snd_mixer_oss videobuf2_v4l2 drbg joydev snd_rn_pci_acp3x snd_pcm videodev snd_acp_config snd_soc_acpi snd_timer msi_wmi edac_mce_amd ecdh_generic ecc videobuf2_common sparse_keymap rapl snd wmi_bmof snd_pci_acp3x ccp soundcore k10temp button battery ac hid_sensor_als hid_sensor_prox hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio amd_pmc
[ 90.656007] hid_sensor_iio_common acpi_cpufreq evdev hid_multitouch serio_raw mt7921e mt7921_common mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr fuse efi_pstore configfs efivarfs autofs4 ext4 crc32c_generic crc16 mbcache jbd2 usbhid amdgpu nvme drm_ttm_helper ttm i2c_hid_acpi nvme_core i2c_hid gpu_sched hid_sensor_hub i2c_algo_bit t10_pi drm_buddy mfd_core hid_generic xhci_pci drm_display_helper xhci_hcd r8169 crc32c_intel drm_kms_helper realtek crc64_rocksoft psmouse mdio_devres usbcore crc64 amd_sfh syscopyarea crc_t10dif sysfillrect hid libphy crct10dif_generic sysimgblt i2c_piix4 usb_common crct10dif_common cec i2c_designware_platform i2c_designware_core
[ 90.656031] CPU: 5 PID: 41 Comm: kworker/5:0 Tainted: G W 6.2.0-rc6-01158-ga4e771729a51-dirty #381
[ 90.656033] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[ 90.656034] Workqueue: pm pm_runtime_work
[ 90.656038] RIP: 0010:__flush_work.isra.0+0x259/0x270
[ 90.656040] Code: 8b 04 25 00 1f 02 00 48 89 44 24 40 48 8b 73 30 8b 4b 28 e9 e5 fe ff ff 40 30 f6 4c 8b 36 e9 23 fe ff ff 0f 0b e9 3a ff ff ff <0f> 0b e9 33 ff ff ff e8 9b 20 74 00 66 66 2e 0f 1f 84 00 00 00 00
[ 90.656041] RSP: 0018:ffffa7c7402cfc48 EFLAGS: 00010246
[ 90.656043] RAX: 0000000000000000 RBX: ffff9691d96a0340 RCX: 0000000000000000
[ 90.656043] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9691d96a0340
[ 90.656044] RBP: ffff9691d96a0340 R08: 0000000000000000 R09: ffffa7c7402cfbe0
[ 90.656045] R10: 0000000000000003 R11: ffffffffa2098328 R12: 0000000000000001
[ 90.656045] R13: 0000000000000001 R14: 0000000000000000 R15: ffff9691c16a1248
[ 90.656046] FS: 0000000000000000(0000) GS:ffff96949e740000(0000) knlGS:0000000000000000
[ 90.656047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 90.656048] CR2: 00007efc40619000 CR3: 00000001c5010000 CR4: 0000000000750ee0
[ 90.656048] PKRU: 55555554
[ 90.656049] Call Trace:
[ 90.656049] <TASK>
[ 90.656050] ? console_unlock+0x4d/0x100
[ 90.656053] ? __irq_work_queue_local+0x27/0x60
[ 90.656056] ? irq_work_queue+0x2b/0x50
[ 90.656057] ? __wake_up_klogd+0x40/0x60
[ 90.656059] __cancel_work_timer+0xed/0x180
[ 90.656061] drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
[ 90.656072] amdgpu_device_suspend+0x81/0x170 [amdgpu]
[ 90.656180] amdgpu_pmops_runtime_suspend+0xb5/0x1b0 [amdgpu]
[ 90.656269] pci_pm_runtime_suspend+0x61/0x1b0
[ 90.656271] ? pci_pm_thaw_noirq+0x90/0x90
[ 90.656272] __rpm_callback+0x3f/0x160
[ 90.656274] ? pci_pm_thaw_noirq+0x90/0x90
[ 90.656275] rpm_callback+0x58/0x70
[ 90.656277] ? pci_pm_thaw_noirq+0x90/0x90
[ 90.656278] rpm_suspend+0x10d/0x5f0
[ 90.656280] ? psi_task_switch+0xcd/0x220
[ 90.656282] ? __switch_to_asm+0x3a/0x60
[ 90.656284] ? finish_task_switch.isra.0+0x84/0x280
[ 90.656286] pm_runtime_work+0x8f/0xa0
[ 90.656288] process_one_work+0x1a6/0x2f0
[ 90.656290] worker_thread+0x48/0x380
[ 90.656292] ? rescuer_thread+0x370/0x370
[ 90.656293] kthread+0xd5/0x100
[ 90.656294] ? kthread_complete_and_exit+0x20/0x20
[ 90.656295] ret_from_fork+0x22/0x30
[ 90.656297] </TASK>
[ 90.656297] ---[ end trace 0000000000000000 ]---
The warning appears because the struct delayed_work that drm_kms_helper_poll_disable() cancels is never initialized. With the following instrumentation
void drm_kms_helper_poll_init(struct drm_device *dev)
{
	printk(KERN_INFO "drm_kms_helper_poll_init0\n");
	INIT_DELAYED_WORK(&dev->mode_config.output_poll_work, output_poll_execute);
	dev->mode_config.poll_enabled = true;
	printk(KERN_INFO "dev->mode_config.output_poll_work.work.func = %px\n",
	       dev->mode_config.output_poll_work.work.func);

	drm_kms_helper_poll_enable(dev);
}
[...]
void drm_kms_helper_poll_disable(struct drm_device *dev)
{
	if (dev->mode_config.poll_running)
		drm_kms_helper_disable_hpd(dev);

	printk(KERN_INFO "drm_kms_helper_poll_disable: dev->mode_config.output_poll_work.work.func = %px\n",
	       dev->mode_config.output_poll_work.work.func);
	cancel_delayed_work_sync(&dev->mode_config.output_poll_work);

	dev->mode_config.poll_running = false;
}
EXPORT_SYMBOL(drm_kms_helper_poll_disable);
the output of dmesg | grep kms_helper_poll is
[ 17.433826] drm_kms_helper_poll_disable: dev->mode_config.output_poll_work.work.func = 0000000000000000
[ 17.433931] drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
[ 90.655953] drm_kms_helper_poll_disable: dev->mode_config.output_poll_work.work.func = 0000000000000000
[ 90.656061] drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
[ 1960.153401] drm_kms_helper_poll_disable: dev->mode_config.output_poll_work.work.func = 0000000000000000
[ 1960.153479] drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
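showing that output_poll_work.work.func is still NULL on every call to drm_kms_helper_poll_disable(), and that the "drm_kms_helper_poll_init0" message never shows up, i.e. the corresponding init was never called.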
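To make the failure mode easier to reason about, here is a minimal user-space sketch of the sequence observed above. It is only a model under stated assumptions, not kernel code and not a proposed patch: every mock_* name is hypothetical, the HPD handling is left out, and it assumes that the WARNING in __flush_work comes from a check that the work item has a handler (func != NULL). The guarded variant only illustrates what skipping the cancel when poll_enabled is unset would look like; whether the right fix belongs in the helper or in the amdgpu suspend path is not decided here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the dev->mode_config fields involved. */
struct mock_mode_config {
	bool poll_enabled;       /* set once init has run */
	bool poll_running;       /* cleared by disable */
	void (*work_func)(void); /* models output_poll_work.work.func */
	int warnings;            /* counts simulated WARN splats */
};

/* Placeholder for output_poll_execute(). */
static void mock_poll_execute(void)
{
}

/* Models drm_kms_helper_poll_init(): the only place the work gets a handler. */
static void mock_poll_init(struct mock_mode_config *cfg)
{
	assert(cfg != NULL);
	cfg->work_func = mock_poll_execute;
	cfg->poll_enabled = true;
	/* Simplification: the real helper only starts polling when needed. */
	cfg->poll_running = true;
}

/* Models the cancel path; ASSUMED to warn when the work has no handler. */
static bool mock_cancel_work_sync(struct mock_mode_config *cfg)
{
	if (cfg->work_func == NULL) {
		cfg->warnings++;
		return false;
	}
	return true;
}

/* Unguarded disable, mirroring the instrumented function quoted above. */
static void mock_poll_disable(struct mock_mode_config *cfg)
{
	mock_cancel_work_sync(cfg);
	cfg->poll_running = false;
}

/* Guarded variant: do nothing if init never ran (illustration only). */
static void mock_poll_disable_guarded(struct mock_mode_config *cfg)
{
	if (!cfg->poll_enabled)
		return;
	mock_poll_disable(cfg);
}
```

Calling mock_poll_disable() on a zero-initialized struct mock_mode_config bumps the warnings counter, which corresponds to the gfx9/gfx10 case reported here. Calling it after mock_poll_init(), or calling mock_poll_disable_guarded() without init, leaves the counter at zero.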