"list_add corruption" and full system lockup from performance monitoring
Running this in one terminal (amd_performance_monitor_measure comes from piglit):
while ./bin/amd_performance_monitor_measure -auto; do date; done
and this in another terminal:
for c in `seq 1 32`; do sleep 1; glxgears & done
causes:
list_add corruption. prev->next should be next (ffff982cc716a710), but was ffff982c87d6ece0. (prev=ffff982ca12d03a0).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:26!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 3652 Comm: amd_performance Tainted: G E 5.18.7 #82
Hardware name: HP HP Pavilion Laptop 15-cs3xxx/86E2, BIOS F.05 01/01/2020
RIP: 0010:__list_add_valid.cold+0x3d/0x3f
Code: f2 4c 89 c1 48 89 fe 48 c7 c7 88 83 95 82 e8 7f be fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 30 83 95 82 e8 68 be fe ff <0f> 0b 48 89 fe 48 c7 c7 c0 83 95 82 e8 57 be fe ff 0f 0b 48 89 d1
RSP: 0018:ffffb29cc2eafb18 EFLAGS: 00010046
RAX: 0000000000000075 RBX: ffff982c80f508d8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff82943f68 RDI: 00000000ffffffff
RBP: ffff982cc716a700 R08: 0000000000000000 R09: 00000000ffffefff
R10: ffffb29cc2eaf948 R11: ffffffff82cd6808 R12: ffff982cc7168680
R13: 0000000000000282 R14: ffff982c80f508e0 R15: ffff982ca12d03a0
FS: 00007fc1ea198000(0000) GS:ffff98341fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000013ca0f8 CR3: 000000011122c003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
__i915_active_fence_set.part.0+0x7c/0xc0 [i915]
__i915_request_commit+0x152/0x330 [i915]
i915_request_add+0xa3/0x330 [i915]
gen8_modify_context+0x97/0x130 [i915]
oa_configure_all_contexts.isra.0+0x191/0x440 [i915]
lrc_configure_all_contexts.isra.0+0x14a/0x170 [i915]
gen8_enable_metric_set+0x63/0xa0 [i915]
i915_perf_open_ioctl+0x478/0xfa0 [i915]
? i915_oa_init_reg_state+0xe0/0xe0 [i915]
drm_ioctl_kernel+0xb1/0x140 [drm]
drm_ioctl+0x220/0x3c0 [drm]
? i915_oa_init_reg_state+0xe0/0xe0 [i915]
? lock_release+0x13c/0x2e0
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc1ea5f4cc7
Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff8e24bd68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000001146e90 RCX: 00007fc1ea5f4cc7
RDX: 00007fff8e24bda0 RSI: 0000000040106476 RDI: 0000000000000007
RBP: 00000000013c9e90 R08: 00007fc1ea99ab80 R09: 0000000000000403
R10: 0000000000100002 R11: 0000000000000246 R12: 0000000000000007
R13: 00007fff8e24bda0 R14: 000000000000000a R15: 000000000000000f
</TASK>
Modules linked in: ccm(E) snd_hda_codec_hdmi(E) overlay(E) snd_sof_pci_intel_icl(E) snd_sof_intel_hda_common(E) snd_sof_pci(E) snd_sof_xtensa_dsp(E) soundwire_intel(E) soundwire_generic_allocation(E) soundwire_cadence(E) snd_sof_intel_hda(E) snd_sof(E) snd_sof_utils(E) snd_soc_hdac_hda(E) snd_hda_ext_core(E) snd_soc_acpi_intel_match(E) mei_hdcp(E) intel_rapl_msr(E) joydev(E) snd_soc_acpi(E) snd_ctl_led(E) snd_soc_core(E) snd_hda_codec_realtek(E) snd_compress(E) binfmt_misc(E) snd_hda_codec_generic(E) x86_pkg_temp_thermal(E) soundwire_bus(E) intel_powerclamp(E) ledtrig_audio(E) coretemp(E) rtw88_8822ce(E) snd_hda_intel(E) rtw88_8822c(E) snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) rtw88_pci(E) kvm_intel(E) snd_hda_codec(E) rtw88_core(E) snd_hwdep(E) snd_hda_core(E) serio_raw(E) kvm(E) snd_pcsp(E) irqbypass(E) snd_pcm(E) mac80211(E) rapl(E) snd_timer(E) intel_cstate(E) iTCO_wdt(E) hp_wmi(E) processor_thermal_device_pci_legacy(E) intel_pmc_bxt(E) platform_profile(E)
iTCO_vendor_support(E) libarc4(E) processor_thermal_device(E) intel_uncore(E) snd(E) sparse_keymap(E) wmi_bmof(E) intel_wmi_thunderbolt(E) processor_thermal_rfim(E) ee1004(E) watchdog(E) soundcore(E) cfg80211(E) processor_thermal_mbox(E) processor_thermal_rapl(E) intel_rapl_common(E) mei_me(E) mei(E) intel_soc_dts_iosf(E) int3403_thermal(E) int340x_thermal_zone(E) hp_accel(E) lis3lv02d(E) evdev(E) acpi_pad(E) int3400_thermal(E) intel_pmc_core(E) acpi_thermal_rel(E) button(E) acpi_tad(E) ac(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) dm_crypt(E) dm_mod(E) hid_generic(E) usbhid(E) hid(E) nvme(E) crc32_pclmul(E) crc32c_intel(E) i915(E) i2c_algo_bit(E) nvme_core(E) drm_buddy(E) t10_pi(E) drm_dp_helper(E) cec(E) ahci(E) ttm(E) libahci(E) ghash_clmulni_intel(E) xhci_pci(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) xhci_hcd(E) libata(E) drm_kms_helper(E) crc_t10dif(E) r8169(E) syscopyarea(E) crct10dif_generic(E) sysfillrect(E)
crct10dif_pclmul(E) realtek(E) i2c_i801(E) sysimgblt(E) crc64(E) usbcore(E) scsi_mod(E) mdio_devres(E) fb_sys_fops(E) aesni_intel(E) intel_lpss_pci(E) crypto_simd(E) cryptd(E) psmouse(E) i2c_smbus(E) crct10dif_common(E) libphy(E) drm(E) intel_lpss(E) usb_common(E) scsi_common(E) idma64(E) battery(E) wmi(E) video(E)
---[ end trace 0000000000000000 ]---
After that, the system becomes completely unresponsive. The problem is also reproducible on the 5.10 kernel.
Device: ICL GT2
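The two-terminal reproduction at the top of this report can also be driven from a single shell. The script below is a minimal sketch and was not part of the original report. The `repro` function and the `MEASURE`, `GEARS` and `CLIENTS` overrides are hypothetical names introduced here. Their defaults match the commands above, and the script assumes bash, run from the piglit build directory like the original command.

```bash
#!/usr/bin/env bash
# Sketch only: run both halves of the reproducer from one shell.
# repro, MEASURE, GEARS and CLIENTS are hypothetical names; the defaults
# match the commands in the report.

repro() {
  local measure=${MEASURE:-./bin/amd_performance_monitor_measure}
  local gears=${GEARS:-glxgears}
  local clients=${CLIENTS:-32}
  local runs=0

  # "Another terminal": start one GL client per second in the background.
  (
    for _ in $(seq 1 "$clients"); do
      sleep 1
      "$gears" &
    done
  ) &
  local spawner=$!

  # "One terminal": rerun the piglit test until it fails.
  while "$measure" -auto; do
    date
    runs=$((runs + 1))
  done

  # Stop spawning new clients; clients already started keep running.
  kill "$spawner" 2>/dev/null || true
  echo "measurement loop stopped after $runs successful runs"
}
```

Usage: save it to a file, `source` it, then call `repro`. As in the original two-terminal setup, GL clients that were already started are left running when the measurement loop exits. On an affected kernel, the machine is expected to lock up before that point.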