Bisected kernel deadlock with 5.12-rc[1..8] (list corruption)
Hello
I'm getting a kernel deadlock shortly after the X server is started. Usually the mouse keeps moving a bit longer before it freezes as well; after that the machine is completely dead. Even SysRq+B no longer works, and I have to switch the laptop off with a 5s press of the power button.
The laptop is an oldish T61 (4G, C2D 2.2GHz) running up-to-date Fedora Rawhide. I usually disable mitigations, but it doesn't matter whether they are on or off.
I have 2 stack traces, captured over ssh from another machine before the deadlock, which suggest there is list corruption somewhere.
5.11 works OK for me; 5.12-rc1 is the first kernel I tried that crashes.
This one is from the Rawhide debug kernel:
[ 116.227492] list_del corruption. next->prev should be ffff97b81185e8b0, but was ffff97b935a553b0
[ 116.227552] ------------[ cut here ]------------
[ 116.227555] kernel BUG at lib/list_debug.c:54!
[ 116.227563] invalid opcode: 0000 [#1] SMP NOPTI
[ 116.227568] CPU: 1 PID: 123 Comm: kworker/u4:4 Not tainted 5.12.0-0.rc4.20210324git7acac4b3196c.176.fc35.x86_64 #1
[ 116.227573] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
[ 116.227577] Workqueue: i915 __i915_gem_free_work [i915]
[ 116.227724] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 116.227732] Code: c7 c7 60 b4 64 94 e8 87 fc fd ff 0f 0b 48 89 fe 48 c7 c7 f0 b4 64 94 e8 76 fc fd ff 0f 0b 48 c7 c7 a0 b5 64 94 e8 68 fc fd ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 60 b5 64 94 e8 54 fc fd ff 0f 0b
[ 116.227737] RSP: 0018:ffffba1600227d80 EFLAGS: 00010082
[ 116.227741] RAX: 0000000000000054 RBX: ffff97b81185e200 RCX: 0000000000000000
[ 116.227745] RDX: ffff97b9379e97a0 RSI: ffff97b9379daae0 RDI: ffff97b9379daae0
[ 116.227748] RBP: ffff97b90cec0000 R08: 0000000000000000 R09: ffffba1600227bc8
[ 116.227751] R10: ffffba1600227bc0 R11: 0000000000000000 R12: 0000000000000246
[ 116.227754] R13: ffff97b90cecacf8 R14: ffff97b81185e8b0 R15: ffff97b801a9ac00
[ 116.227758] FS: 0000000000000000(0000) GS:ffff97b937800000(0000) knlGS:0000000000000000
[ 116.227762] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 116.227765] CR2: 00007f8154702000 CR3: 0000000114c28000 CR4: 00000000000006e0
[ 116.227769] Call Trace:
[ 116.227773] i915_gem_object_make_unshrinkable+0x75/0xd0 [i915]
[ 116.227881] __i915_gem_object_unset_pages+0x52/0x240 [i915]
[ 116.227989] __i915_gem_object_put_pages+0x44/0xa0 [i915]
[ 116.228096] __i915_gem_free_objects.constprop.0+0x13a/0x300 [i915]
[ 116.228202] process_one_work+0x2b0/0x5e0
[ 116.228211] worker_thread+0x55/0x3c0
[ 116.228215] ? process_one_work+0x5e0/0x5e0
[ 116.228219] kthread+0x13a/0x150
[ 116.228223] ? __kthread_bind_mask+0x60/0x60
[ 116.228228] ret_from_fork+0x1f/0x30
[ 116.228237] Modules linked in: rfcomm ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc cmac bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc coretemp kvm_intel iwl3945 iwlegacy iTCO_wdt intel_pmc_bxt snd_hda_codec_analog kvm iTCO_vendor_support snd_hda_codec_generic mac80211 irqbypass snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core i2c_i801 snd_hwdep i2c_smbus joydev snd_seq wmi_bmof cfg80211 snd_seq_device snd_pcm r592 memstick lpc_ich e1000e libarc4 thinkpad_acpi snd_timer platform_profile ledtrig_audio snd soundcore rfkill acpi_cpufreq binfmt_misc nfsd fuse auth_rpcgss nfs_acl lockd grace sunrpc nfs_ssc ip_tables i915 i2c_algo_bit drm_kms_helper sdhci_pci cqhci sdhci cec serio_raw mmc_core drm yenta_socket ata_generic pata_acpi video wmi
[ 116.228367] ---[ end trace 08a2ac33f3911336 ]---
[ 116.228370] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 116.228375] Code: c7 c7 60 b4 64 94 e8 87 fc fd ff 0f 0b 48 89 fe 48 c7 c7 f0 b4 64 94 e8 76 fc fd ff 0f 0b 48 c7 c7 a0 b5 64 94 e8 68 fc fd ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 60 b5 64 94 e8 54 fc fd ff 0f 0b
[ 116.228379] RSP: 0018:ffffba1600227d80 EFLAGS: 00010082
[ 116.228383] RAX: 0000000000000054 RBX: ffff97b81185e200 RCX: 0000000000000000
[ 116.228387] RDX: ffff97b9379e97a0 RSI: ffff97b9379daae0 RDI: ffff97b9379daae0
[ 116.228390] RBP: ffff97b90cec0000 R08: 0000000000000000 R09: ffffba1600227bc8
[ 116.228393] R10: ffffba1600227bc0 R11: 0000000000000000 R12: 0000000000000246
[ 116.228396] R13: ffff97b90cecacf8 R14: ffff97b81185e8b0 R15: ffff97b801a9ac00
[ 116.228399] FS: 0000000000000000(0000) GS:ffff97b937800000(0000) knlGS:0000000000000000
[ 116.228403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 116.228406] CR2: 00007f8154702000 CR3: 0000000114c28000 CR4: 00000000000006e0
[ 116.228410] note: kworker/u4:4[123] exited with preempt_count 1
[ 116.228414] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 116.228417] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 123, name: kworker/u4:4
[ 116.228421] INFO: lockdep is turned off.
[ 116.228423] irq event stamp: 53468
[ 116.228426] hardirqs last enabled at (53467): [<ffffffff93d64204>] _raw_spin_unlock_irq+0x24/0x40
[ 116.228433] hardirqs last disabled at (53468): [<ffffffff93d5dd4a>] __schedule+0x6fa/0xb40
[ 116.228438] softirqs last enabled at (53348): [<ffffffffc0c65c5c>] ieee80211_sta_rx_queued_mgmt+0x46c/0x790 [mac80211]
[ 116.228462] softirqs last disabled at (53346): [<ffffffffc0a6b8bc>] cfg80211_put_bss+0x2c/0xd0 [cfg80211]
[ 116.228462] CPU: 1 PID: 123 Comm: kworker/u4:4 Tainted: G D --------- --- 5.12.0-0.rc4.20210324git7acac4b3196c.176.fc35.x86_64 #1
[ 116.228462] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
[ 116.228462] Workqueue: i915 __i915_gem_free_work [i915]
[ 116.228462] Call Trace:
[ 116.228462] dump_stack+0x7f/0xa1
[ 116.228462] ___might_sleep.cold+0xb6/0xc6
[ 116.228462] exit_signals+0x1c/0x2d0
[ 116.228462] do_exit+0xbf/0xc20
[ 116.228462] ? kthread+0x13a/0x150
[ 116.228462] rewind_stack_do_exit+0x17/0x20
[ 116.228462] RIP: 0000:0x0
[ 116.228462] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 116.228462] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[ 116.228462] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 116.228462] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 116.228462] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 116.228462] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 116.228462] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
This one is from the Rawhide non-debug kernel:
[ 178.610195] list_del corruption. next->prev should be ffff9823ad542d20, but was ffff9823ad5433a0
[ 178.610260] ------------[ cut here ]------------
[ 178.610266] kernel BUG at lib/list_debug.c:54!
[ 178.610281] invalid opcode: 0000 [#1] SMP NOPTI
[ 178.610292] CPU: 0 PID: 1606 Comm: Xorg Tainted: G U --------- --- 5.12.0-0.rc4.175.fc35.x86_64 #1
[ 178.610304] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
[ 178.610311] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 178.610329] Code: c7 c7 d8 29 61 84 e8 f4 2e fe ff 0f 0b 48 89 fe 48 c7 c7 68 2a 61 84 e8 e3 2e fe ff 0f 0b 48 c7 c7 18 2b 61 84 e8 d5 2e fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 d8 2a 61 84 e8 c1 2e fe ff 0f 0b
[ 178.610339] RSP: 0018:ffffa4f800a3bd88 EFLAGS: 00010082
[ 178.610350] RAX: 0000000000000054 RBX: ffffa4f800a3be58 RCX: 0000000000000000
[ 178.610358] RDX: ffff9823b7826720 RSI: ffff9823b78185c0 RDI: ffff9823b78185c0
[ 178.610366] RBP: ffff9823ad542a40 R08: 0000000000000000 R09: ffffa4f800a3bbc0
[ 178.610373] R10: ffffa4f800a3bbb8 R11: ffff9823bbefcfe8 R12: ffff982389305b28
[ 178.610380] R13: 0000000000000000 R14: ffff9823ad542c30 R15: ffff982389305b20
[ 178.610389] FS: 00007f837ababa80(0000) GS:ffff9823b7800000(0000) knlGS:0000000000000000
[ 178.610398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 178.610406] CR2: 00007f836fac99f0 CR3: 00000001234ee000 CR4: 00000000000006f0
[ 178.610415] Call Trace:
[ 178.610424] i915_gem_madvise_ioctl+0x1fc/0x2b0 [i915]
[ 178.610744] ? i915_gem_pwrite_ioctl+0x480/0x480 [i915]
[ 178.610997] drm_ioctl_kernel+0x8e/0xe0 [drm]
[ 178.610997] drm_ioctl+0x21e/0x3b0 [drm]
[ 178.610997] ? i915_gem_pwrite_ioctl+0x480/0x480 [i915]
[ 178.610997] __x64_sys_ioctl+0x82/0xb0
[ 178.610997] do_syscall_64+0x33/0x40
[ 178.610997] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 178.610997] RIP: 0033:0x7f837b3f52ab
[ 178.610997] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 bb 0c 00 f7 d8 64 89 01 48
[ 178.610997] RSP: 002b:00007ffe12d62cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 178.610997] RAX: ffffffffffffffda RBX: 00007f8379dcd360 RCX: 00007f837b3f52ab
[ 178.610997] RDX: 00007ffe12d62d00 RSI: 00000000c00c6466 RDI: 000000000000000d
[ 178.610997] RBP: 000000000000000d R08: 000055dee119e9c0 R09: 00000000605c6199
[ 178.610997] R10: 00007ffe12d7d080 R11: 0000000000000246 R12: 000055dee1884ed0
[ 178.610997] R13: 00000000c00c6466 R14: 00007f8379dcd000 R15: 00007ffe12d62d00
[ 178.610997] Modules linked in: rfcomm ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc cmac bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc coretemp snd_hda_codec_analog snd_hda_codec_generic kvm_intel kvm iTCO_wdt intel_pmc_bxt iTCO_vendor_support iwl3945 iwlegacy snd_hda_intel irqbypass snd_intel_dspcfg snd_intel_sdw_acpi mac80211 snd_hda_codec snd_hda_core pcspkr snd_hwdep snd_seq snd_seq_device snd_pcm cfg80211 joydev i2c_i801 wmi_bmof i2c_smbus thinkpad_acpi e1000e r592 snd_timer lpc_ich memstick platform_profile ledtrig_audio libarc4 snd soundcore rfkill acpi_cpufreq binfmt_misc nfsd auth_rpcgss fuse nfs_acl lockd grace sunrpc nfs_ssc ip_tables i915 i2c_algo_bit drm_kms_helper cec drm sdhci_pci cqhci sdhci serio_raw mmc_core ata_generic wmi yenta_socket pata_acpi video
[ 178.610997] ---[ end trace 517f2705ec3a0889 ]---
[ 178.610997] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 178.610997] Code: c7 c7 d8 29 61 84 e8 f4 2e fe ff 0f 0b 48 89 fe 48 c7 c7 68 2a 61 84 e8 e3 2e fe ff 0f 0b 48 c7 c7 18 2b 61 84 e8 d5 2e fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 d8 2a 61 84 e8 c1 2e fe ff 0f 0b
[ 178.610997] RSP: 0018:ffffa4f800a3bd88 EFLAGS: 00010082
[ 178.610997] RAX: 0000000000000054 RBX: ffffa4f800a3be58 RCX: 0000000000000000
[ 178.610997] RDX: ffff9823b7826720 RSI: ffff9823b78185c0 RDI: ffff9823b78185c0
[ 178.610997] RBP: ffff9823ad542a40 R08: 0000000000000000 R09: ffffa4f800a3bbc0
[ 178.610997] R10: ffffa4f800a3bbb8 R11: ffff9823bbefcfe8 R12: ffff982389305b28
[ 178.610997] R13: 0000000000000000 R14: ffff9823ad542c30 R15: ffff982389305b20
[ 178.610997] FS: 00007f837ababa80(0000) GS:ffff9823b7800000(0000) knlGS:0000000000000000
[ 178.610997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 178.610997] CR2: 00007f836fac99f0 CR3: 00000001234ee000 CR4: 00000000000006f0
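For anyone reading the "list_del corruption. next->prev should be ..., but was ..." line in both traces: as far as I understand it, the check fires when the neighbours of the entry being deleted no longer point back at that entry. Below is a minimal userspace sketch of that kind of check. It is only an illustration under my own assumptions, not the code in lib/list_debug.c: the function names are hypothetical, the poison-pointer checks are left out, and it returns false where the kernel would BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for a circular doubly-linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry directly after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    assert(entry && head);
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical helper: true only if both neighbours point back at entry. */
static bool list_del_entry_valid(struct list_head *entry)
{
    if (entry->prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)entry->prev->next);
        return false;
    }
    if (entry->next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)entry->next->prev);
        return false;
    }
    return true;
}

/* Unlink entry only if the list around it is consistent. */
static bool list_del_checked(struct list_head *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    list_init(entry); /* leave the removed node self-linked */
    return true;
}
```

In this sketch, overwriting a neighbour's prev pointer behind the list's back (for example by adding or removing the same node twice without proper locking) is enough to make the second check fail, which matches the shape of the message in the traces. Whether that is what actually happens in i915 here I cannot say.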
Regards
Zdenek