Kernel panic on 6.1.84 (regression)
On this hardware (https://linux-hardware.org/?probe=9c92ac1222), kernel 6.1.83 works OK, but kernel 6.1.84 hits an amdgpu-related panic and shows only a black screen.
Panic (from https://linux-hardware.org/?probe=9c92ac1222&log=dmesg.1):
[ 2.712734] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 2.712799] amdgpu: sdma_bitmap: 3
[ 2.728485] BUG: unable to handle page fault for address: fffffb12fee00000
[ 2.728500] #PF: supervisor write access in kernel mode
[ 2.728510] #PF: error_code(0x0002) - not-present page
[ 2.728519] PGD 0 P4D 0
[ 2.728530] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 2.728559] Hardware name: LENOVO 21D1/LNVNB161216, BIOS J6CN48WW 12/28/2023
[ 2.728570] RIP: 0010:vmemmap_populate+0x243/0x332
[ 2.728588] Code: 48 09 d8 a9 ff ff 1f 00 0f 84 a8 00 00 00 4c 89 e0 48 25 00 00 e0 ff 48 89 45 b8 e8 1b ed ff ff b9 14 00 00 00 4c 89 e7 31 c0 <f3> ab 4d 85 f6 74 0b 48 8b 7d b8 b0 fd 4c 89 f1 f3 aa 4d 85 ff 74
[ 2.728609] RSP: 0018:ffffb7018303f6b0 EFLAGS: 00010246
[ 2.728621] RAX: 0000000000000000 RBX: fffffb12fee80000 RCX: 0000000000000014
[ 2.728631] RDX: 0000000000000000 RSI: 80000001242001e3 RDI: fffffb12fee00000
[ 2.728641] RBP: ffffb7018303f700 R08: 0000000000000001 R09: fffff9d305b18008
[ 2.728651] R10: ffff8de6e4400000 R11: 0000000000000000 R12: fffffb12fee00000
[ 2.728661] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000080000
[ 2.728672] FS: 00007ff2fa72afc0(0000) GS:ffff8dedfde40000(0000) knlGS:0000000000000000
[ 2.728684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.728693] CR2: fffffb12fee00000 CR3: 0000000107430000 CR4: 0000000000750ee0
[ 2.728704] PKRU: 55555554
[ 2.728711] Call Trace:
[ 2.728718] <TASK>
[ 2.728727] ? __die_body.cold+0x1a/0x1f
[ 2.728744] ? __die+0x2b/0x37
[ 2.728754] ? page_fault_oops+0xaf/0x280
[ 2.728771] ? search_bpf_extables+0x63/0x90
[ 2.728786] ? vmemmap_populate+0x243/0x332
[ 2.728796] ? srso_alias_return_thunk+0x5/0x7f
[ 2.728812] ? search_exception_tables+0x61/0x70
[ 2.728827] ? srso_alias_return_thunk+0x5/0x7f
[ 2.728842] ? kernelmode_fixup_or_oops+0xa2/0x120
[ 2.728857] ? __bad_area_nosemaphore+0x16e/0x1b0
[ 2.728873] ? bad_area_nosemaphore+0x16/0x20
[ 2.728886] ? do_kern_addr_fault+0x7b/0x90
[ 2.728898] ? exc_page_fault+0xe2/0x180
[ 2.728913] ? asm_exc_page_fault+0x27/0x30
[ 2.728933] ? vmemmap_populate+0x243/0x332
[ 2.728949] __populate_section_memmap+0x36/0x51
[ 2.728962] sparse_add_section+0x140/0x1ff
[ 2.728978] __add_pages+0xb6/0x140
[ 2.728990] add_pages+0x17/0x70
[ 2.729002] memremap_pages+0x44c/0x6b0
[ 2.729023] devm_memremap_pages+0x23/0x70
[ 2.729040] svm_migrate_init+0x113/0x1d0 [amdgpu]
kgd2kfd_device_init.cold+0x333/0x49e [amdgpu]
amdgpu_amdkfd_device_init+0x135/0x1d0 [amdgpu]
amdgpu_device_init.cold+0x1571/0x1b80 [amdgpu]
? srso_alias_return_thunk+0x5/0x7f
[ 2.730306] ? srso_alias_return_thunk+0x5/0x7f
[ 2.730309] ? pci_read_config_word+0x27/0x40
[ 2.730316] ? srso_alias_return_thunk+0x5/0x7f
[ 2.730320] amdgpu_driver_load_kms+0x1a/0x110 [amdgpu]
amdgpu_pci_probe+0x152/0x3a0 [amdgpu]
? srso_alias_return_thunk+0x5/0x7f
[ 2.730477] local_pci_probe+0x4b/0x90
[ 2.730482] ? pci_match_device+0xe2/0x140
[ 2.730485] pci_device_probe+0xc8/0x250
[ 2.730489] really_probe+0xed/0x3a0
[ 2.730494] ? pm_runtime_barrier+0x55/0x90
[ 2.730499] __driver_probe_device+0x7e/0x140
[ 2.730502] driver_probe_device+0x23/0xa0
[ 2.730506] __driver_attach+0xe4/0x1e0
[ 2.730509] ? __device_attach_driver+0x110/0x110
[ 2.730512] bus_for_each_dev+0x7f/0xd0
[ 2.730516] driver_attach+0x1e/0x30
[ 2.730519] bus_add_driver+0x1b6/0x210
[ 2.730522] ? srso_alias_return_thunk+0x5/0x7f
[ 2.730526] driver_register+0x95/0x100
[ 2.730529] __pci_register_driver+0x68/0x70
[ 2.730533] amdgpu_init+0x6e/0x1000 [amdgpu]
? 0xffffffffc1153000
[ 2.730614] do_one_initcall+0x49/0x210
[ 2.730620] ? srso_alias_return_thunk+0x5/0x7f
[ 2.730623] ? kmalloc_trace+0x2a/0xa0
[ 2.730628] do_init_module+0x52/0x1f0
[ 2.730634] load_module+0x1d98/0x1fa0
[ 2.730638] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 2.730644] __do_sys_init_module+0x1a3/0x1e0
[ 2.730647] ? srso_alias_return_thunk+0x5/0x7f
[ 2.730651] ? __do_sys_init_module+0x1a3/0x1e0
[ 2.730657] __x64_sys_init_module+0x1a/0x20
[ 2.730660] x64_sys_call+0x89/0x1fd0
[ 2.730663] do_syscall_64+0x35/0x80
[ 2.730667] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 2.730670] RIP: 0033:0x7ff2fb16a3ea
[ 2.730673] Code: 48 8b 0d 89 1a 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 1a 0d 00 f7 d8 64 89 01 48
[ 2.730679] RSP: 002b:00007ffd1e363ac8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 2.730683] RAX: ffffffffffffffda RBX: 0000560ea1a22840 RCX: 00007ff2fb16a3ea
[ 2.730686] RDX: 00007ff2fb2c7a9d RSI: 0000000000d71edc RDI: 00007ff2f82a1010
[ 2.730689] RBP: 00007ff2f82a1010 R08: 00007ff2f9686000 R09: 0000000000000000
[ 2.730692] R10: 00007ff2fb23ca60 R11: 0000000000000246 R12: 00007ff2fb2c7a9d
[ 2.730695] R13: 0000560ea1a22840 R14: 0000560ea1a229c0 R15: 0000560ea1a22840
[ 2.730700] </TASK>
[ 2.730702] Modules linked in: usbhid amdgpu(+) iommu_v2 gpu_sched drm_buddy i2c_algo_bit drm_display_helper cec rc_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul crc32_pclmul hid_multitouch sdhci_pci polyval_clmulni drm_ttm_helper nvme_tcp polyval_generic hid_generic ttm video cqhci ucsi_acpi nvme_fabrics ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme aesni_intel crypto_simd xhci_pci serio_raw typec_ucsi thunderbolt sp5100_tco ccp nvme_core cryptd drm xhci_pci_renesas r8168(O) sdhci typec nvme_common wmi i2c_hid_acpi i2c_hid sunrpc dm_mirror dm_region_hash dm_log ip6_tables ip_tables x_tables autofs4
[ 2.730761] CR2: fffffb12fee00000
[ 2.730764] ---[ end trace 0000000000000000 ]---
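For anyone reading the trace above, the `#PF: error_code(0x0002)` value can be decoded by hand from the architectural x86 page-fault error-code bits. The snippet below is only an illustrative sketch: the function and constant names are made up for this example and are not kernel code.

```python
# Architectural x86 page-fault error-code bits (names here are illustrative).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor (kernel) mode, 1: user mode
PF_RSVD = 1 << 3   # 1: reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # 1: fault caused by an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Return a human-readable breakdown of a page-fault error code."""
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit_set": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    # Value taken from the oops in this report.
    print(decode_pf_error_code(0x0002))
```

For 0x0002 this gives a supervisor-mode write to a not-present page, which matches the two `#PF:` lines in the log.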
Kernel 6.6.27, which also has a backport of the suspicious commit https://github.com/torvalds/linux/commit/6c6064cbe58b43533e3451ad6a8ba9736c109ac3 (it appeared in 6.1.84), probably works OK, but I will confirm.
Another machine affected by the panic: https://linux-hardware.org/?probe=29f83993ee&log=dmesg.1