[regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
On the Radeon 7900XTX, the issue "slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670" appeared between commits 3a8a670eeeaa and e55e5df193d2.
Graphics cards with the 6800M and 6900XT chips are unaffected.
Backtrace:
[ 12.562762] ==================================================================
[ 12.562775] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
[ 12.563173] Read of size 4 at addr ffff8881347a8dc8 by task (udev-worker)/660
[ 12.563183] CPU: 0 PID: 660 Comm: (udev-worker) Tainted: G W L ------- --- 6.5.0-0.rc0.20230630gite55e5df193d2.5.fc39.x86_64+debug #1
[ 12.563192] Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.30 05/24/2023
[ 12.563199] Call Trace:
[ 12.563203] <TASK>
[ 12.563206] dump_stack_lvl+0x76/0xd0
[ 12.563213] print_report+0xcf/0x670
[ 12.563220] ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
[ 12.563433] kasan_report+0xa6/0xe0
[ 12.563436] ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
[ 12.563637] amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
[ 12.563835] ? __pfx_amdgpu_vm_pt_create+0x10/0x10 [amdgpu]
[ 12.564030] ? __module_address+0x95/0x240
[ 12.564035] ? lockdep_init_map_type+0x1a5/0x840
[ 12.564040] ? __raw_spin_lock_init+0x3f/0x110
[ 12.564044] amdgpu_vm_init+0x749/0x10c0 [amdgpu]
[ 12.564240] ? __pfx_amdgpu_vm_init+0x10/0x10 [amdgpu]
[ 12.564441] amdgpu_mes_self_test+0x16e/0x9e0 [amdgpu]
[ 12.564661] ? lock_acquire+0x1a6/0x4f0
[ 12.564664] ? __pfx_amdgpu_mes_self_test+0x10/0x10 [amdgpu]
[ 12.564871] ? local_clock_noinstr+0xd/0xc0
[ 12.564876] ? find_held_lock+0x34/0x120
[ 12.564882] ? _raw_spin_unlock_irqrestore+0x4f/0x80
[ 12.564886] ? amdgpu_irq_update+0x1b2/0x2c0 [amdgpu]
[ 12.565094] mes_v11_0_late_init+0xb8/0xe0 [amdgpu]
[ 12.565304] amdgpu_device_ip_late_init+0x100/0x7b0 [amdgpu]
[ 12.565509] amdgpu_device_init+0x7569/0x8660 [amdgpu]
[ 12.565721] ? __pfx_amdgpu_device_init+0x10/0x10 [amdgpu]
[ 12.565920] ? __pfx_pci_bus_read_config_word+0x10/0x10
[ 12.565925] ? do_pci_enable_device+0x22d/0x2a0
[ 12.565928] ? pci_wait_for_pending+0xa1/0x110
[ 12.565933] amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
[ 12.566131] amdgpu_pci_probe+0x287/0x9e0 [amdgpu]
[ 12.566337] ? __pfx_amdgpu_pci_probe+0x10/0x10 [amdgpu]
[ 12.566536] local_pci_probe+0xda/0x190
[ 12.566540] pci_device_probe+0x23a/0x770
[ 12.566544] ? kernfs_add_one+0x326/0x490
[ 12.566548] ? kernfs_get.part.0+0x4c/0x70
[ 12.566552] ? __pfx_pci_device_probe+0x10/0x10
[ 12.566555] ? kernfs_create_link+0x16b/0x230
[ 12.566559] ? kernfs_put+0x1c/0x40
[ 12.566562] ? sysfs_do_create_link_sd+0x8e/0x100
[ 12.566566] really_probe+0x3df/0xb80
[ 12.566570] __driver_probe_device+0x18c/0x450
[ 12.566573] driver_probe_device+0x4a/0x120
[ 12.566576] __driver_attach+0x1e5/0x4a0
[ 12.566579] ? __pfx___driver_attach+0x10/0x10
[ 12.566582] bus_for_each_dev+0x106/0x190
[ 12.566586] ? __pfx_bus_for_each_dev+0x10/0x10
[ 12.566591] bus_add_driver+0x2a1/0x570
[ 12.566594] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 12.566794] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 12.566993] driver_register+0x134/0x460
[ 12.566996] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 12.567193] do_one_initcall+0xd2/0x430
[ 12.567197] ? __pfx_do_one_initcall+0x10/0x10
[ 12.567202] ? kasan_unpoison+0x44/0x70
[ 12.567206] do_init_module+0x238/0x770
[ 12.567210] load_module+0x5581/0x6f10
[ 12.567216] ? __pfx_load_module+0x10/0x10
[ 12.567220] ? find_held_lock+0x34/0x120
[ 12.567223] ? local_clock_noinstr+0xd/0xc0
[ 12.567227] ? __pfx___might_resched+0x10/0x10
[ 12.567232] ? __do_sys_init_module+0x1f2/0x220
[ 12.567235] __do_sys_init_module+0x1f2/0x220
[ 12.567238] ? __pfx___do_sys_init_module+0x10/0x10
[ 12.567243] do_syscall_64+0x5d/0x90
[ 12.567247] ? asm_exc_page_fault+0x26/0x30
[ 12.567251] ? lockdep_hardirqs_on+0x81/0x110
[ 12.567255] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 12.567258] RIP: 0033:0x7fdb4e92b5de
[ 12.567267] Code: 48 8b 0d 55 08 12 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 22 08 12 00 f7 d8 64 89 01 48
[ 12.567274] RSP: 002b:00007ffe9ef35008 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 12.567279] RAX: ffffffffffffffda RBX: 000055d8c8acb440 RCX: 00007fdb4e92b5de
[ 12.567282] RDX: 000055d8c8af3840 RSI: 0000000003c829ee RDI: 00007fdb46c16010
[ 12.567285] RBP: 00007ffe9ef350c0 R08: 000055d8c8ad5bd0 R09: ffffffdcab967160
[ 12.567289] R10: 000055dd95219e95 R11: 0000000000000246 R12: 000055d8c8af3840
[ 12.567292] R13: 0000000000020000 R14: 000055d8c8af0d30 R15: 000055d8c8af2740
[ 12.567297] </TASK>
[ 12.567300] Allocated by task 660:
[ 12.567302] kasan_save_stack+0x33/0x60
[ 12.567306] kasan_set_track+0x25/0x30
[ 12.567309] __kasan_kmalloc+0x8f/0xa0
[ 12.567312] amdgpu_mes_self_test+0x157/0x9e0 [amdgpu]
[ 12.567529] mes_v11_0_late_init+0xb8/0xe0 [amdgpu]
[ 12.567738] amdgpu_device_ip_late_init+0x100/0x7b0 [amdgpu]
[ 12.567942] amdgpu_device_init+0x7569/0x8660 [amdgpu]
[ 12.568142] amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
[ 12.568343] amdgpu_pci_probe+0x287/0x9e0 [amdgpu]
[ 12.568543] local_pci_probe+0xda/0x190
[ 12.568546] pci_device_probe+0x23a/0x770
[ 12.568550] really_probe+0x3df/0xb80
[ 12.568552] __driver_probe_device+0x18c/0x450
[ 12.568555] driver_probe_device+0x4a/0x120
[ 12.568557] __driver_attach+0x1e5/0x4a0
[ 12.568560] bus_for_each_dev+0x106/0x190
[ 12.568563] bus_add_driver+0x2a1/0x570
[ 12.568566] driver_register+0x134/0x460
[ 12.568569] do_one_initcall+0xd2/0x430
[ 12.568572] do_init_module+0x238/0x770
[ 12.568574] load_module+0x5581/0x6f10
[ 12.568577] __do_sys_init_module+0x1f2/0x220
[ 12.568580] do_syscall_64+0x5d/0x90
[ 12.568582] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 12.568587] The buggy address belongs to the object at ffff8881347a8000
which belongs to the cache kmalloc-4k of size 4096
[ 12.568593] The buggy address is located 608 bytes to the right of
allocated 2920-byte region [ffff8881347a8000, ffff8881347a8b68)
[ 12.568600] The buggy address belongs to the physical page:
[ 12.568602] page:000000001bdef670 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1347a8
[ 12.568607] head:000000001bdef670 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 12.568611] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 12.568616] page_type: 0xffffffff()
[ 12.568619] raw: 0017ffffc0010200 ffff88810004d040 dead000000000122 0000000000000000
[ 12.568622] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
[ 12.568626] page dumped because: kasan: bad access detected
[ 12.568630] Memory state around the buggy address:
[ 12.568632] ffff8881347a8c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.568635] ffff8881347a8d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.568639] >ffff8881347a8d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.568642] ^
[ 12.568644] ffff8881347a8e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.568648] ffff8881347a8e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.568651] ==================================================================
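To double-check the numbers KASAN prints, here is a small sketch that recomputes them from the addresses in the report. All values are copied from the log above; the variable names are my own and not taken from the kernel source.

```python
# Values copied from the KASAN report above.
access_addr = 0xffff8881347a8dc8   # "Read of size 4 at addr ..."
region_start = 0xffff8881347a8000  # start of the allocated object
region_end = 0xffff8881347a8b68    # exclusive end of the allocated region
slab_object_size = 4096            # object size of the kmalloc-4k cache

allocated_size = region_end - region_start     # size of the kmalloc'ed region
offset_past_end = access_addr - region_end     # distance past the allocation
offset_in_object = access_addr - region_start  # position inside the slab object

print(f"allocated size: {allocated_size} bytes")
print(f"read starts {offset_past_end} bytes past the end of the allocation")
print(f"offset inside the slab object: {offset_in_object} of {slab_object_size}")
```

The results match the report (2920-byte region, 608 bytes to the right). The read lands at offset 3528, so it is still inside the same 4096-byte kmalloc-4k object but well past the 2920 bytes that amdgpu_mes_self_test actually allocated.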
I spent 6 days bisecting this issue.
Unfortunately, the result is not satisfactory: on most commits the video card did not switch to graphics mode, and instead of "slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670" I got the error "KASAN: null-ptr-deref in range [0x00000000000003f0-0x00000000000003f7]". Because of this, all of these commits were marked as "skip".
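For anyone who wants to re-check the attached logs, the rule I applied by hand at each step can be sketched roughly as follows. This is only an illustration: the function name and the sample strings are hypothetical, and the return values follow the exit-code convention of `git bisect run` (0 = good, 125 = skip, any other code from 1 to 127 = bad). Treating a log with neither message as "good" is an assumption of the sketch.

```python
# Markers taken from the two failure modes described in this report.
BAD_MARKER = "slab-out-of-bounds in amdgpu_vm_pt_create"
SKIP_MARKER = "KASAN: null-ptr-deref"

def classify(dmesg_text: str) -> int:
    """Map the text of one saved dmesg log to a `git bisect run` style code."""
    if BAD_MARKER in dmesg_text:
        return 1      # the regression is present: bad
    if SKIP_MARKER in dmesg_text:
        return 125    # card never reached graphics mode: cannot test, skip
    return 0          # neither message seen: assumed good

# Hypothetical one-line samples standing in for full dmesg files.
samples = {
    "bad": "BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu]",
    "skip": "KASAN: null-ptr-deref in range [0x00000000000003f0-0x00000000000003f7]",
    "good": "amdgpu: initialized without KASAN reports",
}
for name, text in samples.items():
    print(name, classify(text))
```

Each step needs a reboot into the freshly built kernel, so this only classifies logs that were already saved; it does not automate the bisect itself.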
The bisect results can be found in the attached file "bisect-log-slab-out-of-bounds-in-amdgpu_vm_pt_create.txt"; the kernel logs for each bisect step are packed in the archive "dmesg-slab-out-of-bounds-in-amdgpu_vm_pt_create.zip".
bisect-log-slab-out-of-bounds-in-amdgpu_vm_pt_create.txt
dmesg-slab-out-of-bounds-in-amdgpu_vm_pt_create.zip
dmesg-slab-out-of-bounds-in-amdgpu_vm_pt_create.txt
How else can I help here?