dce110: KASAN global-out-of-bounds in read_indirect_azalia_reg in 5.11.0-rc5+
And another memory issue, this time seen with just KASAN:
[ 9.913806] ==================================================================
[ 9.913814] BUG: KASAN: global-out-of-bounds in read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[ 9.914178] Read of size 4 at addr ffffffffc18800c8 by task systemd-udevd/731
[ 9.914188] CPU: 5 PID: 731 Comm: systemd-udevd Not tainted 5.11.0-rc5Lyude-Test+ #69
[ 9.914194] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
[ 9.914200] Call Trace:
[ 9.914204] dump_stack+0x7d/0xa3
[ 9.914213] print_address_description.constprop.0+0x18/0x130
[ 9.914221] ? read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[ 9.914566] ? read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[ 9.914912] kasan_report.cold+0x7f/0x10e
[ 9.914919] ? read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[ 9.915262] read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[ 9.915605] dce_aud_endpoint_valid+0xf/0x20 [amdgpu]
[ 9.915947] resource_construct+0x305/0xc10 [amdgpu]
[ 9.916290] ? unpoison_range+0x3a/0x60
[ 9.916295] ? dc_destroy_resource_pool+0xe0/0xe0 [amdgpu]
[ 9.916636] ? unpoison_range+0x3a/0x60
[ 9.916641] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 9.916647] dce120_create_resource_pool+0x13ee/0x1b50 [amdgpu]
[ 9.916992] ? dce120_clock_source_create+0x110/0x110 [amdgpu]
[ 9.917333] dc_create_resource_pool+0x402/0x580 [amdgpu]
[ 9.917674] dc_create+0x636/0x1d80 [amdgpu]
[ 9.918020] ? init_object+0x4e/0x80
[ 9.918025] ? amdgpu_cgs_create_device+0x3e/0xd0 [amdgpu]
[ 9.918346] ? dc_create_state+0xa0/0xa0 [amdgpu]
[ 9.918687] ? ___slab_alloc+0x2bf/0x5b0
[ 9.918693] ? amdgpu_cgs_create_device+0x3e/0xd0 [amdgpu]
[ 9.919015] ? drm_print_bits+0x170/0x170 [drm]
[ 9.919056] ? unpoison_range+0x3a/0x60
[ 9.919061] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 9.919066] ? unpoison_range+0x3a/0x60
[ 9.919071] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 9.919076] amdgpu_dm_init.isra.0+0x473/0x640 [amdgpu]
[ 9.919418] ? amdgpu_device_rreg.part.0+0x81/0x290 [amdgpu]
[ 9.919697] ? dm_resume+0x1400/0x1400 [amdgpu]
[ 9.920037] ? smu9_wait_for_response+0x164/0x220 [amdgpu]
[ 9.920364] ? smu9_send_msg_to_smc_with_parameter+0x169/0x270 [amdgpu]
[ 9.920690] ? mutex_unlock+0x1d/0x40
[ 9.920697] ? smum_send_msg_to_smc_with_parameter+0x199/0x300 [amdgpu]
[ 9.921021] ? vega10_fan_ctrl_start_smc_fan_control+0x111/0x1d0 [amdgpu]
[ 9.921354] ? memcpy+0x39/0x60
[ 9.921359] ? psm_set_states+0x10e/0x190 [amdgpu]
[ 9.921721] dm_hw_init+0xe/0x20 [amdgpu]
[ 9.922059] amdgpu_device_init.cold+0x3404/0x48f5 [amdgpu]
[ 9.922401] ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
[ 9.922689] ? pci_bus_read_config_byte+0x140/0x140
[ 9.922696] ? do_pci_enable_device.part.0+0x1d3/0x230
[ 9.922701] ? pci_find_saved_ext_cap+0x120/0x120
[ 9.922706] ? skcipher_walk_done+0x259/0xd30
[ 9.922712] ? pci_enable_device_flags+0x28b/0x370
[ 9.922717] amdgpu_driver_load_kms+0x166/0x8c0 [amdgpu]
[ 9.922998] amdgpu_pci_probe+0x206/0x320 [amdgpu]
[ 9.923276] ? amdgpu_pmops_runtime_suspend+0x2c0/0x2c0 [amdgpu]
[ 9.923555] local_pci_probe+0xd8/0x170
[ 9.923561] pci_device_probe+0x32e/0x600
[ 9.923565] ? kernfs_create_link+0x160/0x220
[ 9.923571] ? pci_device_remove+0x1d0/0x1d0
[ 9.923577] really_probe+0x231/0xcf0
[ 9.923583] driver_probe_device+0x1fe/0x380
[ 9.923588] ? _raw_write_lock_irqsave+0xb0/0xb0
[ 9.923594] device_driver_attach+0x205/0x270
[ 9.923599] __driver_attach+0xf4/0x260
[ 9.923604] ? device_driver_attach+0x270/0x270
[ 9.923609] bus_for_each_dev+0x111/0x180
[ 9.923614] ? _raw_read_lock_irq+0x30/0x30
[ 9.923619] ? subsys_dev_iter_exit+0x10/0x10
[ 9.923624] ? klist_node_init+0x61/0x120
[ 9.923629] ? klist_add_tail+0x5c/0x160
[ 9.923634] bus_add_driver+0x34f/0x580
[ 9.923639] driver_register+0x1ee/0x380
[ 9.923644] ? 0xffffffffc1d45000
[ 9.923648] do_one_initcall+0x89/0x2a0
[ 9.923654] ? perf_trace_initcall_level+0x3b0/0x3b0
[ 9.923659] ? unpoison_range+0x3a/0x60
[ 9.923663] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 9.923669] ? unpoison_range+0x3a/0x60
[ 9.923673] ? unpoison_range+0x3a/0x60
[ 9.923678] do_init_module+0x1ce/0x7c0
[ 9.923685] load_module+0x8fa0/0x95d0
[ 9.923692] ? module_frob_arch_sections+0x20/0x20
[ 9.923697] ? ima_post_read_file+0x15e/0x190
[ 9.923703] ? ima_read_file+0x140/0x140
[ 9.923708] ? kernel_read_file_from_fd+0x4b/0x90
[ 9.923715] __do_sys_finit_module+0xff/0x180
[ 9.923719] ? __ia32_sys_init_module+0xa0/0xa0
[ 9.923725] ? syscall_trace_enter.constprop.0+0x142/0x1c0
[ 9.923732] do_syscall_64+0x33/0x40
[ 9.923738] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9.923744] RIP: 0033:0x7f42ca88f30d
[ 9.923750] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 7b 0c 00 f7 d8 64 89 01 48
[ 9.923758] RSP: 002b:00007ffca53bad08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 9.923766] RAX: ffffffffffffffda RBX: 000055ee6a184250 RCX: 00007f42ca88f30d
[ 9.923771] RDX: 0000000000000000 RSI: 000055ee6a17a9b0 RDI: 0000000000000019
[ 9.923776] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ee6a17aa70
[ 9.923780] R10: 0000000000000019 R11: 0000000000000246 R12: 000055ee6a17a9b0
[ 9.923785] R13: 000055ee69f58a00 R14: 0000000000000000 R15: 000055ee69f5c450
[ 9.923793] The buggy address belongs to the variable:
[ 9.923796] audio_regs+0x108/0xffffffffffe3e040 [amdgpu]
[ 9.924117] Memory state around the buggy address:
[ 9.924121] ffffffffc187ff80: 00 00 04 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
[ 9.924127] ffffffffc1880000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9.924132] >ffffffffc1880080: 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9
[ 9.924136] ^
[ 9.924140] ffffffffc1880100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9.924145] ffffffffc1880180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9
[ 9.924149] ==================================================================
[ 9.924154] Disabling lock debugging due to kernel taint
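A quick sanity check on the shadow dump above, in case it helps narrow things down. This is only a sketch: it assumes generic KASAN's usual granularity of 8 bytes of memory per shadow byte, and that f9 is the global redzone marker (which is what I'd expect for a global-out-of-bounds report). The variable names are my own; the addresses and shadow bytes are copied from the splat.

```python
# Assumptions (not taken from the report itself): generic KASAN maps
# 8 bytes of memory to one shadow byte, and 0xf9 marks a global redzone.
GRANULE = 8
GLOBAL_REDZONE = 0xF9

# First three rows of "Memory state around the buggy address".
dump_base = 0xFFFFFFFFC187FF80
shadow = bytes.fromhex(
    "00 00 04 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9"
)

bad_addr = 0xFFFFFFFFC18800C8   # "Read of size 4 at addr ..."
var_offset = 0x108              # from "audio_regs+0x108"

# Shadow byte that covers the faulting address.
bad_idx = (bad_addr - dump_base) // GRANULE
bad_shadow = shadow[bad_idx]

# Start of audio_regs, and how many bytes are marked accessible from there.
var_start = bad_addr - var_offset
idx = (var_start - dump_base) // GRANULE
accessible_bytes = 0
while shadow[idx] == 0x00:
    accessible_bytes += GRANULE
    idx += 1

print(f"shadow byte at faulting address: {bad_shadow:#04x}")
print(f"audio_regs starts at {var_start:#x}, "
      f"accessible bytes: {accessible_bytes:#x}")
```

If those assumptions hold, the faulting read lands on the first redzone byte immediately after audio_regs, and the accessible region works out to the same 0x108 bytes as the reported offset. That would be consistent with an index one past the last element, but I haven't confirmed it against the source.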
Note that I do have sound disabled in this kernel configuration (amdgpu.audio=0 is also set). I've already tried re-enabling it, and it didn't make much of a difference. That said, I didn't check whether audio was actually working after I re-enabled things, so I can't entirely rule out that it's related to this. I'm able to reproduce this on drm-tip during boot with my Vega 64:
21:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3)
This seems like it might actually be a regression, because I didn't run into this on older kernels, though I haven't had time to bisect it. Note, though, that on older kernels I -did- run into a different KASAN error coming from amdkfd; you can find the splat for it in #1471 (closed) (see the second backtrace in the issue description).
Dmesg attached: dmesg.log