dce110: UBSAN warnings in 5.11.0-rc5+ when starting system
Another UBSAN warning I've been seeing with amdgpu on drm-tip, this time about a shift exponent being too large:
[ 10.086793] ================================================================================
[ 10.086800] UBSAN: shift-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:1140:32
[ 10.086808] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[ 10.086815] CPU: 6 PID: 731 Comm: systemd-udevd Tainted: G B 5.11.0-rc5Lyude-Test+ #69
[ 10.086822] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
[ 10.086828] Call Trace:
[ 10.086832] dump_stack+0x7d/0xa3
[ 10.086841] ubsan_epilogue+0x5/0x40
[ 10.086847] __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe9
[ 10.086855] initialize_cpsch.cold+0x54/0x69 [amdgpu]
[ 10.087209] device_queue_manager_init+0x6d4/0xd50 [amdgpu]
[ 10.087533] kgd2kfd_device_init.cold+0x64c/0x1051 [amdgpu]
[ 10.087882] amdgpu_amdkfd_device_init+0x3ff/0x540 [amdgpu]
[ 10.088198] ? amdgpu_amdkfd_device_probe+0x170/0x170 [amdgpu]
[ 10.088530] amdgpu_device_init.cold+0x2ea3/0x48f5 [amdgpu]
[ 10.088928] ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
[ 10.089201] ? pci_bus_read_config_byte+0x140/0x140
[ 10.089208] ? do_pci_enable_device.part.0+0x1d3/0x230
[ 10.089213] ? pci_find_saved_ext_cap+0x120/0x120
[ 10.089218] ? skcipher_walk_done+0x259/0xd30
[ 10.089228] ? pci_enable_device_flags+0x28b/0x370
[ 10.089233] amdgpu_driver_load_kms+0x166/0x8c0 [amdgpu]
[ 10.089514] amdgpu_pci_probe+0x206/0x320 [amdgpu]
[ 10.089808] ? amdgpu_pmops_runtime_suspend+0x2c0/0x2c0 [amdgpu]
[ 10.090090] local_pci_probe+0xd8/0x170
[ 10.090096] pci_device_probe+0x32e/0x600
[ 10.090101] ? kernfs_create_link+0x160/0x220
[ 10.090107] ? pci_device_remove+0x1d0/0x1d0
[ 10.090112] really_probe+0x231/0xcf0
[ 10.090119] driver_probe_device+0x1fe/0x380
[ 10.090124] ? _raw_write_lock_irqsave+0xb0/0xb0
[ 10.090131] device_driver_attach+0x205/0x270
[ 10.090139] __driver_attach+0xf4/0x260
[ 10.090144] ? device_driver_attach+0x270/0x270
[ 10.090149] bus_for_each_dev+0x111/0x180
[ 10.090154] ? _raw_read_lock_irq+0x30/0x30
[ 10.090159] ? subsys_dev_iter_exit+0x10/0x10
[ 10.090164] ? klist_node_init+0x61/0x120
[ 10.090170] ? klist_add_tail+0x5c/0x160
[ 10.090174] bus_add_driver+0x34f/0x580
[ 10.090179] driver_register+0x1ee/0x380
[ 10.090185] ? 0xffffffffc1d45000
[ 10.090189] do_one_initcall+0x89/0x2a0
[ 10.090195] ? perf_trace_initcall_level+0x3b0/0x3b0
[ 10.090200] ? unpoison_range+0x3a/0x60
[ 10.090205] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 10.090212] ? unpoison_range+0x3a/0x60
[ 10.090216] ? unpoison_range+0x3a/0x60
[ 10.090220] do_init_module+0x1ce/0x7c0
[ 10.090226] load_module+0x8fa0/0x95d0
[ 10.090234] ? module_frob_arch_sections+0x20/0x20
[ 10.090239] ? ima_post_read_file+0x15e/0x190
[ 10.090245] ? ima_read_file+0x140/0x140
[ 10.090250] ? kernel_read_file_from_fd+0x4b/0x90
[ 10.090256] __do_sys_finit_module+0xff/0x180
[ 10.090261] ? __ia32_sys_init_module+0xa0/0xa0
[ 10.090267] ? syscall_trace_enter.constprop.0+0x142/0x1c0
[ 10.090274] do_syscall_64+0x33/0x40
[ 10.090280] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 10.090286] RIP: 0033:0x7f42ca88f30d
[ 10.090291] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 7b 0c 00 f7 d8 64 89 01 48
[ 10.090299] RSP: 002b:00007ffca53bad08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 10.090307] RAX: ffffffffffffffda RBX: 000055ee6a184250 RCX: 00007f42ca88f30d
[ 10.090312] RDX: 0000000000000000 RSI: 000055ee6a17a9b0 RDI: 0000000000000019
[ 10.090316] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ee6a17aa70
[ 10.090321] R10: 0000000000000019 R11: 0000000000000246 R12: 000055ee6a17a9b0
[ 10.090325] R13: 000055ee69f58a00 R14: 0000000000000000 R15: 000055ee69f5c450
[ 10.090356] ================================================================================
I've been able to reproduce this just by booting a machine with both KASAN and UBSAN enabled, with KASAN using inline instrumentation plus CONFIG_KASAN_VMALLOC. This was seen on my Vega 64:
21:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3)
There isn't much else to say about this, other than that it doesn't seem to be a recent regression (I can see this going all the way back to 5.6). Full dmesg attached: dmesg.log
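
For context on the message itself: in C, shifting a 64-bit value by 64 or more is undefined behaviour, which is what UBSAN flags here. One common way to end up with an exponent of exactly 64 is building a bitmap with something like ~0ULL >> (64 - n) when n happens to be 0. I have not confirmed that this is the expression at kfd_device_queue_manager.c:1140, so treat the following as a minimal sketch of that pattern under that assumption. low_bits_mask() is a hypothetical helper used for illustration, not an existing kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): return a 64-bit mask with the
 * low n bits set. The naive form, ~0ULL >> (64 - n), shifts by 64 when
 * n == 0, which is undefined behaviour for a 64-bit type.
 */
static uint64_t low_bits_mask(unsigned int n)
{
	if (n == 0)
		return 0;		/* avoid the shift by 64 */
	if (n >= 64)
		return ~0ULL;		/* all bits set, shift by 0 not needed */
	return ~0ULL >> (64 - n);	/* shift count is now in 1..63 */
}
```

With a guard like this, a count of 0 yields an empty bitmap instead of relying on whatever the hardware does with an out-of-range shift count.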