UBSAN warning with dce110 when cursor changes state
When running a kernel with both KASAN and UBSAN enabled (you need both, otherwise this bug doesn't seem to come up consistently), the kernel will hit a UBSAN _Bool validity check:
[ 279.060335] ================================================================================
[ 279.060352] UBSAN: invalid-load in drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_hw_sequencer.c:2803:13
[ 279.060364] load of value 242 is not a valid value for type '_Bool'
[ 279.060373] CPU: 3 PID: 1517 Comm: gnome-shell Kdump: loaded Tainted: G B 5.11.0-rc5Lyude-Test+ #69
[ 279.060384] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
[ 279.060391] Call Trace:
[ 279.060398] dump_stack+0x7d/0xa3
[ 279.060413] ubsan_epilogue+0x5/0x40
[ 279.060424] __ubsan_handle_load_invalid_value.cold+0x43/0x48
[ 279.060436] ? dce_pipe_control_lock+0x885/0x1130 [amdgpu]
[ 279.061152] dce110_set_cursor_position.cold+0x13/0x32 [amdgpu]
[ 279.061876] ? dce110_update_pending_status+0x660/0x660 [amdgpu]
[ 279.062573] ? stack_depot_save+0x207/0x410
[ 279.062585] ? dce_enable_fe_clock+0x250/0x250 [amdgpu]
[ 279.063296] ? drm_mode_cursor_universal+0x3e1/0xb10 [drm]
[ 279.063389] ? drm_mode_cursor_common+0x249/0x900 [drm]
[ 279.063479] ? drm_mode_cursor_ioctl+0x84/0xa0 [drm]
[ 279.063570] dc_stream_set_cursor_position+0x2ce/0x650 [amdgpu]
[ 279.064263] handle_cursor_update+0x76d/0xb60 [amdgpu]
[ 279.064964] ? dm_hw_fini+0x30/0x30 [amdgpu]
[ 279.065663] ? ___slab_alloc+0x2bf/0x5b0
[ 279.065674] ? amdgpu_dm_atomic_commit_tail+0x27e6/0x9d40 [amdgpu]
[ 279.066375] ? unpoison_range+0x3a/0x60
[ 279.066385] amdgpu_dm_commit_cursors+0x160/0x260 [amdgpu]
[ 279.067085] amdgpu_dm_atomic_commit_tail+0x4ef6/0x9d40 [amdgpu]
[ 279.067788] ? __mod_timer+0x5e1/0xb00
[ 279.067802] ? trace_event_raw_event_amdgpu_dm_plane_state_template+0xd10/0xd10 [amdgpu]
[ 279.068503] ? arch_stack_walk+0x4e/0xb0
[ 279.068513] ? deref_stack_reg+0xe6/0x160
[ 279.068524] ? amdgpu_drm_ioctl+0xcd/0x1b0 [amdgpu]
[ 279.069068] ? orc_find.part.0+0x1c0/0x320
[ 279.069078] ? arch_stack_walk+0x4e/0xb0
[ 279.069086] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 279.069099] ? is_bpf_text_address+0x13/0x20
[ 279.069108] ? kernel_text_address.part.0+0xaf/0xc0
[ 279.069118] ? __kernel_text_address+0x56/0xa0
[ 279.069129] ? _raw_spin_lock_irqsave+0x70/0xb0
[ 279.069138] ? _raw_write_lock_irqsave+0xb0/0xb0
[ 279.069148] ? stack_trace_save+0x81/0xa0
[ 279.069157] ? stack_depot_save+0x207/0x410
[ 279.069166] ? kasan_save_stack+0x32/0x40
[ 279.069175] ? kasan_save_stack+0x1b/0x40
[ 279.069183] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 279.069193] ? drm_atomic_helper_setup_commit+0x49b/0x1460 [drm_kms_helper]
[ 279.069254] ? drm_atomic_helper_commit+0x6b/0x270 [drm_kms_helper]
[ 279.069312] ? drm_atomic_helper_disable_plane+0x11d/0x220 [drm_kms_helper]
[ 279.069370] ? drm_mode_cursor_universal+0x3e1/0xb10 [drm]
[ 279.069461] ? drm_mode_cursor_common+0x249/0x900 [drm]
[ 279.069551] ? drm_mode_cursor_ioctl+0x84/0xa0 [drm]
[ 279.069641] ? drm_ioctl_kernel+0x1a3/0x230 [drm]
[ 279.069723] ? drm_ioctl+0x444/0x920 [drm]
[ 279.069805] ? amdgpu_drm_ioctl+0xce/0x1b0 [amdgpu]
[ 279.070348] ? __x64_sys_ioctl+0x127/0x190
[ 279.070358] ? do_syscall_64+0x33/0x40
[ 279.070368] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 279.070379] ? deactivate_slab+0x21c/0x540
[ 279.070388] ? drm_print_bits+0x170/0x170 [drm]
[ 279.070479] ? _raw_spin_lock_irq+0x6b/0xb0
[ 279.070488] ? _raw_spin_unlock_irqrestore+0x50/0x50
[ 279.070499] ? _cond_resched+0x17/0x70
[ 279.070507] ? wait_for_completion_timeout+0x75/0x250
[ 279.070517] ? _raw_spin_unlock_irqrestore+0x50/0x50
[ 279.070526] ? wait_for_completion_io+0x230/0x230
[ 279.070536] ? _cond_resched+0x17/0x70
[ 279.070544] ? wait_for_completion_interruptible+0x71/0x3a0
[ 279.070554] ? drm_crtc_commit_wait+0x2e/0x60 [drm]
[ 279.070646] ? drm_atomic_helper_wait_for_dependencies+0x428/0x660 [drm_kms_helper]
[ 279.070707] commit_tail+0x221/0x4a0 [drm_kms_helper]
[ 279.070766] drm_atomic_helper_commit+0x1f9/0x270 [drm_kms_helper]
[ 279.070824] drm_atomic_helper_disable_plane+0x11d/0x220 [drm_kms_helper]
[ 279.070883] drm_mode_cursor_universal+0x3e1/0xb10 [drm]
[ 279.070975] ? __setplane_atomic+0x500/0x500 [drm]
[ 279.071065] ? __mutex_lock_slowpath+0x10/0x10
[ 279.071076] ? drm_modeset_lock+0x144/0x2d0 [drm]
[ 279.071169] drm_mode_cursor_common+0x249/0x900 [drm]
[ 279.071259] ? ____sys_recvmsg+0x1b3/0x5a0
[ 279.071271] ? drm_mode_cursor_universal+0xb10/0xb10 [drm]
[ 279.071362] drm_mode_cursor_ioctl+0x84/0xa0 [drm]
[ 279.071452] ? drm_mode_setplane+0xab0/0xab0 [drm]
[ 279.071541] ? task_tick_fair+0x110/0xd50
[ 279.071551] ? drm_is_current_master+0x73/0x120 [drm]
[ 279.071630] ? drm_ioctl_permit+0x14c/0x180 [drm]
[ 279.071712] ? drm_mode_setplane+0xab0/0xab0 [drm]
[ 279.071802] drm_ioctl_kernel+0x1a3/0x230 [drm]
[ 279.071885] ? drm_setversion+0x7f0/0x7f0 [drm]
[ 279.071968] drm_ioctl+0x444/0x920 [drm]
[ 279.072051] ? drm_mode_setplane+0xab0/0xab0 [drm]
[ 279.072141] ? drm_version+0x390/0x390 [drm]
[ 279.072224] ? rpm_idle+0x610/0x610
[ 279.072234] ? ktime_get+0x59/0xe0
[ 279.072244] ? _raw_spin_lock_irqsave+0x70/0xb0
[ 279.072254] ? _raw_write_lock_irqsave+0xb0/0xb0
[ 279.072265] amdgpu_drm_ioctl+0xce/0x1b0 [amdgpu]
[ 279.072810] __x64_sys_ioctl+0x127/0x190
[ 279.072820] do_syscall_64+0x33/0x40
[ 279.072829] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 279.072840] RIP: 0033:0x7fd18662e38b
[ 279.072849] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd ba 0c 00 f7 d8 64 89 01 48
[ 279.072859] RSP: 002b:00007ffd14bc9e88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 279.072871] RAX: ffffffffffffffda RBX: 00007ffd14bc9ec0 RCX: 00007fd18662e38b
[ 279.072879] RDX: 00007ffd14bc9ec0 RSI: 00000000c01c64a3 RDI: 0000000000000009
[ 279.072886] RBP: 00000000c01c64a3 R08: 0000000000000000 R09: 0000000000000000
[ 279.072893] R10: 00007fd1866faa00 R11: 0000000000000246 R12: 0000000000000000
[ 279.072899] R13: 0000000000000009 R14: 00007ffd14bc9f70 R15: 000055f293696c60
[ 279.072958] ================================================================================
The reproducer for this is pretty simple:
- Boot the machine with UBSAN and KASAN enabled (for KASAN I'm using inline instrumentation with CONFIG_KASAN_VMALLOC also enabled)
- Load up a GUI
- Do something to hide the cursor
I'm seeing this on drm-tip, and the system in question is using a Vega 64 GPU (21:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]). I don't think this is a regression, but unfortunately I already went down the rabbit hole of trying to bisect this, as it doesn't seem to appear reliably with only UBSAN enabled.
I mention the UBSAN-only configuration because, interestingly enough, bisecting with only UBSAN enabled and checking all commits between v5.9 and drm-tip points to this as the first commit where the warning goes away:
# first fixed commit: [f9915b964c25193a6be1aed744c946d6ff177149] Merge tag 'drm-next-2020-10-19' of git://anongit.freedesktop.org/drm/drm
At first this looked bogus, but it's difficult to tell, as enabling KASAN on this commit does show a memory overrun that seems relevant to the code that was changed in this merge commit:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ==================================================================
Feb 01 18:59:05 LyudeTestTowerGamma kernel: BUG: KASAN: slab-out-of-bounds in kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Read of size 1 at addr ffff8881020e1281 by task systemd-udevd/731
Feb 01 18:59:05 LyudeTestTowerGamma kernel:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: CPU: 5 PID: 731 Comm: systemd-udevd Tainted: G W 5.9.0Lyude-Test+ #66
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Call Trace:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: dump_stack+0x7d/0xa3
Feb 01 18:59:05 LyudeTestTowerGamma kernel: print_address_description.constprop.0+0x1c/0x210
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? _raw_spin_lock_irqsave+0x70/0xb0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? _raw_write_unlock_bh+0x60/0x60
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kasan_report.cold+0x37/0x7c
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kfd_create_crat_image_acpi+0xe0/0xe0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? device_create_groups_vargs+0x1cd/0x240
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kfd_parse_crat_table+0x2db0/0x2db0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kfd_topology_init+0x2a2/0x3f0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kfd_create_topology_device+0x320/0x320 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? __class_register+0x298/0x420
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? __class_create+0xc5/0x130
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kgd2kfd_init+0x95/0xf0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: amdgpu_amdkfd_init+0x7f/0xb0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? athub_v2_1_get_clockgating+0x110/0x110 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? record_print_text.cold+0x11/0x11
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kmem_cache_create_usercopy+0x25a/0x300
Feb 01 18:59:05 LyudeTestTowerGamma kernel: amdgpu_init+0xa0/0x1000 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? 0xffffffffc17fc000
Feb 01 18:59:05 LyudeTestTowerGamma kernel: do_one_initcall+0x89/0x2a0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? perf_trace_initcall_level+0x3b0/0x3b0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kasan_unpoison_shadow+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? __kasan_kmalloc.constprop.0+0xc2/0xd0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kasan_unpoison_shadow+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kasan_unpoison_shadow+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: do_init_module+0x1ce/0x780
Feb 01 18:59:05 LyudeTestTowerGamma kernel: load_module+0x70d5/0x9860
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? module_frob_arch_sections+0x20/0x20
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? ima_post_read_file+0x184/0x210
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? ima_read_file+0x1b0/0x1b0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? __kernel_read+0x1a7/0x4c0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? kernel_read_file_from_fd+0x4b/0x90
Feb 01 18:59:05 LyudeTestTowerGamma kernel: __do_sys_finit_module+0xff/0x180
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? __ia32_sys_init_module+0xa0/0xa0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ? syscall_trace_enter.constprop.0+0x12e/0x1a0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: do_syscall_64+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RIP: 0033:0x7ff63abc430d
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 7b 0c 00 f7 d8 64 89 01 48
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RSP: 002b:00007fffb9fdaa28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RAX: ffffffffffffffda RBX: 0000559eccc2a640 RCX: 00007ff63abc430d
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RDX: 0000000000000000 RSI: 00007ff63ad0035a RDI: 0000000000000018
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000559eccc1d540
Feb 01 18:59:05 LyudeTestTowerGamma kernel: R10: 0000000000000018 R11: 0000000000000246 R12: 00007ff63ad0035a
Feb 01 18:59:05 LyudeTestTowerGamma kernel: R13: 0000559ecc9eebb0 R14: 0000000000000007 R15: 0000559eccc17c30
Feb 01 18:59:05 LyudeTestTowerGamma kernel:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Allocated by task 731:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kasan_save_stack+0x1b/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: __kasan_kmalloc.constprop.0+0xc2/0xd0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kfd_create_crat_image_virtual+0x13b/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kfd_topology_init+0x2a2/0x3f0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: kgd2kfd_init+0x95/0xf0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: amdgpu_amdkfd_init+0x7f/0xb0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: amdgpu_init+0xa0/0x1000 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: do_one_initcall+0x89/0x2a0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: do_init_module+0x1ce/0x780
Feb 01 18:59:05 LyudeTestTowerGamma kernel: load_module+0x70d5/0x9860
Feb 01 18:59:05 LyudeTestTowerGamma kernel: __do_sys_finit_module+0xff/0x180
Feb 01 18:59:05 LyudeTestTowerGamma kernel: do_syscall_64+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 01 18:59:05 LyudeTestTowerGamma kernel:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: The buggy address belongs to the object at ffff8881020e1200
which belongs to the cache kmalloc-128 of size 128
Feb 01 18:59:05 LyudeTestTowerGamma kernel: The buggy address is located 1 bytes to the right of
128-byte region [ffff8881020e1200, ffff8881020e1280)
Feb 01 18:59:05 LyudeTestTowerGamma kernel: The buggy address belongs to the page:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: page:0000000002599104 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8881020e0400 pfn:0x1020e0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: head:0000000002599104 order:1 compound_mapcount:0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: flags: 0x17ffffc0010200(slab|head)
Feb 01 18:59:05 LyudeTestTowerGamma kernel: raw: 0017ffffc0010200 ffffea0004306708 ffff888100040210 ffff888100043a40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: raw: ffff8881020e0400 0000000000200011 00000001ffffffff 0000000000000000
Feb 01 18:59:05 LyudeTestTowerGamma kernel: page dumped because: kasan: bad access detected
Feb 01 18:59:05 LyudeTestTowerGamma kernel:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Memory state around the buggy address:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ffff8881020e1180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ffff8881020e1200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Feb 01 18:59:05 LyudeTestTowerGamma kernel: >ffff8881020e1280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ^
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ffff8881020e1300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ffff8881020e1380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ==================================================================
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Disabling lock debugging due to kernel taint
However, later on in drm-tip this KASAN error goes away, while the UBSAN issue still seems to persist. Anyway, I've attached a full dmesg from a drm-tip kernel where this happens. Unfortunately, you'll notice that there's also one KASAN error and another UBSAN error in this kernel: