drm / amd / Issues / #1471

Created Feb 03, 2021 by Lyude Paul@lyudess

UBSAN warning with dce110 when cursor changes state

When running a kernel with both KASAN and UBSAN enabled (you need both, otherwise this bug doesn't seem to come up consistently), the kernel will hit a UBSAN _Bool validity check:

[  279.060335] ================================================================================
[  279.060352] UBSAN: invalid-load in drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_hw_sequencer.c:2803:13
[  279.060364] load of value 242 is not a valid value for type '_Bool'
[  279.060373] CPU: 3 PID: 1517 Comm: gnome-shell Kdump: loaded Tainted: G    B             5.11.0-rc5Lyude-Test+ #69
[  279.060384] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
[  279.060391] Call Trace:
[  279.060398]  dump_stack+0x7d/0xa3
[  279.060413]  ubsan_epilogue+0x5/0x40
[  279.060424]  __ubsan_handle_load_invalid_value.cold+0x43/0x48
[  279.060436]  ? dce_pipe_control_lock+0x885/0x1130 [amdgpu]
[  279.061152]  dce110_set_cursor_position.cold+0x13/0x32 [amdgpu]
[  279.061876]  ? dce110_update_pending_status+0x660/0x660 [amdgpu]
[  279.062573]  ? stack_depot_save+0x207/0x410
[  279.062585]  ? dce_enable_fe_clock+0x250/0x250 [amdgpu]
[  279.063296]  ? drm_mode_cursor_universal+0x3e1/0xb10 [drm]
[  279.063389]  ? drm_mode_cursor_common+0x249/0x900 [drm]
[  279.063479]  ? drm_mode_cursor_ioctl+0x84/0xa0 [drm]
[  279.063570]  dc_stream_set_cursor_position+0x2ce/0x650 [amdgpu]
[  279.064263]  handle_cursor_update+0x76d/0xb60 [amdgpu]
[  279.064964]  ? dm_hw_fini+0x30/0x30 [amdgpu]
[  279.065663]  ? ___slab_alloc+0x2bf/0x5b0
[  279.065674]  ? amdgpu_dm_atomic_commit_tail+0x27e6/0x9d40 [amdgpu]
[  279.066375]  ? unpoison_range+0x3a/0x60
[  279.066385]  amdgpu_dm_commit_cursors+0x160/0x260 [amdgpu]
[  279.067085]  amdgpu_dm_atomic_commit_tail+0x4ef6/0x9d40 [amdgpu]
[  279.067788]  ? __mod_timer+0x5e1/0xb00
[  279.067802]  ? trace_event_raw_event_amdgpu_dm_plane_state_template+0xd10/0xd10 [amdgpu]
[  279.068503]  ? arch_stack_walk+0x4e/0xb0
[  279.068513]  ? deref_stack_reg+0xe6/0x160
[  279.068524]  ? amdgpu_drm_ioctl+0xcd/0x1b0 [amdgpu]
[  279.069068]  ? orc_find.part.0+0x1c0/0x320
[  279.069078]  ? arch_stack_walk+0x4e/0xb0
[  279.069086]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  279.069099]  ? is_bpf_text_address+0x13/0x20
[  279.069108]  ? kernel_text_address.part.0+0xaf/0xc0
[  279.069118]  ? __kernel_text_address+0x56/0xa0
[  279.069129]  ? _raw_spin_lock_irqsave+0x70/0xb0
[  279.069138]  ? _raw_write_lock_irqsave+0xb0/0xb0
[  279.069148]  ? stack_trace_save+0x81/0xa0
[  279.069157]  ? stack_depot_save+0x207/0x410
[  279.069166]  ? kasan_save_stack+0x32/0x40
[  279.069175]  ? kasan_save_stack+0x1b/0x40
[  279.069183]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[  279.069193]  ? drm_atomic_helper_setup_commit+0x49b/0x1460 [drm_kms_helper]
[  279.069254]  ? drm_atomic_helper_commit+0x6b/0x270 [drm_kms_helper]
[  279.069312]  ? drm_atomic_helper_disable_plane+0x11d/0x220 [drm_kms_helper]
[  279.069370]  ? drm_mode_cursor_universal+0x3e1/0xb10 [drm]
[  279.069461]  ? drm_mode_cursor_common+0x249/0x900 [drm]
[  279.069551]  ? drm_mode_cursor_ioctl+0x84/0xa0 [drm]
[  279.069641]  ? drm_ioctl_kernel+0x1a3/0x230 [drm]
[  279.069723]  ? drm_ioctl+0x444/0x920 [drm]
[  279.069805]  ? amdgpu_drm_ioctl+0xce/0x1b0 [amdgpu]
[  279.070348]  ? __x64_sys_ioctl+0x127/0x190
[  279.070358]  ? do_syscall_64+0x33/0x40
[  279.070368]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  279.070379]  ? deactivate_slab+0x21c/0x540
[  279.070388]  ? drm_print_bits+0x170/0x170 [drm]
[  279.070479]  ? _raw_spin_lock_irq+0x6b/0xb0
[  279.070488]  ? _raw_spin_unlock_irqrestore+0x50/0x50
[  279.070499]  ? _cond_resched+0x17/0x70
[  279.070507]  ? wait_for_completion_timeout+0x75/0x250
[  279.070517]  ? _raw_spin_unlock_irqrestore+0x50/0x50
[  279.070526]  ? wait_for_completion_io+0x230/0x230
[  279.070536]  ? _cond_resched+0x17/0x70
[  279.070544]  ? wait_for_completion_interruptible+0x71/0x3a0
[  279.070554]  ? drm_crtc_commit_wait+0x2e/0x60 [drm]
[  279.070646]  ? drm_atomic_helper_wait_for_dependencies+0x428/0x660 [drm_kms_helper]
[  279.070707]  commit_tail+0x221/0x4a0 [drm_kms_helper]
[  279.070766]  drm_atomic_helper_commit+0x1f9/0x270 [drm_kms_helper]
[  279.070824]  drm_atomic_helper_disable_plane+0x11d/0x220 [drm_kms_helper]
[  279.070883]  drm_mode_cursor_universal+0x3e1/0xb10 [drm]
[  279.070975]  ? __setplane_atomic+0x500/0x500 [drm]
[  279.071065]  ? __mutex_lock_slowpath+0x10/0x10
[  279.071076]  ? drm_modeset_lock+0x144/0x2d0 [drm]
[  279.071169]  drm_mode_cursor_common+0x249/0x900 [drm]
[  279.071259]  ? ____sys_recvmsg+0x1b3/0x5a0
[  279.071271]  ? drm_mode_cursor_universal+0xb10/0xb10 [drm]
[  279.071362]  drm_mode_cursor_ioctl+0x84/0xa0 [drm]
[  279.071452]  ? drm_mode_setplane+0xab0/0xab0 [drm]
[  279.071541]  ? task_tick_fair+0x110/0xd50
[  279.071551]  ? drm_is_current_master+0x73/0x120 [drm]
[  279.071630]  ? drm_ioctl_permit+0x14c/0x180 [drm]
[  279.071712]  ? drm_mode_setplane+0xab0/0xab0 [drm]
[  279.071802]  drm_ioctl_kernel+0x1a3/0x230 [drm]
[  279.071885]  ? drm_setversion+0x7f0/0x7f0 [drm]
[  279.071968]  drm_ioctl+0x444/0x920 [drm]
[  279.072051]  ? drm_mode_setplane+0xab0/0xab0 [drm]
[  279.072141]  ? drm_version+0x390/0x390 [drm]
[  279.072224]  ? rpm_idle+0x610/0x610
[  279.072234]  ? ktime_get+0x59/0xe0
[  279.072244]  ? _raw_spin_lock_irqsave+0x70/0xb0
[  279.072254]  ? _raw_write_lock_irqsave+0xb0/0xb0
[  279.072265]  amdgpu_drm_ioctl+0xce/0x1b0 [amdgpu]
[  279.072810]  __x64_sys_ioctl+0x127/0x190
[  279.072820]  do_syscall_64+0x33/0x40
[  279.072829]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  279.072840] RIP: 0033:0x7fd18662e38b
[  279.072849] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd ba 0c 00 f7 d8 64 89 01 48
[  279.072859] RSP: 002b:00007ffd14bc9e88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  279.072871] RAX: ffffffffffffffda RBX: 00007ffd14bc9ec0 RCX: 00007fd18662e38b
[  279.072879] RDX: 00007ffd14bc9ec0 RSI: 00000000c01c64a3 RDI: 0000000000000009
[  279.072886] RBP: 00000000c01c64a3 R08: 0000000000000000 R09: 0000000000000000
[  279.072893] R10: 00007fd1866faa00 R11: 0000000000000246 R12: 0000000000000000
[  279.072899] R13: 0000000000000009 R14: 00007ffd14bc9f70 R15: 000055f293696c60
[  279.072958] ================================================================================
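
For readers unfamiliar with this class of report: the message means that a byte about to be loaded as a `_Bool` held 242 (0xf2), and only 0 and 1 are valid there. One common way to get such a byte is a `bool` member in a struct that was never initialised, though that is only an assumption here and not a confirmed root cause for this issue. The sketch below is userspace C, not amdgpu code: `struct cursor_params` and every helper name are hypothetical, and it assumes `sizeof(bool) == 1`. It inspects the byte through `unsigned char` so that the example itself never performs the invalid load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: bool occupies a single byte, as on common Linux ABIs. */
_Static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

/* Hypothetical stand-in for a parameter struct with a bool member.
 * This is NOT the real amdgpu structure. */
struct cursor_params {
    unsigned int x;
    unsigned int y;
    bool enable;
};

/* Fetch the raw byte behind the bool member without loading it as a bool.
 * Access through unsigned char is always well defined. */
static inline unsigned char raw_enable_byte(const struct cursor_params *p)
{
    unsigned char byte;

    memcpy(&byte,
           (const unsigned char *)p + offsetof(struct cursor_params, enable),
           1);
    return byte;
}

/* The property the UBSAN bool check enforces: only 0 and 1 are valid. */
static inline int is_valid_bool_byte(unsigned char byte)
{
    return byte == 0 || byte == 1;
}

/* Simulate stale memory by filling the whole struct with 0xf2 (242). */
static inline void fill_with_garbage(struct cursor_params *p)
{
    memset(p, 0xf2, sizeof(*p));
}

/* Zero the struct before use so every bool member starts out valid. */
static inline void init_cursor_params(struct cursor_params *p)
{
    memset(p, 0, sizeof(*p));
    assert(is_valid_bool_byte(raw_enable_byte(p)));
}
```

After `fill_with_garbage()`, `raw_enable_byte()` returns 242 and `is_valid_bool_byte()` rejects it; after `init_cursor_params()` the byte is 0 and the check passes. If the warning here does come from an uninitialised member, zero-initialising the struct at its declaration would be the usual remedy.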

The reproducer for this is pretty simple:

  • Boot the machine with UBSAN and KASAN enabled (for KASAN I'm using inline instrumentation with CONFIG_KASAN_VMALLOC also enabled)
  • Load up a GUI
  • Do something to hide the cursor

I'm seeing this on drm-tip, and the system in question is using a Vega 64 GPU (21:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]). I don't think this is a regression, but unfortunately I already went down the rabbit hole of trying to bisect this, as it doesn't seem to appear reliably with only UBSAN enabled.

I mention the UBSAN-only configuration because, interestingly enough, bisecting this with only UBSAN enabled and checking all commits between v5.9 and drm-tip leads to this being the first commit where the warning goes away:

# first fixed commit: [f9915b964c25193a6be1aed744c946d6ff177149] Merge tag 'drm-next-2020-10-19' of git://anongit.freedesktop.org/drm/drm

At first this looked bogus, but it's difficult to tell, as enabling KASAN on this commit does actually show a memory overrun that seems relevant to the code that was changed in this merge commit:

Feb 01 18:59:05 LyudeTestTowerGamma kernel: ==================================================================
Feb 01 18:59:05 LyudeTestTowerGamma kernel: BUG: KASAN: slab-out-of-bounds in kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Read of size 1 at addr ffff8881020e1281 by task systemd-udevd/731
Feb 01 18:59:05 LyudeTestTowerGamma kernel: 
Feb 01 18:59:05 LyudeTestTowerGamma kernel: CPU: 5 PID: 731 Comm: systemd-udevd Tainted: G        W         5.9.0Lyude-Test+ #66
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Call Trace:
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  dump_stack+0x7d/0xa3
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  print_address_description.constprop.0+0x1c/0x210
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? _raw_spin_lock_irqsave+0x70/0xb0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? _raw_write_unlock_bh+0x60/0x60
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kasan_report.cold+0x37/0x7c
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kfd_create_crat_image_virtual+0x1389/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kfd_create_crat_image_acpi+0xe0/0xe0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? device_create_groups_vargs+0x1cd/0x240
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kfd_parse_crat_table+0x2db0/0x2db0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kfd_topology_init+0x2a2/0x3f0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kfd_create_topology_device+0x320/0x320 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? __class_register+0x298/0x420
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? __class_create+0xc5/0x130
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kgd2kfd_init+0x95/0xf0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  amdgpu_amdkfd_init+0x7f/0xb0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? athub_v2_1_get_clockgating+0x110/0x110 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? record_print_text.cold+0x11/0x11
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kmem_cache_create_usercopy+0x25a/0x300
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  amdgpu_init+0xa0/0x1000 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? 0xffffffffc17fc000
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  do_one_initcall+0x89/0x2a0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? perf_trace_initcall_level+0x3b0/0x3b0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kasan_unpoison_shadow+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? __kasan_kmalloc.constprop.0+0xc2/0xd0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kasan_unpoison_shadow+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kasan_unpoison_shadow+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  do_init_module+0x1ce/0x780
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  load_module+0x70d5/0x9860
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? module_frob_arch_sections+0x20/0x20
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? ima_post_read_file+0x184/0x210
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? ima_read_file+0x1b0/0x1b0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? __kernel_read+0x1a7/0x4c0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? kernel_read_file_from_fd+0x4b/0x90
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  __do_sys_finit_module+0xff/0x180
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? __ia32_sys_init_module+0xa0/0xa0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ? syscall_trace_enter.constprop.0+0x12e/0x1a0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  do_syscall_64+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RIP: 0033:0x7ff63abc430d
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 7b 0c 00 f7 d8 64 89 01 48
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RSP: 002b:00007fffb9fdaa28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RAX: ffffffffffffffda RBX: 0000559eccc2a640 RCX: 00007ff63abc430d
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RDX: 0000000000000000 RSI: 00007ff63ad0035a RDI: 0000000000000018
Feb 01 18:59:05 LyudeTestTowerGamma kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000559eccc1d540
Feb 01 18:59:05 LyudeTestTowerGamma kernel: R10: 0000000000000018 R11: 0000000000000246 R12: 00007ff63ad0035a
Feb 01 18:59:05 LyudeTestTowerGamma kernel: R13: 0000559ecc9eebb0 R14: 0000000000000007 R15: 0000559eccc17c30
Feb 01 18:59:05 LyudeTestTowerGamma kernel: 
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Allocated by task 731:
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kasan_save_stack+0x1b/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  __kasan_kmalloc.constprop.0+0xc2/0xd0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kfd_create_crat_image_virtual+0x13b/0x14d0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kfd_topology_init+0x2a2/0x3f0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  kgd2kfd_init+0x95/0xf0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  amdgpu_amdkfd_init+0x7f/0xb0 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  amdgpu_init+0xa0/0x1000 [amdgpu]
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  do_one_initcall+0x89/0x2a0
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  do_init_module+0x1ce/0x780
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  load_module+0x70d5/0x9860
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  __do_sys_finit_module+0xff/0x180
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  do_syscall_64+0x33/0x40
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 01 18:59:05 LyudeTestTowerGamma kernel: 
Feb 01 18:59:05 LyudeTestTowerGamma kernel: The buggy address belongs to the object at ffff8881020e1200
                                             which belongs to the cache kmalloc-128 of size 128
Feb 01 18:59:05 LyudeTestTowerGamma kernel: The buggy address is located 1 bytes to the right of
                                             128-byte region [ffff8881020e1200, ffff8881020e1280)
Feb 01 18:59:05 LyudeTestTowerGamma kernel: The buggy address belongs to the page:
Feb 01 18:59:05 LyudeTestTowerGamma kernel: page:0000000002599104 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8881020e0400 pfn:0x1020e0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: head:0000000002599104 order:1 compound_mapcount:0
Feb 01 18:59:05 LyudeTestTowerGamma kernel: flags: 0x17ffffc0010200(slab|head)
Feb 01 18:59:05 LyudeTestTowerGamma kernel: raw: 0017ffffc0010200 ffffea0004306708 ffff888100040210 ffff888100043a40
Feb 01 18:59:05 LyudeTestTowerGamma kernel: raw: ffff8881020e0400 0000000000200011 00000001ffffffff 0000000000000000
Feb 01 18:59:05 LyudeTestTowerGamma kernel: page dumped because: kasan: bad access detected
Feb 01 18:59:05 LyudeTestTowerGamma kernel: 
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Memory state around the buggy address:
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ffff8881020e1180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ffff8881020e1200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Feb 01 18:59:05 LyudeTestTowerGamma kernel: >ffff8881020e1280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel:                    ^
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ffff8881020e1300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel:  ffff8881020e1380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Feb 01 18:59:05 LyudeTestTowerGamma kernel: ==================================================================
Feb 01 18:59:05 LyudeTestTowerGamma kernel: Disabling lock debugging due to kernel taint
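
As a reading aid for the report above, the arithmetic behind "located 1 bytes to the right of 128-byte region" can be sketched as follows. This is a userspace approximation of how the distance appears to be derived from the numbers in the log. It is not the kernel's reporting code, and `locate_access()`, `struct placement` and `SHADOW_SCALE` are names made up for this example. The 8-bytes-per-shadow-byte scale is an assumption based on generic KASAN and is consistent with the 16 shadow bytes printed for the 128-byte object.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one shadow byte describes 8 bytes of memory (generic KASAN). */
#define SHADOW_SCALE 8u

enum relation { TO_THE_LEFT, INSIDE, TO_THE_RIGHT };

struct placement {
    enum relation rel;  /* where the access falls relative to the object */
    uint64_t bytes;     /* distance in bytes, measured as described below */
};

/* Classify an access address against an object [start, start + size).
 * "To the right" is measured from the exclusive end of the object. */
static inline struct placement locate_access(uint64_t start, uint64_t size,
                                             uint64_t addr)
{
    struct placement p;

    assert(size > 0);
    if (addr < start) {
        p.rel = TO_THE_LEFT;
        p.bytes = start - addr;
    } else if (addr >= start + size) {
        p.rel = TO_THE_RIGHT;
        p.bytes = addr - (start + size);
    } else {
        p.rel = INSIDE;
        p.bytes = addr - start;
    }
    return p;
}

/* Number of shadow bytes needed to describe an object of the given size. */
static inline uint64_t shadow_bytes_for(uint64_t size)
{
    return (size + SHADOW_SCALE - 1) / SHADOW_SCALE;
}
```

With the values from the log (object at `0xffff8881020e1200`, size 128, access at `0xffff8881020e1281`), `locate_access()` yields `TO_THE_RIGHT` with a distance of 1, which means the read touched offset 129 from the start of the allocation. `shadow_bytes_for(128)` gives 16, matching the single row of `00` shadow bytes followed by `fc` redzone bytes in the memory state dump.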

However, later on in drm-tip this KASAN error goes away, while the UBSAN issue still seems to persist. Anyway, I've attached a full dmesg from a drm-tip kernel where this happens. Unfortunately, you'll notice that there's also one KASAN error and another UBSAN error in this kernel:

dmesg.log

Edited Feb 03, 2021 by Lyude Paul