  • #1473

Closed
Open
Created Feb 04, 2021 by Lyude Paul@lyudess

dce110: KASAN global-out-of-bounds in read_indirect_azalia_reg in 5.11.0-rc5+

And another memory issue, this time seen with just KASAN:

[    9.913806] ==================================================================
[    9.913814] BUG: KASAN: global-out-of-bounds in read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[    9.914178] Read of size 4 at addr ffffffffc18800c8 by task systemd-udevd/731

[    9.914188] CPU: 5 PID: 731 Comm: systemd-udevd Not tainted 5.11.0-rc5Lyude-Test+ #69
[    9.914194] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
[    9.914200] Call Trace:
[    9.914204]  dump_stack+0x7d/0xa3
[    9.914213]  print_address_description.constprop.0+0x18/0x130
[    9.914221]  ? read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[    9.914566]  ? read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[    9.914912]  kasan_report.cold+0x7f/0x10e
[    9.914919]  ? read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[    9.915262]  read_indirect_azalia_reg+0x204/0x2d0 [amdgpu]
[    9.915605]  dce_aud_endpoint_valid+0xf/0x20 [amdgpu]
[    9.915947]  resource_construct+0x305/0xc10 [amdgpu]
[    9.916290]  ? unpoison_range+0x3a/0x60
[    9.916295]  ? dc_destroy_resource_pool+0xe0/0xe0 [amdgpu]
[    9.916636]  ? unpoison_range+0x3a/0x60
[    9.916641]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    9.916647]  dce120_create_resource_pool+0x13ee/0x1b50 [amdgpu]
[    9.916992]  ? dce120_clock_source_create+0x110/0x110 [amdgpu]
[    9.917333]  dc_create_resource_pool+0x402/0x580 [amdgpu]
[    9.917674]  dc_create+0x636/0x1d80 [amdgpu]
[    9.918020]  ? init_object+0x4e/0x80
[    9.918025]  ? amdgpu_cgs_create_device+0x3e/0xd0 [amdgpu]
[    9.918346]  ? dc_create_state+0xa0/0xa0 [amdgpu]
[    9.918687]  ? ___slab_alloc+0x2bf/0x5b0
[    9.918693]  ? amdgpu_cgs_create_device+0x3e/0xd0 [amdgpu]
[    9.919015]  ? drm_print_bits+0x170/0x170 [drm]
[    9.919056]  ? unpoison_range+0x3a/0x60
[    9.919061]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    9.919066]  ? unpoison_range+0x3a/0x60
[    9.919071]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    9.919076]  amdgpu_dm_init.isra.0+0x473/0x640 [amdgpu]
[    9.919418]  ? amdgpu_device_rreg.part.0+0x81/0x290 [amdgpu]
[    9.919697]  ? dm_resume+0x1400/0x1400 [amdgpu]
[    9.920037]  ? smu9_wait_for_response+0x164/0x220 [amdgpu]
[    9.920364]  ? smu9_send_msg_to_smc_with_parameter+0x169/0x270 [amdgpu]
[    9.920690]  ? mutex_unlock+0x1d/0x40
[    9.920697]  ? smum_send_msg_to_smc_with_parameter+0x199/0x300 [amdgpu]
[    9.921021]  ? vega10_fan_ctrl_start_smc_fan_control+0x111/0x1d0 [amdgpu]
[    9.921354]  ? memcpy+0x39/0x60
[    9.921359]  ? psm_set_states+0x10e/0x190 [amdgpu]
[    9.921721]  dm_hw_init+0xe/0x20 [amdgpu]
[    9.922059]  amdgpu_device_init.cold+0x3404/0x48f5 [amdgpu]
[    9.922401]  ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
[    9.922689]  ? pci_bus_read_config_byte+0x140/0x140
[    9.922696]  ? do_pci_enable_device.part.0+0x1d3/0x230
[    9.922701]  ? pci_find_saved_ext_cap+0x120/0x120
[    9.922706]  ? skcipher_walk_done+0x259/0xd30
[    9.922712]  ? pci_enable_device_flags+0x28b/0x370
[    9.922717]  amdgpu_driver_load_kms+0x166/0x8c0 [amdgpu]
[    9.922998]  amdgpu_pci_probe+0x206/0x320 [amdgpu]
[    9.923276]  ? amdgpu_pmops_runtime_suspend+0x2c0/0x2c0 [amdgpu]
[    9.923555]  local_pci_probe+0xd8/0x170
[    9.923561]  pci_device_probe+0x32e/0x600
[    9.923565]  ? kernfs_create_link+0x160/0x220
[    9.923571]  ? pci_device_remove+0x1d0/0x1d0
[    9.923577]  really_probe+0x231/0xcf0
[    9.923583]  driver_probe_device+0x1fe/0x380
[    9.923588]  ? _raw_write_lock_irqsave+0xb0/0xb0
[    9.923594]  device_driver_attach+0x205/0x270
[    9.923599]  __driver_attach+0xf4/0x260
[    9.923604]  ? device_driver_attach+0x270/0x270
[    9.923609]  bus_for_each_dev+0x111/0x180
[    9.923614]  ? _raw_read_lock_irq+0x30/0x30
[    9.923619]  ? subsys_dev_iter_exit+0x10/0x10
[    9.923624]  ? klist_node_init+0x61/0x120
[    9.923629]  ? klist_add_tail+0x5c/0x160
[    9.923634]  bus_add_driver+0x34f/0x580
[    9.923639]  driver_register+0x1ee/0x380
[    9.923644]  ? 0xffffffffc1d45000
[    9.923648]  do_one_initcall+0x89/0x2a0
[    9.923654]  ? perf_trace_initcall_level+0x3b0/0x3b0
[    9.923659]  ? unpoison_range+0x3a/0x60
[    9.923663]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    9.923669]  ? unpoison_range+0x3a/0x60
[    9.923673]  ? unpoison_range+0x3a/0x60
[    9.923678]  do_init_module+0x1ce/0x7c0
[    9.923685]  load_module+0x8fa0/0x95d0
[    9.923692]  ? module_frob_arch_sections+0x20/0x20
[    9.923697]  ? ima_post_read_file+0x15e/0x190
[    9.923703]  ? ima_read_file+0x140/0x140
[    9.923708]  ? kernel_read_file_from_fd+0x4b/0x90
[    9.923715]  __do_sys_finit_module+0xff/0x180
[    9.923719]  ? __ia32_sys_init_module+0xa0/0xa0
[    9.923725]  ? syscall_trace_enter.constprop.0+0x142/0x1c0
[    9.923732]  do_syscall_64+0x33/0x40
[    9.923738]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    9.923744] RIP: 0033:0x7f42ca88f30d
[    9.923750] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 7b 0c 00 f7 d8 64 89 01 48
[    9.923758] RSP: 002b:00007ffca53bad08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    9.923766] RAX: ffffffffffffffda RBX: 000055ee6a184250 RCX: 00007f42ca88f30d
[    9.923771] RDX: 0000000000000000 RSI: 000055ee6a17a9b0 RDI: 0000000000000019
[    9.923776] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ee6a17aa70
[    9.923780] R10: 0000000000000019 R11: 0000000000000246 R12: 000055ee6a17a9b0
[    9.923785] R13: 000055ee69f58a00 R14: 0000000000000000 R15: 000055ee69f5c450

[    9.923793] The buggy address belongs to the variable:
[    9.923796]  audio_regs+0x108/0xffffffffffe3e040 [amdgpu]

[    9.924117] Memory state around the buggy address:
[    9.924121]  ffffffffc187ff80: 00 00 04 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
[    9.924127]  ffffffffc1880000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    9.924132] >ffffffffc1880080: 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9
[    9.924136]                                               ^
[    9.924140]  ffffffffc1880100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    9.924145]  ffffffffc1880180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9
[    9.924149] ==================================================================
[    9.924154] Disabling lock debugging due to kernel taint
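
For reading the "Memory state" dump above, here is a minimal sketch of how the shadow bytes map back to addresses. It assumes generic KASAN, where each shadow byte covers an 8-byte granule and 0xf9 marks the redzone around a global variable. The helper names (`build_shadow`, `shadow_byte`, `preceding_object_start`) are made up for illustration and are not kernel APIs. Treating the run of zero shadow bytes before the fault as a single object is also an assumption.

```python
GRANULE = 8            # bytes of memory described by one shadow byte (generic KASAN)
GLOBAL_REDZONE = 0xF9  # shadow value for the redzone around global variables

# Rows copied from the report: row start address -> 16 shadow bytes
ROWS = {
    0xffffffffc187ff80: "00 00 04 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00",
    0xffffffffc1880000: "00 " * 16,
    0xffffffffc1880080: "00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9",
}
FAULT = 0xffffffffc18800c8  # "Read of size 4 at addr ..."


def build_shadow(rows):
    """Flatten the dump into {granule start address: shadow byte}."""
    shadow = {}
    for base, text in rows.items():
        for i, byte in enumerate(text.split()):
            shadow[base + i * GRANULE] = int(byte, 16)
    return shadow


SHADOW = build_shadow(ROWS)


def shadow_byte(addr):
    """Shadow byte for the granule containing addr."""
    return SHADOW[addr & ~(GRANULE - 1)]


def preceding_object_start(addr):
    """Walk back over fully accessible granules (shadow 0x00) before addr.

    Assumes the granule just before addr is accessible and that the whole
    accessible run belongs to one object.
    """
    g = (addr & ~(GRANULE - 1)) - GRANULE
    while SHADOW.get(g - GRANULE) == 0:
        g -= GRANULE
    return g


if __name__ == "__main__":
    start = preceding_object_start(FAULT)
    print(hex(shadow_byte(FAULT)), hex(start), hex(FAULT - start))
```

Under these assumptions the faulting address lands on the first 0xf9 byte after the accessible run, and its distance from the start of that run is 0x108, which matches the `audio_regs+0x108` line in the report. That would be consistent with a read starting exactly at the end of `audio_regs`, though the dump alone does not show which index was used.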

Note that I do have sound disabled in this kernel configuration (amdgpu.audio=0 is also set); however, I've already tried re-enabling it and it didn't make much of a difference. That being said, I didn't check whether audio actually worked after I re-enabled things, so it's still possible that this is related. I'm able to reproduce this on drm-tip during boot with my Vega 64: 21:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3).

This seems like it might actually be a regression, because I didn't run into this on older kernels. I haven't had time to bisect it, though. Note, however, that on older kernels I -did- run into a different KASAN error coming from amdkfd; you can find the splat for it in #1471 (see the second backtrace in the issue description).

Dmesg attached: dmesg.log
