drm / amd / Issues / #2100
Issue created Jul 21, 2022 by Evan Lojewski (@meklort1)

[regression] amdgpu NULL pointer dereference on ppc64le in 5.19-rc7

Brief summary of the problem:

After updating the kernel from 5.19-rc6 to 5.19-rc7, the kernel hits a NULL pointer dereference while probing amdgpu at boot:

[    3.078328] Kernel attempted to read user page (498) - exploit attempt? (uid: 0)
[    3.078355] BUG: Kernel NULL pointer dereference on read at 0x00000498
[    3.078379] Faulting instruction address: 0xc0080000038f06dc
[    3.078393] Oops: Kernel access of bad area, sig: 11 [#1]
[    3.078421] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    3.078463] Modules linked in: amdgpu(+) ast nvme mfd_core drm_vram_helper gpu_sched drm_ttm_helper tg3 vmx_crypto drm_display_helper ttm nvme_core crc32c_vpmsum cec scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables pkcs8_key_parser dm_multipath fuse
[    3.078581] CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 5.19.0-0.rc7.53.fc37.ppc64le #1
[    3.078612] Workqueue: events work_for_cpu_fn
[    3.078629] NIP:  c0080000038f06dc LR: c0080000038f2b08 CTR: 0000000000000000
[    3.078677] REGS: c0000000038d3320 TRAP: 0300   Not tainted  (5.19.0-0.rc7.53.fc37.ppc64le)
[    3.078716] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002220  XER: 00000000
[    3.078763] CFAR: c0080000038f06a0 DAR: 0000000000000498 DSISR: 40000000 IRQMASK: 0 
[    3.078763] GPR00: c0080000038f2b08 c0000000038d35c0 c008000003ab8000 c00000000bd00000 
[    3.078763] GPR04: c0000000038d37f0 000000000000000f c000000005daa220 000000000000001a 
[    3.078763] GPR08: c000000005daa220 0000000000000000 0000000000000000 c68d4ef7050f600d 
[    3.078763] GPR12: c0000000004dc4e0 c000000002b40000 c0000000057d0000 c0000000057d7e50 
[    3.078763] GPR16: c0000000057c5da8 c0000000057c5db0 c0000000057c0000 0000000000000000 
[    3.078763] GPR20: c0000000057c5da0 0000000000000100 0000000000000001 0000000000000001 
[    3.078763] GPR24: c0000000057d6e01 c008000003ac45c0 c0000000057e0000 0000000000000000 
[    3.078763] GPR28: c00000000bd10000 0000000000000000 c00000000bd003a8 c00000000bd00000 
[    3.079108] NIP [c0080000038f06dc] dc_destruct+0xe4/0x2d0 [amdgpu]
[    3.079461] LR [c0080000038f2b08] dc_create+0x390/0x5b0 [amdgpu]
[    3.079814] Call Trace:
[    3.079822] [c0000000038d35c0] [c00000000bd10000] 0xc00000000bd10000 (unreliable)
[    3.079852] [c0000000038d3620] [c0080000038f2b08] dc_create+0x390/0x5b0 [amdgpu]
[    3.080192] [c0000000038d36c0] [c008000003872ee0] amdgpu_dm_init.isra.0+0x238/0x1f00 [amdgpu]
[    3.080547] [c0000000038d3940] [c008000003874bd0] dm_hw_init+0x28/0x60 [amdgpu]
[    3.080880] [c0000000038d3970] [c008000003539a08] amdgpu_device_init+0x1c50/0x23a0 [amdgpu]
[    3.081203] [c0000000038d3ad0] [c00800000353b958] amdgpu_driver_load_kms+0x30/0x240 [amdgpu]
[    3.081536] [c0000000038d3b50] [c008000003530890] amdgpu_pci_probe+0x1c8/0x540 [amdgpu]
[    3.081858] [c0000000038d3be0] [c000000000adc748] local_pci_probe+0x68/0xe0
[    3.081898] [c0000000038d3c60] [c000000000174888] work_for_cpu_fn+0x38/0x60
[    3.081940] [c0000000038d3c90] [c00000000017a04c] process_one_work+0x2ac/0x570
[    3.081990] [c0000000038d3d30] [c00000000017abf0] worker_thread+0x280/0x630
[    3.082042] [c0000000038d3dc0] [c000000000186da4] kthread+0x124/0x130
[    3.082089] [c0000000038d3e10] [c00000000000ce54] ret_from_kernel_thread+0x5c/0x64
[    3.082153] Instruction dump:
[    3.082199] 60420000 e93e0009 3bbd0001 2c290000 7fc3f378 41820010 4800cf2d 60000000 
[    3.082250] 895f03a8 7c1d5040 4180ffdc e93f0418 <e9490498> 810a002c 2c080000 41820094 
[    3.082300] ---[ end trace 0000000000000000 ]---
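
For anyone reading the dump above, here is a rough sketch of decoding the two instructions around the fault marker in the instruction dump. It assumes the standard Power ISA field layout (6-bit primary opcode, 5-bit RT, 5-bit RA, 16-bit displacement) and handles only `ld` and `lwz`; the function name `decode_load` is made up for this sketch. Register values are copied from the GPR dump.

```python
# Decode the load instructions that appear around the faulting address.
# Only two encodings are handled: ld (opcode 58, DS-form, XO=0) and
# lwz (opcode 32, D-form). Anything else returns None.

def decode_load(word):
    """Return (mnemonic, rt, ra, displacement) for ld / lwz, else None."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    low16 = word & 0xFFFF
    if opcode == 58 and (low16 & 0x3) == 0:
        mnemonic = "ld"
        disp = low16 & 0xFFFC  # low two bits are the extended opcode
    elif opcode == 32:
        mnemonic = "lwz"
        disp = low16
    else:
        return None
    if disp & 0x8000:  # sign-extend the 16-bit displacement
        disp -= 0x10000
    return mnemonic, rt, ra, disp


# Values taken from the oops: GPR09 and GPR31, and the DAR.
GPR = {9: 0x0, 31: 0xC00000000BD00000}
DAR = 0x498

previous = decode_load(0xE93F0418)  # instruction just before the fault
faulting = decode_load(0xE9490498)  # the word marked <...> in the dump

mnemonic, rt, ra, disp = faulting
effective_address = (GPR[ra] + disp) & 0xFFFFFFFFFFFFFFFF

print("previous:", previous)
print("faulting:", faulting)
print("effective address:", hex(effective_address), "DAR:", hex(DAR))
```

Read this way, the instruction before the fault loads r9 from offset 0x418 of the object in r31, and the faulting instruction then loads from offset 0x498 of r9. With r9 equal to zero, that matches the reported DAR of 0x498. Which structure member sits at offset 0x418 cannot be determined from the dump alone.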

Hardware description:

  • CPU: POWER9
  • GPU: 0000:03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
  • System Memory:
  • Display(s):
  • Type of Display Connection: HDMI

System information:

  • Distro name and Version: Fedora Rawhide (Fedora 37)
  • Kernel version: 5.19.0-0.rc7.53.fc37.ppc64le
  • Custom kernel: N/A
  • AMD official driver version: N/A

How to reproduce the issue:

(1) Install the 5.19-rc7 kernel package from Fedora, updating from 5.19-rc6.
(2) Reboot.

I located a commit that may be causing the issue: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.19-rc7&id=d11219ad53dcf61ced53ca60fe0c4a8d34393e6c
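
One way to sanity-check that a suspect commit landed between the two release candidates is sketched below. It assumes a local clone of the kernel tree with the `v5.19-rc6` and `v5.19-rc7` tags available; the helper names `is_ancestor` and `introduced_between` are made up for this sketch. It relies on `git merge-base --is-ancestor`, which exits with 0 when the commit is reachable from the ref and 1 when it is not.

```python
import subprocess


def is_ancestor(commit, ref, repo=".", run=subprocess.run):
    """Return True if `commit` is reachable from `ref` in the given repo."""
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
        check=False,
    )
    if result.returncode not in (0, 1):
        # Any other exit status means git itself failed (bad ref, no repo, ...).
        raise RuntimeError(f"git merge-base failed with {result.returncode}")
    return result.returncode == 0


def introduced_between(commit, old_ref, new_ref, repo=".", run=subprocess.run):
    """Return True if `commit` is in `new_ref` but not in `old_ref`."""
    return is_ancestor(commit, new_ref, repo, run) and not is_ancestor(
        commit, old_ref, repo, run
    )
```

Usage would look like `introduced_between("d11219ad53dcf61ced53ca60fe0c4a8d34393e6c", "v5.19-rc6", "v5.19-rc7", repo="/path/to/linux")`, where the repo path is a placeholder. A True result only shows the commit is in the regression window; reverting it or running a bisect would be needed to confirm it is the cause.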
