Null pointer running clinfo on 5.19.0-rc4
I'm seeing the error below when running clinfo on 5.19.0-rc4:
[15084.671474] BUG: kernel NULL pointer dereference, address: 0000000000000008
[15084.671478] #PF: supervisor read access in kernel mode
[15084.671480] #PF: error_code(0x0000) - not-present page
[15084.671481] PGD 0 P4D 0
[15084.671484] Oops: 0000 [#1] PREEMPT SMP NOPTI
[15084.671487] CPU: 3 PID: 123 Comm: kworker/3:1 Tainted: G W 5.19.0-rc4-tip+ #3199
[15084.671489] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
[15084.671490] Workqueue: events delayed_fput
[15084.671494] RIP: 0010:dma_resv_add_fence+0x3f/0x180
[15084.671498] Code: 89 fd 48 85 f6 74 23 be 01 00 00 00 b8 01 00 00 00 f0 41 0f c1 47 38 85 c0 0f 84 16 01 00 00 8d 48 01 09 c1 0f 88 10 01 00 00 <49> 8b 47 08 48 c7 c1 60 2f 2e 83 48 39 c8 0f 84 f0 00 00 00 48 c7
[15084.671500] RSP: 0018:ffff888103417c90 EFLAGS: 00010246
[15084.671502] RAX: 0000000000000000 RBX: ffff888104f85128 RCX: 0000000000000000
[15084.671503] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8881fd8b0158
[15084.671504] RBP: ffff8881fd8b0158 R08: ffff888130d4f680 R09: 0000000080200017
[15084.671505] R10: 0000000000000001 R11: 0000000000000000 R12: ffff888131a39000
[15084.671506] R13: ffff8881fd8b0000 R14: 0000000000000001 R15: 0000000000000000
[15084.671507] FS: 0000000000000000(0000) GS:ffff888fde4c0000(0000) knlGS:0000000000000000
[15084.671508] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15084.671510] CR2: 0000000000000008 CR3: 00000000b360c000 CR4: 0000000000350ee0
[15084.671511] Call Trace:
[15084.671513] <TASK>
[15084.671514] ? amdgpu_amdkfd_gpuvm_destroy_cb+0x5b/0x1d0
[15084.671517] ? amdgpu_vm_fini+0x2d/0x670
[15084.671520] ? idr_get_next+0x93/0x140
[15084.671523] ? amdgpu_driver_postclose_kms+0x1ce/0x2e0
[15084.671525] ? drm_file_free+0x1e0/0x220
[15084.671527] ? drm_release+0xc0/0x160
[15084.671528] ? __fput+0xdf/0x200
[15084.671530] ? delayed_fput+0x28/0x40
[15084.671531] ? process_one_work+0x20c/0x3d0
[15084.671533] ? worker_thread+0x23d/0x540
[15084.671535] ? worker_clr_flags+0x40/0x40
[15084.671537] ? kthread+0xe0/0x100
[15084.671538] ? kthread_blkcg+0x30/0x30
[15084.671540] ? ret_from_fork+0x22/0x30
[15084.671542] </TASK>
[15084.671543] Modules linked in:
[15084.671544] CR2: 0000000000000008
[15084.671545] ---[ end trace 0000000000000000 ]---
[15084.681487] RIP: 0010:dma_resv_add_fence+0x3f/0x180
[15084.681491] Code: 89 fd 48 85 f6 74 23 be 01 00 00 00 b8 01 00 00 00 f0 41 0f c1 47 38 85 c0 0f 84 16 01 00 00 8d 48 01 09 c1 0f 88 10 01 00 00 <49> 8b 47 08 48 c7 c1 60 2f 2e 83 48 39 c8 0f 84 f0 00 00 00 48 c7
[15084.681492] RSP: 0018:ffff888103417c90 EFLAGS: 00010246
[15084.681494] RAX: 0000000000000000 RBX: ffff888104f85128 RCX: 0000000000000000
[15084.681495] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8881fd8b0158
[15084.681495] RBP: ffff8881fd8b0158 R08: ffff888130d4f680 R09: 0000000080200017
[15084.681496] R10: 0000000000000001 R11: 0000000000000000 R12: ffff888131a39000
[15084.681497] R13: ffff8881fd8b0000 R14: 0000000000000001 R15: 0000000000000000
[15084.681497] FS: 0000000000000000(0000) GS:ffff888fde4c0000(0000) knlGS:0000000000000000
[15084.681498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15084.681499] CR2: 0000000000000008 CR3: 0000000177a62000 CR4: 0000000000350ee0
Full dmesg: dmesg.amdkfd
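As a rough sanity check on the first oops, the bytes in the Code: line can be hand-decoded to see which register the faulting load went through. The sketch below is only that: it handles the single encoding seen at the fault marker (REX.W 8B /r with an 8-bit displacement), the instruction boundaries before the marker are assumed because the dump starts at an arbitrary offset, and the helper names are made up for illustration.

```python
# Code: bytes and registers copied from the oops; <..> marks the faulting byte.
CODE = ("89 fd 48 85 f6 74 23 be 01 00 00 00 b8 01 00 00 00 f0 41 0f c1 47 38 "
        "85 c0 0f 84 16 01 00 00 8d 48 01 09 c1 0f 88 10 01 00 00 <49> 8b 47 08 "
        "48 c7 c1 60 2f 2e 83 48 39 c8 0f 84 f0 00 00 00 48 c7")
REGS = {"RSI": 0x0, "R15": 0x0, "RDI": 0xFFFF8881FD8B0158}
CR2 = 0x8

REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
         "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"]


def parse_code(line):
    """Return (bytes, index of the byte wrapped in <..>)."""
    tokens = line.split()
    fault = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens), fault


def decode_mov_load(insn):
    """Decode 'mov r64, [base+disp8]' (REX.W 8B /r, mod=01, no SIB) only."""
    rex, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    assert rex & 0xF8 == 0x48, "expected a REX prefix with W set"
    assert opcode == 0x8B, "expected MOV r64, r/m64"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and rm != 4, "only the disp8, no-SIB form is handled"
    dest = REG64[reg | ((rex & 0x04) << 1)]   # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x01) << 3)]    # REX.B extends the r/m field
    disp8 = disp - 256 if disp >= 128 else disp
    return dest, base, disp8


code, fault = parse_code(CODE)
dest, base, disp = decode_mov_load(code[fault:fault + 4])
fault_address = REGS[base] + disp

# Assumed boundaries: '48 85 f6' is test %rsi,%rsi and '74 23' is a short je.
je = code.index(bytes([0x74, 0x23]))
je_target = je + 2 + code[je + 1]

print(f"faulting insn: mov {disp:#x}(%{base.lower()}),%{dest.lower()}")
print(f"computed address {fault_address:#x}, CR2 {CR2:#x}")
print(f"je target index {je_target}, fault index {fault}")
```

Under those assumptions the faulting load reads offset 8 from R15, which is zero, matching CR2. The short jump taken when RSI is zero lands exactly on that load, so the dump looks consistent with the second argument of dma_resv_add_fence being NULL. That is an inference from the register dump and has not been checked against the source.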
I thought https://cgit.freedesktop.org/drm/drm/commit/?h=drm-fixes&id=5cb0e3fb2c54eabfb3f932a1574bff1774946bc0 might help, but it made no difference, and forcing "all_hub = true" didn't fix it either.
Kernel 5.18.0 worked fine. agd5f's drm-next gives a different error:
Jun 30 13:49:26 axion.fireburn.co.uk kernel: amdgpu 0000:03:00.0: amdgpu: Failed to map peer:0000:08:00.0 mem_domain:2
Jun 30 13:49:26 axion.fireburn.co.uk kernel: amdgpu 0000:08:00.0: amdgpu: Failed to map peer:0000:03:00.0 mem_domain:2
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ------------[ cut here ]------------
Jun 30 13:49:26 axion.fireburn.co.uk kernel: WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/ttm/ttm_bo.c:710 ttm_bo_unpin+0x68/0x70
Jun 30 13:49:26 axion.fireburn.co.uk kernel: Modules linked in:
Jun 30 13:49:26 axion.fireburn.co.uk kernel: CPU: 0 PID: 6 Comm: kworker/0:0 Tainted: G W 5.18.0-rc5-agd5f+ #1517
Jun 30 13:49:26 axion.fireburn.co.uk kernel: Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
Jun 30 13:49:26 axion.fireburn.co.uk kernel: Workqueue: kfd_process_wq kfd_process_wq_release
Jun 30 13:49:26 axion.fireburn.co.uk kernel: RIP: 0010:ttm_bo_unpin+0x68/0x70
Jun 30 13:49:26 axion.fireburn.co.uk kernel: Code: 48 85 ff 74 08 48 89 de e8 e5 33 00 00 4c 03 b3 38 01 00 00 4c 89 f7 5b 41 5e e9 63 28 b4 00 0f 0b 83 bb 8c 01 00 00 00 75 b4 <0f> 0b 5b 41 5e c3 00 00 f3 0f >
Jun 30 13:49:26 axion.fireburn.co.uk kernel: RSP: 0018:ffff88810033bd00 EFLAGS: 00010246
Jun 30 13:49:26 axion.fireburn.co.uk kernel: RAX: 0000000000000000 RBX: ffff88814126c458 RCX: 0000000000000000
Jun 30 13:49:26 axion.fireburn.co.uk kernel: RDX: 0000000000000098 RSI: 0000000000000000 RDI: ffff88814126c458
Jun 30 13:49:26 axion.fireburn.co.uk kernel: RBP: ffff888103645128 R08: ffff888103645d68 R09: ffff88810033bd50
Jun 30 13:49:26 axion.fireburn.co.uk kernel: R10: ffff888147b94c38 R11: ffffffff8184a450 R12: ffff888137339400
Jun 30 13:49:26 axion.fireburn.co.uk kernel: R13: ffff888137339400 R14: ffff888103645128 R15: ffff888113179700
Jun 30 13:49:26 axion.fireburn.co.uk kernel: FS: 0000000000000000(0000) GS:ffff888fde400000(0000) knlGS:0000000000000000
Jun 30 13:49:26 axion.fireburn.co.uk kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 30 13:49:26 axion.fireburn.co.uk kernel: CR2: 000055bb547a5eb8 CR3: 00000000ac60c000 CR4: 0000000000150ef0
Jun 30 13:49:26 axion.fireburn.co.uk kernel: Call Trace:
Jun 30 13:49:26 axion.fireburn.co.uk kernel: <TASK>
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? amdgpu_bo_unpin+0x1a/0x80
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x90/0x3a0
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x1d2/0x210
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? kfd_process_device_free_bos+0xd9/0x110
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? kfd_process_wq_release+0x2f0/0x3a0
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? process_one_work+0x20c/0x3d0
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? worker_thread+0x23d/0x540
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? worker_clr_flags+0x40/0x40
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? kthread+0xe0/0x100
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? kthread_blkcg+0x30/0x30
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ? ret_from_fork+0x22/0x30
Jun 30 13:49:26 axion.fireburn.co.uk kernel: </TASK>
Jun 30 13:49:26 axion.fireburn.co.uk kernel: ---[ end trace 0000000000000000 ]---
Jun 30 13:49:26 axion.fireburn.co.uk kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Jun 30 13:49:26 axion.fireburn.co.uk kernel: #PF: supervisor read access in kernel mode
Jun 30 13:49:26 axion.fireburn.co.uk kernel: #PF: error_code(0x0000) - not-present page
Edited by Mike Lothian