RX 580: igt@amdgpu/amd_cs_nop@fork-compute0: FAIL - kernel NULL pointer dereference, address: 00000000000000f8
Brief summary of the problem:
When running the IGT test igt@amdgpu/amd_cs_nop, I get a kernel NULL pointer dereference (oops) in the subtest fork-compute0:
[ 198.057729] [IGT] amd_cs_nop: executing
[ 198.075303] [IGT] amd_cs_nop: starting subtest nop-compute0
[ 219.571666] [IGT] amd_cs_nop: exiting, ret=0
[ 219.588352] Console: switching to colour frame buffer device 240x67
[ 219.628425] Console: switching to colour dummy device 80x25
[ 219.628448] [IGT] amd_cs_nop: executing
[ 219.636282] [IGT] amd_cs_nop: starting subtest nop-gfx0
[ 241.137581] [IGT] amd_cs_nop: exiting, ret=0
[ 241.154315] Console: switching to colour frame buffer device 240x67
[ 241.196900] Console: switching to colour dummy device 80x25
[ 241.196923] [IGT] amd_cs_nop: executing
[ 241.204025] [IGT] amd_cs_nop: starting subtest sync-compute0
[ 262.703511] [IGT] amd_cs_nop: exiting, ret=0
[ 262.720316] Console: switching to colour frame buffer device 240x67
[ 262.768742] Console: switching to colour dummy device 80x25
[ 262.768780] [IGT] amd_cs_nop: executing
[ 262.774916] [IGT] amd_cs_nop: starting subtest sync-gfx0
[ 284.286297] [IGT] amd_cs_nop: exiting, ret=0
[ 284.303027] Console: switching to colour frame buffer device 240x67
[ 284.352408] Console: switching to colour dummy device 80x25
[ 284.352451] [IGT] amd_cs_nop: executing
[ 284.359565] [IGT] amd_cs_nop: starting subtest fork-compute0
[ 284.361097] BUG: kernel NULL pointer dereference, address: 00000000000000f8
[ 284.361109] #PF: supervisor read access in kernel mode
[ 284.361115] #PF: error_code(0x0000) - not-present page
[ 284.361121] PGD 13cd96067 P4D 13cd96067 PUD 10d17e067 PMD 0
[ 284.361131] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 284.361138] CPU: 11 PID: 2608 Comm: amd_cs_nop Not tainted 5.16.0-e68e+ #53 379b6e43c5ec0dc27b291be75619e080240e9dae
[ 284.361150] Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS M/B450 AORUS M, BIOS F50 11/27/2019
[ 284.361157] RIP: 0010:ttm_eu_fence_buffer_objects+0x82/0xb0 [ttm]
[ 284.361177] Code: 8d b8 10 08 00 00 e8 6d 44 4a d0 48 8b bb f8 00 00 00 e8 21 f2 49 d0 48 8b 6d 00 4c 39 e5 74 1c 48 8b 5d 10 8b 45 18 4c 89 ee <48> 8b bb f8 00 00 00 85 c0 75 9a e8 0e c0 0d d0 eb 98 5b 5d 41 5c
[ 284.361188] RSP: 0018:ffffb66800eafab8 EFLAGS: 00010283
[ 284.361195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 284.361201] RDX: 0000000000000000 RSI: ffff98325903cf40 RDI: 0000000000000000
[ 284.361207] RBP: ffffb66800cafc30 R08: 0000000000000000 R09: 0000000000000000
[ 284.361212] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb66800eafbf0
[ 284.361217] R13: ffff98325903cf40 R14: 0000000000000000 R15: ffffb66800eafb38
[ 284.361223] FS: 00007f7af2cc3a80(0000) GS:ffff98354ecc0000(0000) knlGS:0000000000000000
[ 284.361231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 284.361237] CR2: 00000000000000f8 CR3: 0000000140144000 CR4: 00000000003506e0
[ 284.361243] Call Trace:
[ 284.361248] <TASK>
[ 284.361255] amdgpu_cs_ioctl+0x1d2f/0x2030 [amdgpu 0f9ac45183286e110256740e1ec76d36751472cc]
[ 284.361765] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu 0f9ac45183286e110256740e1ec76d36751472cc]
[ 284.362264] drm_ioctl_kernel+0xcc/0x170
[ 284.362277] drm_ioctl+0x23a/0x420
[ 284.362285] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu 0f9ac45183286e110256740e1ec76d36751472cc]
[ 284.362821] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu 0f9ac45183286e110256740e1ec76d36751472cc]
[ 284.363297] __x64_sys_ioctl+0x94/0xd0
[ 284.363309] do_syscall_64+0x5b/0x90
[ 284.363321] ? syscall_exit_to_user_mode+0x23/0x50
[ 284.363330] ? do_syscall_64+0x67/0x90
[ 284.363337] ? syscall_exit_to_user_mode+0x23/0x50
[ 284.363344] ? do_syscall_64+0x67/0x90
[ 284.363351] ? exit_to_user_mode_prepare+0x8d/0x190
[ 284.363361] ? syscall_exit_to_user_mode+0x23/0x50
[ 284.363367] ? do_syscall_64+0x67/0x90
[ 284.363374] ? syscall_exit_to_user_mode+0x23/0x50
[ 284.363381] ? do_syscall_64+0x67/0x90
[ 284.363388] ? syscall_exit_to_user_mode+0x23/0x50
[ 284.363394] ? do_syscall_64+0x67/0x90
[ 284.363402] ? syscall_exit_to_user_mode+0x23/0x50
[ 284.363408] ? do_syscall_64+0x67/0x90
[ 284.363415] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 284.363424] RIP: 0033:0x7f7af47772fb
[ 284.363431] Code: 0f 1e fa 48 8b 05 8d 9b 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 9b 0c 00 f7 d8 64 89 01 48
[ 284.363442] RSP: 002b:00007fff7422a738 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 284.363450] RAX: ffffffffffffffda RBX: 00007fff7422a840 RCX: 00007f7af47772fb
[ 284.363458] RDX: 00007fff7422a840 RSI: 00000000c0186444 RDI: 0000000000000006
[ 284.363466] RBP: 00000000c0186444 R08: 00007fff7422a7b0 R09: 0000000001ce4730
[ 284.363473] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000001ce4738
[ 284.363481] R13: 0000000000000006 R14: 00007fff7422a780 R15: 00007fff7422aa10
[ 284.363489] </TASK>
[ 284.363493] Modules linked in: veth xt_conntrack xt_MASQUERADE xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc overlay ccm amdgpu bnep iwlmvm btusb btrtl btbcm btintel bluetooth joydev mousedev usbhid ecdh_generic mac80211 libarc4 iwlwifi qrtr intel_rapl_msr intel_rapl_common kvm_amd vfat kvm fat cfg80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec r8169 snd_hwdep snd_hda_core ppdev irqbypass snd_pcm crct10dif_pclmul crc32_pclmul gigabyte_wmi realtek wmi_bmof ghash_clmulni_intel mdio_devres drm_ttm_helper aesni_intel snd_timer libphy ttm crypto_simd ccp snd cryptd rfkill rapl parport_pc gpu_sched sp5100_tco k10temp soundcore rng_core i2c_piix4 parport pcspkr wmi gpio_amdpt pinctrl_amd mac_hid gpio_generic acpi_cpufreq squashfs uinput dm_multipath loop dm_mod fuse crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
[ 284.363598] xhci_pci crc32c_intel xhci_pci_renesas
[ 284.363640] CR2: 00000000000000f8
[ 284.363645] ---[ end trace 1ef27147e3d61ac0 ]---
[ 284.363649] RIP: 0010:ttm_eu_fence_buffer_objects+0x82/0xb0 [ttm]
[ 284.363665] Code: 8d b8 10 08 00 00 e8 6d 44 4a d0 48 8b bb f8 00 00 00 e8 21 f2 49 d0 48 8b 6d 00 4c 39 e5 74 1c 48 8b 5d 10 8b 45 18 4c 89 ee <48> 8b bb f8 00 00 00 85 c0 75 9a e8 0e c0 0d d0 eb 98 5b 5d 41 5c
[ 284.363676] RSP: 0018:ffffb66800eafab8 EFLAGS: 00010283
[ 284.363682] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 284.363687] RDX: 0000000000000000 RSI: ffff98325903cf40 RDI: 0000000000000000
[ 284.363692] RBP: ffffb66800cafc30 R08: 0000000000000000 R09: 0000000000000000
[ 284.363698] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb66800eafbf0
[ 284.363703] R13: ffff98325903cf40 R14: 0000000000000000 R15: ffffb66800eafb38
[ 284.363708] FS: 00007f7af2cc3a80(0000) GS:ffff98354ecc0000(0000) knlGS:0000000000000000
[ 284.363716] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 284.363721] CR2: 00000000000000f8 CR3: 0000000140144000 CR4: 00000000003506e0
[ 318.715342] kauditd_printk_skb: 4 callbacks suppressed
[ 318.715347] audit: type=1334 audit(1654868466.983:152): prog-id=32 op=LOAD
[ 318.715612] audit: type=1334 audit(1654868466.983:153): prog-id=33 op=LOAD
[ 318.715779] audit: type=1334 audit(1654868466.983:154): prog-id=34 op=LOAD
[ 318.764815] audit: type=1130 audit(1654868467.033:155): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-timedated comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 348.798117] audit: type=1131 audit(1654868497.066:156): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-timedated comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 348.860532] audit: type=1334 audit(1654868497.130:157): prog-id=0 op=UNLOAD
[ 348.860551] audit: type=1334 audit(1654868497.130:158): prog-id=0 op=UNLOAD
[ 348.860559] audit: type=1334 audit(1654868497.130:159): prog-id=0 op=UNLOAD
[ 904.113616] audit: type=1130 audit(1654869052.383:160): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 904.113640] audit: type=1131 audit(1654869052.383:161): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
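To make the oops above easier to triage, here is a minimal Python sketch that pulls out the faulting symbol and checks whether the faulting instruction accounts for the reported address. It is not part of IGT or the kernel tooling; the function name summarize_oops and the dictionary keys are made up for illustration, and the instruction decoding is done by hand for this single encoding only (an assumption, not a general disassembler).

```python
import re

# Four lines copied from the oops above, with the dmesg timestamps removed.
OOPS = """\
BUG: kernel NULL pointer dereference, address: 00000000000000f8
RIP: 0010:ttm_eu_fence_buffer_objects+0x82/0xb0 [ttm]
Code: 8d b8 10 08 00 00 e8 6d 44 4a d0 48 8b bb f8 00 00 00 e8 21 f2 49 d0 48 8b 6d 00 4c 39 e5 74 1c 48 8b 5d 10 8b 45 18 4c 89 ee <48> 8b bb f8 00 00 00 85 c0 75 9a e8 0e c0 0d d0 eb 98 5b 5d 41 5c
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
"""


def summarize_oops(text):
    """Extract the faulting symbol and compare the faulting load with the fault address."""
    rip = re.search(r"RIP: 0010:(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+) \[(\w+)\]", text)
    fault = int(re.search(r"address: ([0-9a-f]+)", text).group(1), 16)
    rbx = int(re.search(r"RBX: ([0-9a-f]+)", text).group(1), 16)

    # In the "Code:" line the kernel marks the first byte of the faulting
    # instruction with angle brackets.
    code = re.search(r"Code: (.*)", text).group(1).split()
    start = next(i for i, b in enumerate(code) if b.startswith("<"))
    insn = bytes(int(b.strip("<>"), 16) for b in code[start:start + 7])

    # Hand-decoded, this one encoding only: 48 8b bb + disp32 is
    # "mov rdi, [rbx + disp32]" (REX.W, MOV r64 <- r/m64, ModRM 0xbb).
    if insn[:3] != bytes([0x48, 0x8B, 0xBB]):
        raise ValueError("unexpected instruction encoding")
    disp = int.from_bytes(insn[3:7], "little")

    return {
        "function": rip.group(1),
        "offset": int(rip.group(2), 16),
        "size": int(rip.group(3), 16),
        "module": rip.group(4),
        "fault_address": fault,
        "base_register_rbx": rbx,
        "displacement": disp,
        "load_matches_fault": rbx + disp == fault,
    }


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(f"{key}: {value}")
```

For this log, load_matches_fault comes out True: the faulting load reads from RBX + 0xf8 with RBX equal to 0, which matches CR2 = 0xf8. That is consistent with a NULL pointer being dereferenced at field offset 0xf8 inside ttm_eu_fence_buffer_objects; which structure and field this corresponds to has not been checked against the source here.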
A git bisect points to e68efb27 as the first problematic commit in the amd-staging-drm-next branch.
Beyond the test failure, I usually can't even reboot the machine afterwards: shutdown gets stuck in a "failed to unmount /home" loop.
Hardware description:
- CPU: AMD Ryzen 5 1600 Six-Core Processor
- GPU: 07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
- System Memory: 16GB
- Display(s): AOC G2260VWQ6
- Type of Display Connection: HDMI
System information:
- Distro name and Version: IGT's container registry.freedesktop.org/drm/igt-gpu-tools/igt:master
- Kernel version: 5.16
- Custom kernel: Self-compiled kernel from e68efb27 onwards (amd-staging-drm-next), with two patches to enable compilation using GCC 12: 82880283 and 52a9dab6
- AMD official driver version: N/A
How to reproduce the issue:
docker run -v $HOME/results:/tmp/results --privileged registry.freedesktop.org/drm/igt-gpu-tools/igt:master igt_runner -s -t "amdgpu/amd_cs_nop" -o /tmp/results/
Attached files:
Edited by Tales Aparecida