Crash on amdgpu_sync_get_fence
Submitted by hig..@..mx.net
Assigned to Default DRI bug account
Link to original bug (#104299)
Description
During the past week I got 2 amdgpu crashes, both with this stack trace:
Dec 17 02:54:42 Couracado kernel: [69955.112339] Oops: 0000 [#1] SMP
Dec 17 02:54:42 Couracado kernel: [69955.138598] Modules linked in: uinput snd_usb_audio snd_usbmidi_lib snd_rawmidi f71882fg ipt_ECN snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_DSCP nf_nat_irc nf_nat nf_conntrack_irc nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_mangle iptable_filter ip_tables x_tables bridge stp llc ipv6 nls_iso8859_1 nls_cp437 vfat fat reiserfs sch_fq_codel pcspkr fuse joydev hid_generic snd_hda_codec_hdmi usbhid hid eeepc_wmi tuner_simple tuner_types tea5767 tuner tda7432 snd_hda_codec_realtek tvaudio snd_hda_codec_generic msp3400 snd_hda_intel snd_hda_codec
Dec 17 02:54:42 Couracado kernel: [69955.735663] asus_wmi snd_hwdep sparse_keymap bttv tea575x snd_hda_core i2c_dev rfkill wmi_bmof tveeprom crct10dif_pclmul snd_pcm videobuf_dma_sg videobuf_core amdkfd crc32_pclmul rc_core evdev efi_pstore crc32c_intel r8169 v4l2_common ghash_clmulni_intel amd_iommu_v2 serio_raw efivars fam15h_power k10temp snd_timer mii ohci_pci videodev i2c_piix4 snd amdgpu ehci_pci soundcore ohci_hcd ehci_hcd mfd_core parport_pc hwmon xhci_pci ttm parport wmi xhci_hcd video shpchp button acpi_cpufreq loop
Dec 17 02:54:42 Couracado kernel: [69956.099719] CPU: 1 PID: 814 Comm: gfx Not tainted 4.14.6-slack #6
Dec 17 02:54:42 Couracado kernel: [69956.150725] Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 3003 03/10/2016
Dec 17 02:54:42 Couracado kernel: [69956.225762] task: ffff884c3d508100 task.stack: ffffb665439b0000
Dec 17 02:54:42 Couracado kernel: [69956.275368] RIP: 0010:amdgpu_sync_get_fence+0x91/0xe0 [amdgpu]
Dec 17 02:54:42 Couracado kernel: [69956.324197] RSP: 0018:ffffb665439b3e20 EFLAGS: 00010246
Dec 17 02:54:42 Couracado kernel: [69956.367931] RAX: 00000000002ae450 RBX: ffff884ab449db60 RCX: 0000000000000000
Dec 17 02:54:42 Couracado kernel: [69956.427677] RDX: 0000000000000064 RSI: ffff884b534e8540 RDI: ffff884c46000e00
Dec 17 02:54:42 Couracado kernel: [69956.487426] RBP: ffffb665439b3e40 R08: 0000000000000008 R09: 0000000000000010
Dec 17 02:54:42 Couracado kernel: [69956.547172] R10: 0000000000000255 R11: 000000000000019f R12: 0000000000000000
Dec 17 02:54:42 Couracado kernel: [69956.606922] R13: ffff884767dbc900 R14: ffff884767dbc968 R15: ffff8848d44b8bd8
Dec 17 02:54:42 Couracado kernel: [69956.666669] FS: 0000000000000000(0000) GS:ffff884c5ec80000(0000) knlGS:0000000000000000
Dec 17 02:54:42 Couracado kernel: [69956.734426] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 17 02:54:42 Couracado kernel: [69956.782525] CR2: 00000000002ae468 CR3: 000000011da6a000 CR4: 00000000000406e0
Dec 17 02:54:42 Couracado kernel: [69956.842274] Call Trace:
Dec 17 02:54:42 Couracado kernel: [69956.862764] amdgpu_job_dependency+0x93/0x100 [amdgpu]
Dec 17 02:54:42 Couracado kernel: [69956.905816] amd_sched_main+0xb5/0x450 [amdgpu]
Dec 17 02:54:42 Couracado kernel: [69956.943730] ? wait_woken+0x80/0x80
Dec 17 02:54:42 Couracado kernel: [69956.972902] kthread+0x125/0x140
Dec 17 02:54:42 Couracado kernel: [69956.999935] ? amd_sched_process_job+0xc0/0xc0 [amdgpu]
Dec 17 02:54:42 Couracado kernel: [69957.043674] ? kthread_create_on_node+0x70/0x70
Dec 17 02:54:42 Couracado kernel: [69957.081583] ret_from_fork+0x22/0x30
Dec 17 02:54:42 Couracado kernel: [69957.111479] Code: 89 44 24 08 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 8b 3d d8 47 15 00 e8 ab 94 d3 da 48 8b 43 48 a8 01 75 9b 48 8b 43 08 <48> 8b 40 18 48 85 c0 74 09 48 89 df ff d0 84 c0 75 0c 48 89 d8
Dec 17 02:54:42 Couracado kernel: [69957.330761] CR2: 00000000002ae468
Dec 17 02:54:42 Couracado kernel: [69957.358479] ---[ end trace da8374d3133f4c24 ]---
Dec 17 02:54:42 Couracado kernel: [69957.397138] sched: RT throttling activated
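As a triage aid, the faulting address can be related to the register dump. The following is a minimal Python sketch, not part of the original report: it takes three lines copied from the oops above (timestamps trimmed) and computes CR2 minus RAX. The helper names `reg` and `fault_offset` are hypothetical. The result, 0x18, matches the displacement in the marked instruction bytes `<48> 8b 40 18`, which decode to a load from `0x18(%rax)`. That would be consistent with RAX holding a stale or corrupted pointer, though this is only a reading of the dump and not a confirmed diagnosis.

```python
import re

# Three lines copied from the oops above, with the syslog prefix trimmed.
OOPS = """\
RIP: 0010:amdgpu_sync_get_fence+0x91/0xe0 [amdgpu]
RAX: 00000000002ae450 RBX: ffff884ab449db60 RCX: 0000000000000000
CR2: 00000000002ae468 CR3: 000000011da6a000 CR4: 00000000000406e0
"""


def reg(name, text):
    """Return the 64-bit value printed for register `name` as an int."""
    m = re.search(rf"\b{name}: ([0-9a-f]{{16}})\b", text)
    if m is None:
        raise ValueError(f"{name} not found in dump")
    return int(m.group(1), 16)


def fault_offset(text, base="RAX"):
    """Distance between the faulting address (CR2) and a base register."""
    return reg("CR2", text) - reg(base, text)


# Symbol+offset where the fault happened, taken from the RIP line.
rip = re.search(r"RIP: \w+:(\S+)", OOPS).group(1)
offset = fault_offset(OOPS)

print(rip, hex(offset))
```

The same helper can be pointed at another register (for example `fault_offset(OOPS, base="RBX")`) when the dump includes it and a different base is suspected.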
The crash is rare and therefore hard to reproduce, but since amdgpu has been stable for me over the last 6 months, I suspect it is caused by something in the latest kernel or mesa code.
I'm using kernel 4.14.6, drm 2.4.88, mesa 17.3.0, llvm 5.0.0.
Thanks