Kernel invalid opcode on unbinding amdgpu
Submitted by nos..@..ta.moe
Assigned to Default DRI bug account
Link to original bug (#100399)
Description
I'm not sure where the best place to post this report is, so let me know if there is a better place than here.
I have an RX 480 GPU that I use with amdgpu on Linux 4.11.0-rc3+ (compiled with the Ubuntu 4.8.0 lowlatency config). Everything seemingly works fine until I try to unbind amdgpu from the device. This also happened with Linux 4.10.0-rc3+.
I've reproduced this regardless of whether the amdgpu device is the primary or secondary display device, and whether X is active or not.
Observe:
$ lspci | grep AMD
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
$ echo 01:00.0 | sudo tee /sys/bus/pci/devices/01:00.0/driver/unbind
Segmentation Fault
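For scripting the reproduction, here is a minimal Python sketch of the unbind step. It assumes the usual sysfs layout, where PCI device directories carry the domain prefix (0000:01:00.0 rather than 01:00.0), and it defaults the domain to 0000 when it is omitted. The helper names full_pci_address, unbind_path and unbind are hypothetical. The 13-byte write visible in the trace below (RDX = 0xd at syscall entry) would be consistent with the string "0000:01:00.0" plus a newline, but that is an inference, not something stated in this report.

```python
import re
from pathlib import Path

# Matches [domain:]bus:device.function, e.g. "01:00.0" or "0000:01:00.0".
_PCI_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def full_pci_address(addr: str) -> str:
    """Return the domain-qualified address (assumption: domain 0000 if omitted)."""
    m = _PCI_RE.match(addr.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {addr!r}")
    domain, bus, dev, fn = m.groups()
    return f"{(domain or '0000').lower()}:{bus.lower()}:{dev.lower()}.{fn}"


def unbind_path(addr: str, sysfs_root: str = "/sys") -> Path:
    """Build the path of the bound driver's 'unbind' attribute for a device."""
    return (
        Path(sysfs_root) / "bus/pci/devices" / full_pci_address(addr) / "driver/unbind"
    )


def unbind(addr: str) -> None:
    """Write the address to the unbind attribute.

    Needs root. On the affected kernels this is the step that triggers the BUG,
    so only call it on a machine you are prepared to reboot.
    """
    unbind_path(addr).write_text(full_pci_address(addr) + "\n")
```

Calling unbind("01:00.0") as root would then be equivalent to the echo/tee pipeline above, with the domain prefix filled in.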
At this point, the system becomes unstable and some system calls seem to just hang (I'm not sure which exactly, but sudo and ps a break). Trying to shut down the system also hangs.
dmesg output:
[ 86.993436] ------------[ cut here ]------------
[ 86.993439] kernel BUG at drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:6930!
[ 86.993442] invalid opcode: 0000 [#1] PREEMPT SMP
[ 86.993443] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c ip_tables x_tables bnep bridge stp llc binfmt_misc dm_snapshot dm_bufio nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf input_leds serio_raw joydev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi
[ 86.993488] mei_me snd_hda_intel mei snd_hda_codec snd_hda_core intel_pch_thermal snd_hwdep snd_pcm snd_timer snd soundcore hci_uart btbcm btqca btintel bluetooth intel_lpss_acpi intel_lpss shpchp acpi_als acpi_pad mac_hid kfifo_buf tpm_infineon industrialio kvm_intel kvm irqbypass it87 hwmon_vid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq hid_generic usbhid mxm_wmi amdkfd amd_iommu_v2 i915 amdgpu ttm drm_kms_helper igb e1000e syscopyarea sysfillrect dca psmouse nvme sysimgblt ptp fb_sys_fops nvme_core firewire_ohci pps_core i2c_algo_bit drm ahci firewire_core crc_itu_t libahci wmi video pinctrl_sunrisepoint i2c_hid pinctrl_intel hid fjes
[ 86.993519] CPU: 5 PID: 2955 Comm: tee Not tainted 4.11.0-rc3+ #1
[ 86.993521] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F4 10/21/2015
[ 86.993523] task: ffff8ee839f4d880 task.stack: ffffacb00624c000
[ 86.993539] RIP: 0010:gfx_v8_0_kiq_set_interrupt_state+0xce/0xe0 [amdgpu]
[ 86.993541] RSP: 0018:ffffacb00624fb68 EFLAGS: 00010046
[ 86.993543] RAX: 0000000000000000 RBX: ffff8ee855f6b2d8 RCX: 0000000000000000
[ 86.993545] RDX: 0000000000000000 RSI: ffff8ee855f6c750 RDI: ffff8ee855f68000
[ 86.993546] RBP: ffffacb00624fba8 R08: 000000000001e640 R09: ffffffffc039bcb9
[ 86.993548] R10: fffff1f06155f200 R11: 0000000000000000 R12: ffff8ee855f68000
[ 86.993550] R13: ffff8ee855f6b548 R14: ffff8ee855f6c750 R15: 0000000000000000
[ 86.993552] FS: 00007f1260269700(0000) GS:ffff8ee881d40000(0000) knlGS:0000000000000000
[ 86.993555] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 86.993556] CR2: 000055651d517908 CR3: 0000000831b78000 CR4: 00000000003406e0
[ 86.993558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 86.993560] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 86.993562] Call Trace:
[ 86.993572] ? amdgpu_irq_disable_all+0x89/0xe0 [amdgpu]
[ 86.993582] amdgpu_irq_uninstall+0x17/0x20 [amdgpu]
[ 86.993589] drm_irq_uninstall+0x8e/0x170 [drm]
[ 86.993598] amdgpu_irq_fini+0x83/0xc0 [amdgpu]
[ 86.993606] tonga_ih_sw_fini+0x12/0x30 [amdgpu]
[ 86.993613] amdgpu_fini+0x2c5/0x490 [amdgpu]
[ 86.993620] amdgpu_device_fini+0x53/0x160 [amdgpu]
[ 86.993626] amdgpu_driver_unload_kms+0x4f/0xa0 [amdgpu]
[ 86.993632] drm_dev_unregister+0x3c/0xe0 [drm]
[ 86.993637] drm_put_dev+0x36/0x70 [drm]
[ 86.993643] amdgpu_pci_remove+0x15/0x20 [amdgpu]
[ 86.993646] pci_device_remove+0x39/0xc0
[ 86.993649] device_release_driver_internal+0x155/0x210
[ 86.993651] device_release_driver+0x12/0x20
[ 86.993653] unbind_store+0x10d/0x160
[ 86.993655] drv_attr_store+0x25/0x30
[ 86.993657] sysfs_kf_write+0x37/0x40
[ 86.993659] kernfs_fop_write+0x120/0x1a0
[ 86.993662] __vfs_write+0x37/0x160
[ 86.993665] ? apparmor_file_permission+0x1a/0x20
[ 86.993667] ? security_file_permission+0x3b/0xc0
[ 86.993669] vfs_write+0xb8/0x1b0
[ 86.993672] SyS_write+0x55/0xc0
[ 86.993674] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 86.993676] RIP: 0033:0x7f125fd9f6e0
[ 86.993678] RSP: 002b:00007ffe60a95358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 86.993681] RAX: ffffffffffffffda RBX: 000000000126e090 RCX: 00007f125fd9f6e0
[ 86.993682] RDX: 000000000000000d RSI: 00007ffe60a95400 RDI: 0000000000000003
[ 86.993684] RBP: 0000000000000000 R08: 000000000126e520 R09: 0000000000000000
[ 86.993686] R10: 0000000000000837 R11: 0000000000000246 R12: 0000000000000000
[ 86.993688] R13: 000000000000002d R14: 000000000126f590 R15: 000000000126e090
[ 86.993690] Code: ff 25 ff ff ff df 31 c9 be b4 30 00 00 89 c2 48 89 df e8 86 9a fb ff 31 d2 44 89 e6 48 89 df e8 e9 96 fb ff 25 ff ff ff df eb b6 <0f> 0b 0f 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
[ 86.993716] RIP: gfx_v8_0_kiq_set_interrupt_state+0xce/0xe0 [amdgpu] RSP: ffffacb00624fb68
[ 86.993719] ---[ end trace 36bcf8facd6b3d68 ]---
[ 86.993722] note: tee[2955] exited with preempt_count 1
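When comparing this oops against other reports, it can help to pull out just the BUG location and the reliable call-trace frames. The following is a rough Python sketch, not an established tool: the function name summarize_oops and the regular expressions are assumptions based only on the log format shown above, and frames marked with "?" are skipped because the kernel flags them as unreliable.

```python
import re

_TS = re.compile(r"^\[\s*\d+\.\d+\]\s*")  # leading "[ 86.993439] " timestamp
_BUG = re.compile(r"kernel BUG at (\S+):(\d+)!")
_FRAME = re.compile(
    r"^(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_oops(text):
    """Return ((file, line) or None, [(function, module or None), ...])."""
    bug = None
    frames = []
    in_trace = False
    for raw in text.splitlines():
        line = _TS.sub("", raw.strip())
        m = _BUG.search(line)
        if m:
            bug = (m.group(1), int(m.group(2)))
        if line.startswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace:
            f = _FRAME.match(line)
            if f is None:
                # First non-frame line (e.g. the user-space RIP) ends the trace.
                in_trace = False
                continue
            if f.group(1) is None:  # skip unreliable '?' frames
                frames.append((f.group(2), f.group(3)))
    return bug, frames
```

Fed the dmesg output above, this would be expected to report the BUG in gfx_v8_0.c and a frame list starting at amdgpu_irq_uninstall and ending at entry_SYSCALL_64_fastpath.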