general protection fault in mutex_lock / drm_mode_object_unregister when unloading amdgpu
Submitted by Vladimir Panteleev
Assigned to Default DRI bug account
Link to original bug (#106993)
Description
I'm trying to unload amdgpu so that I can pass the GPU through to a VM using VFIO. After stopping Xorg, unbinding the framebuffer, etc., the amdgpu module shows 0 users; however, "modprobe -r amdgpu" hangs, and this appears in dmesg:
[ 992.692232] general protection fault: 0000 [#1] PREEMPT SMP PTI
[ 992.692237] Modules linked in: dm_mod ccm fuse xt_nat vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep uinput it87 hwmon_vid snd_hda_codec_hdmi sit tunnel4 ip_tunnel nls_iso8859_1 8021q mrp nls_cp437 vfat fat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 intel_rapl nf_nat_ipv4 nf_nat nf_conntrack x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp amdkfd snd_hda_codec_generic kvm_intel iTCO_wdt iTCO_vendor_support mxm_wmi amd_iommu_v2 amdgpu(-) arc4 ath9k kvm ath9k_common ath9k_hw irqbypass chash crct10dif_pclmul crc32_pclmul gpu_sched ath i2c_algo_bit ghash_clmulni_intel pcbc aesni_intel ttm mac80211 aes_x86_64 drm_kms_helper
[ 992.692279] crypto_simd cryptd ath3k glue_helper intel_cstate btusb drm btrtl intel_uncore btbcm snd_hda_intel btintel snd_usb_audio intel_rapl_perf bluetooth snd_hda_codec uvcvideo cfg80211 snd_usbmidi_lib snd_hda_core snd_rawmidi videobuf2_vmalloc videobuf2_memops snd_hwdep snd_seq_device videobuf2_v4l2 i2c_i801 agpgart snd_pcm syscopyarea videobuf2_common snd_timer ecdh_generic sysfillrect mei_me pl2303 input_leds crc16 snd r8169 rfkill sysimgblt videodev usbserial mousedev joydev lpc_ich mii e1000e fb_sys_fops mei soundcore led_class ioatdma media dca shpchp rtc_cmos wmi bridge evdev mac_hid stp llc tun sg crypto_user ip_tables x_tables btrfs libcrc32c crc32c_generic xor zstd_decompress zstd_compress xxhash raid6_pq sd_mod sr_mod cdrom hid_generic usbhid hid isci ahci libsas libahci scsi_transport_sas
[ 992.692325] xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata usbcore usb_common scsi_mod
[ 992.692333] CPU: 5 PID: 113 Comm: kworker/5:1 Tainted: G W 4.17.2-1-ARCH #1
[ 992.692335] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X79S-UP5, BIOS F5f 03/19/2014
[ 992.692351] Workqueue: events drm_connector_free_work_fn [drm]
[ 992.692357] RIP: 0010:mutex_lock+0x10/0x20
[ 992.692358] RSP: 0018:ffffaadf46edfe18 EFLAGS: 00010246
[ 992.692361] RAX: 0000000000000000 RBX: c5f816ba155656e8 RCX: 000000010020001b
[ 992.692363] RDX: ffff960e76c09e00 RSI: ffff960e6f2ad818 RDI: c5f816ba155656e8
[ 992.692365] RBP: c5f816ba15565490 R08: 0000000000000001 R09: ffffffffc0fffa22
[ 992.692367] R10: fffffbf4bfbcab00 R11: 000000000000001b R12: ffff960e6f2ad818
[ 992.692369] R13: ffff960e62c192e0 R14: 0ffff960e7f36750 R15: ffff960e76c7b000
[ 992.692372] FS: 0000000000000000(0000) GS:ffff960e7f340000(0000) knlGS:0000000000000000
[ 992.692374] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 992.692376] CR2: 00007ffc558036f8 CR3: 000000044700a005 CR4: 00000000001626e0
[ 992.692378] Call Trace:
[ 992.692389] drm_mode_object_unregister+0x1e/0x60 [drm]
[ 992.692398] drm_encoder_cleanup+0x35/0xa0 [drm]
[ 992.692442] dm_dp_mst_connector_destroy+0x35/0x50 [amdgpu]
[ 992.692453] drm_connector_free_work_fn+0x72/0x90 [drm]
[ 992.692457] process_one_work+0x1d1/0x3b0
[ 992.692460] worker_thread+0x2b/0x3d0
[ 992.692462] ? process_one_work+0x3b0/0x3b0
[ 992.692465] kthread+0x112/0x130
[ 992.692467] ? kthread_flush_work_fn+0x10/0x10
[ 992.692470] ret_from_fork+0x35/0x40
[ 992.692472] Code: 37 6e 93 ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 be 02 00 00 00 e9 c1 fa ff ff 90 0f 1f 44 00 00 65 48 8b 14 25 00 5c 01 00 31 c0 <f0> 48 0f b1 17 48 85 c0 74 02 eb d4 c3 0f 1f 00 0f 1f 44 00 00
[ 992.692497] RIP: mutex_lock+0x10/0x20 RSP: ffffaadf46edfe18
[ 992.692500] ---[ end trace 841909ebd0d56ac2 ]---
This is on Arch Linux, kernel:
Linux home.thecybershadow.net 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux
Hardware: one Radeon RX Vega 64 driving two monitors, a Philips 240BW and an Asus PQ321Q.
Full dmesg:
https://dump.thecybershadow.net/27386389f800db8f44b7fe7af422f926/13%3A30%3A08-stdin.txt
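A possible reading of the register dump, offered as an assumption rather than a confirmed diagnosis: the faulting bytes `f0 48 0f b1 17` decode to `lock cmpxchg %rdx,(%rdi)`, so mutex_lock is dereferencing the pointer in RDI. On x86-64 with 4-level paging (48-bit virtual addresses, assumed here because CR4 does not have the LA57 bit set), a non-canonical address raises a general protection fault instead of a page fault. The sketch below checks a few registers copied from the trace; the helper names `is_canonical` and `parse_registers` are made up for illustration.

```python
import re

# Register lines copied from the oops above (timestamps dropped).
REGISTER_DUMP = """
RAX: 0000000000000000 RBX: c5f816ba155656e8 RCX: 000000010020001b
RDX: ffff960e76c09e00 RSI: ffff960e6f2ad818 RDI: c5f816ba155656e8
RBP: c5f816ba15565490 R08: 0000000000000001 R09: ffffffffc0fffa22
"""


def is_canonical(addr, va_bits=48):
    """Return True if bits 63..(va_bits-1) are all equal (sign-extended).

    va_bits=48 is an assumption: 4-level paging, no LA57.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def parse_registers(text):
    """Extract 'NAME: 16-hex-digit value' pairs from a register dump."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


registers = parse_registers(REGISTER_DUMP)
suspicious = [name for name, value in registers.items()
              if not is_canonical(value)]
print("non-canonical registers:", suspicious)
```

If this reading is right, RDI (the mutex pointer), RBX and RBP all hold values that are not valid kernel addresses, which would be consistent with drm_mode_object_unregister being handed an already-freed or overwritten object. That remains a guess until someone confirms it against the source.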