nouveau preventing shutdown after suspend-resume
@jprvita
Submitted by João Paulo Rechi Vita Assigned to Nouveau Project
Description
On an Asus X756UQK laptop with nvidia + intel graphics, the machine hangs on shutdown after a suspend-resume cycle, requiring a forced power off.
This problem is present on the tip of the nouveau/linux-4.11 branch (de9b3ec13dfc drm/nouveau/tmr: provide backtrace when a timeout is hit) and on Linus' v4.10-rc8 tag, and was first seen on a 4.8 kernel. On that 4.8 kernel, I saw the following messages in the kernel log once after resuming:
[ 186.117539] nouveau 0000:01:00.0: DRM: evicting buffers...
[ 186.118105] nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
[ 201.139049] nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
[ 201.139688] ------------[ cut here ]------------
[ 201.140297] WARNING: CPU: 0 PID: 1230 at /usr/src/packages/BUILD/linux-4.8.0/drivers/pci/pci.c:1616 pci_disable_device+0x99/0xb0
[ 201.140970] nouveau 0000:01:00.0: disabling already-disabled device
[ 201.140984] Modules linked in:
[ 201.141608] ccm arc4 rfcomm joydev cmac bnep intel_rapl x86_pkg_temp_thermal coretemp i2c_designware_platform i2c_designware_core kvm_intel asus_nb_wmi asus_wmi sparse_keymap snd_hda_codec_hdmi snd_hda_codec_conexant snd_soc_skl snd_hda_codec_generic snd_soc_skl_ipc snd_soc_sst_ipc kvm ath10k_pci snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match ath10k_core snd_soc_core irqbypass crct10dif_pclmul snd_compress crc32_pclmul ac97_bus ghash_clmulni_intel snd_pcm_dmaengine ath mac80211 snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 snd_hda_core lrw glue_helper uvcvideo snd_hwdep ablk_helper videobuf2_vmalloc cryptd videobuf2_memops snd_pcm videobuf2_v4l2 cfg80211 videobuf2_core videodev snd_timer media input_leds snd r8169 soundcore mii btusb btrtl shpchp processor_thermal_device mei_me idma64 mei
[ 201.143087] intel_pch_thermal
[ 201.143087] virt_dma
[ 201.143087] intel_lpss_pci
[ 201.143088] intel_soc_dts_iosf
[ 201.143088] hci_uart
[ 201.143089] elan_i2c
[ 201.143089] btbcm
[ 201.143089] btqca
[ 201.143090] btintel
[ 201.143090] bluetooth
[ 201.143090] int3403_thermal
[ 201.143091] int340x_thermal_zone
[ 201.143091] acpi_als
[ 201.143091] kfifo_buf
[ 201.143092] int3400_thermal
[ 201.143092] acpi_thermal_rel
[ 201.143093] industrialio
[ 201.143093] intel_lpss_acpi
[ 201.143093] acpi_pad
[ 201.143094] tpm_crb
[ 201.143094] intel_lpss
[ 201.143094] fjes
[ 201.143095] mac_hid
[ 201.143095] asus_wireless
[ 201.143095] nouveau
[ 201.143096] i915
[ 201.143096] mxm_wmi
[ 201.143096] i2c_algo_bit
[ 201.143097] drm_kms_helper
[ 201.143097] syscopyarea
[ 201.143098] ttm
[ 201.143098] sysfillrect
[ 201.143098] serio_raw
[ 201.143099] sysimgblt
[ 201.143099] fb_sys_fops
[ 201.143100] drm
[ 201.143100] ahci
[ 201.143100] libahci
[ 201.143101] i2c_hid
[ 201.143101] hid
[ 201.143101] video
[ 201.143102] wmi
[ 201.143104] CPU: 0 PID: 1230 Comm: kworker/0:6 Not tainted 4.8.0-32-generic #34+dev155.82734c4beos3.1.2-Endless
[ 201.143104] Hardware name: ASUSTeK COMPUTER INC. X756UQK/X756UQK, BIOS X756UQK.201 07/01/2016
[ 201.143107] Workqueue: pm pm_runtime_work
[ 201.143110] 0000000000000286 000000006307316f ffff953a9d933c08 ffffffff9e031233
[ 201.143111] ffff953a9d933c58 0000000000000000 ffff953a9d933c48 ffffffff9dc832f1
[ 201.143112] 0000065000000000 ffff953a9ff44000 ffff953a9feeeca0 ffff953a997b1800
[ 201.143113] Call Trace:
[ 201.143116] [<ffffffff9e031233>] dump_stack+0x63/0x90
[ 201.143118] [<ffffffff9dc832f1>] __warn+0xd1/0xf0
[ 201.143120] [<ffffffff9dc8336f>] warn_slowpath_fmt+0x5f/0x80
[ 201.143122] [<ffffffff9e0924b4>] ? pci_save_vc_state+0x34/0xe0
[ 201.143124] [<ffffffff9e087b99>] pci_disable_device+0x99/0xb0
[ 201.143152] [<ffffffffc06d63d9>] nouveau_pmops_runtime_suspend+0x69/0xe0 [nouveau]
[ 201.143153] [<ffffffff9e08a03b>] pci_pm_runtime_suspend+0x5b/0x180
[ 201.143154] [<ffffffff9e1abf63>] _rpm_callback+0x33/0x70
[ 201.143155] [<ffffffff9e1abfc4>] rpm_callback+0x24/0x80
[ 201.143156] [<ffffffff9e089fe0>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 201.143157] [<ffffffff9e1ac2dd>] rpm_suspend+0x12d/0x650
[ 201.143158] [<ffffffff9e1adc48>] pm_runtime_work+0x78/0xa0
[ 201.143160] [<ffffffff9dc9db16>] process_one_work+0x156/0x420
[ 201.143161] [<ffffffff9dc9e62e>] worker_thread+0x4e/0x4a0
[ 201.143162] [<ffffffff9dc9e5e0>] ? rescuer_thread+0x380/0x380
[ 201.143163] [<ffffffff9dc9e5e0>] ? rescuer_thread+0x380/0x380
[ 201.143165] [<ffffffff9dca3b38>] kthread+0xd8/0xf0
[ 201.143167] [<ffffffff9e49f3df>] ret_from_fork+0x1f/0x40
[ 201.143168] [<ffffffff9dca3a60>] ? kthread_park+0x60/0x60
[ 201.143169] ---[ end trace db73394a87e603e4 ]---
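For anyone triaging a similar log, it can help to measure how long nouveau waited between "waiting for kernel channels to go idle" and "failed to idle channel". The following is only a minimal sketch, assuming plain dmesg-style text with bracketed timestamps. The function name idle_wait_seconds and the SAMPLE string are illustrative and not part of any nouveau tooling. On the three lines quoted at the top of the log above, the gap comes out at roughly 15 seconds.

```python
import re

# Matches dmesg lines such as "[ 186.118105] nouveau 0000:01:00.0: DRM: ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def idle_wait_seconds(log_text):
    """Return the seconds between the 'waiting for kernel channels to go idle'
    message and the first 'failed to idle channel' message that follows it,
    or None if that pair of messages is not present."""
    wait_start = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a timestamped kernel log line
        timestamp, message = float(match.group(1)), match.group(2)
        if "waiting for kernel channels to go idle" in message:
            wait_start = timestamp
        elif "failed to idle channel" in message and wait_start is not None:
            return timestamp - wait_start
    return None


# Hypothetical sample input: the first three lines of the log in this report.
SAMPLE = """\
[ 186.117539] nouveau 0000:01:00.0: DRM: evicting buffers...
[ 186.118105] nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
[ 201.139049] nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
"""

if __name__ == "__main__":
    print(idle_wait_seconds(SAMPLE))
```
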
With runtime PM disabled (nouveau.runpm=0), the machine is able to shut down on all of those kernel versions, but with a delay of ~50s and the following messages in the log:
nouveau 0000:01:00.0: Xorg[691]: failed to idle channel 2 [Xorg[691]]
nouveau 0000:01:00.0: Xorg[691]: failed to idle channel 2 [Xorg[691]]
lspci shows the card as:
01:00.0 3D controller: NVIDIA Corporation Device 179c (rev a2)
According to the nouveau logs, this card supports Optimus:
[ 0.863470] pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
[ 0.863472] VGA switcheroo: detected Optimus DSM method _SB.PCI0.RP01.PEGP handle
[ 0.863473] nouveau: detected PR support, will not use DSM
[ 0.863494] nouveau 0000:01:00.0: enabling device (0006 -> 0007)
[ 0.863691] nouveau 0000:01:00.0: NVIDIA GM107 (1171c0a2)