nouveau hangs video with TU116 - regression in kernel 5.3
Submitted by Marcin Zajaczkowski
Assigned to Nouveau Project
Link to original bug (#112239)
Description
My GeForce GTX 1660 Ti mobile (NV168/TU116) in a Hyperbook NH5/Clevo NH55RCQ worked "fine" with kernel 5.2 once some workarounds were applied (https://bugs.freedesktop.org/show_bug.cgi?id=110830#c14). After the upgrade to 5.3, however, the video started to hang when the NVidia card switches power state. I don't actually use the card to render the output (it is DynOff by default), but I cannot disable it in the BIOS, and when I open or close the laptop lid it is temporarily woken up and goes back to sleep after a few seconds. That works in 5.2, but in 5.3 the video hangs on a subsequent switch (occasionally also during the first X/gdm setup).
The key related errors in the system log:
> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
(a lot of)
> kernel: nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 0000000000002000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [017fedf000 unknown]
(every few seconds)
> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> kernel: ------------[ cut here ]------------
> kernel: nouveau 0000:01:00.0: timeout
> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
(at the end)
On boot (here with a self-rebuilt kernel-core-5.4.0-0.rc6.git0.1.fc30.x86_64 on Fedora 30, but the errors are similar with 5.3) I see:
> Nov 1000:26:12 foobar kernel: Linux version 5.4.0-0.rc6.git0.1.fc30.x86_64 (me@foobar) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Sat Nov 9 18:47:45 CET 2019
...
> Nov 1000:26:12 foobar kernel: fb0: switching to inteldrmfb from EFI VGA
> Nov 1000:26:12 foobar kernel: Console: switching to colour dummy device 80x25
> Nov 1000:26:12 foobar kernel: i915 0000:00:02.0: vgaarb: deactivate vga console
> Nov 1000:26:12 foobar kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> Nov 1000:26:12 foobar kernel: [drm] Driver supports precise vblank timestamp query.
> Nov 1000:26:12 foobar kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
> Nov 1000:26:12 foobar kernel: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
> Nov 1000:26:12 foobar kernel: MXM: GUID detected in BIOS
> Nov 1000:26:12 foobar kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
> Nov 1000:26:12 foobar kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
> Nov 1000:26:12 foobar kernel: pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
> Nov 1000:26:12 foobar kernel: VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
> Nov 1000:26:12 foobar kernel: nouveau: detected PR support, will not use DSM
> Nov 1000:26:12 foobar kernel: nouveau 0000:01:00.0: enabling device (0106 -> 0107)
> Nov 1000:26:12 foobar kernel: nouveau 0000:01:00.0: NVIDIA TU116 (168000a1)
...
> Nov 1000:26:13 foobar kernel: [drm] Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0
> Nov 1000:26:13 foobar kernel: logitech-djreceiver 0003:046D:C52F.0002: hiddev96,hidraw1: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:14.0-1/input1
> Nov 1000:26:13 foobar kernel: logitech-djreceiver 0003:046D:C52F.0002: device of type QUAD or eQUAD (0x03) connected on slot 1
> Nov 1000:26:13 foobar kernel: input: Logitech Wireless Mouse PID:101f Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.1/0003:046D:C52F.0002/0003:046D:101F.0005/input/input17
> Nov 1000:26:13 foobar kernel: input: Logitech Wireless Mouse PID:101f Consumer Control as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.1/0003:046D:C52F.0002/0003:046D:101F.0005/input/input18
> Nov 1000:26:13 foobar kernel: hid-generic 0003:046D:101F.0005: input,hidraw4: USB HID v1.11 Mouse [Logitech Wireless Mouse PID:101f] on usb-0000:00:14.0-1/input1:1
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: queried max coordinates: x [..5658], y [..4722]
> Nov 1000:26:13 foobar kernel: ACPI: Video Device [PEGP] (multi-head: no rom: yes post: no)
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: bios: version 90.16.26.00.11
> Nov 1000:26:13 foobar kernel: input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/LNXVIDEO:00/input/input22
> Nov 1000:26:13 foobar kernel: ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
> Nov 1000:26:13 foobar kernel: input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:01/input/input23
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: fb: 6144 MiB GDDR6
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: queried min coordinates: x [1284..], y [1130..]
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: Your touchpad (PNP: PNP0f13) says it can support a different bus. If i2c-hid and hid-rmi are not used, you might want to try setting psmouse.synaptics_intertouch to 1 and report t>
> Nov 1000:26:13 foobar kernel: vga_switcheroo: enabled
> Nov 1000:26:13 foobar kernel: [TTM] Zone kernel: Available graphics memory: 8047486 KiB
> Nov 1000:26:13 foobar kernel: [TTM] Zone dma32: Available graphics memory: 2097152 KiB
> Nov 1000:26:13 foobar kernel: [TTM] Initializing pool allocator
> Nov 1000:26:13 foobar kernel: [TTM] Initializing DMA pool allocator
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: VRAM: 6144 MiB
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB version 4.1
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 02002f52 00020010
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 04814f76 04600010
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 04814f72 00020010
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00010261
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB conn 04: 01000446
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> Nov 1000:26:13 foobar kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> Nov 1000:26:13 foobar kernel: [drm] Driver supports precise vblank timestamp query.
> Nov 1000:26:13 foobar kernel: [drm] Cannot find any crtc or sizes
> Nov 1000:26:13 foobar kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
> Nov 1000:26:13 foobar kernel: [drm] Cannot find any crtc or sizes
> Nov 1000:26:13 foobar kernel: [drm] Cannot find any crtc or sizes
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: Touchpad model: 1, fw: 9.16, id: 0x1e2a1, caps: 0xf00123/0x840300/0x2e800/0x500000, board id: 3429, fw id: 2840755
> Nov 1000:26:13 foobar kernel: input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio2/input/input10
> Nov 1000:26:13 foobar kernel: fbcon: i915drmfb (fb0) is primary device
> Nov 1000:26:13 foobar kernel: Console: switching to colour frame buffer device 240x75
> Nov 1000:26:13 foobar kernel: i915 0000:00:02.0: fb0: i915drmfb frame buffer device
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: fault 09 [PHYS_WRITE] at 000000017fef0000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 0d [REGION_VIOLATION] on channel -1 [0000000000 unknown]
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: fault 09 [PHYS_WRITE] at 000000017fef0000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 0d [REGION_VIOLATION] on channel -1 [0000000000 unknown]
> Nov 1000:26:28 foobar kernel: snd_hda_intel 0000:01:00.1: Disabling MSI
> Nov 1000:26:28 foobar kernel: snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 000000000028b000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 04 [UNBOUND_INST_BLOCK] on channel -1 [0000000000 unknown]
> Nov 1000:26:28 foobar kernel: ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
> Nov 1000:26:28 foobar kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
> Nov 1000:26:28 foobar kernel: usb usb3: root hub lost power or was reset
> Nov 1000:26:28 foobar kernel: usb usb4: root hub lost power or was reset
> Nov 1000:26:28 foobar systemd-udevd[1143]: Using default interface naming scheme 'v240'.
> Nov 1000:26:28 foobar systemd-udevd[1143]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
> Nov 1000:26:28 foobar kernel: iwlwifi 0000:00:14.3 wlp0s20f3: renamed from wlan0
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar wireless[1394]: setting regulatory domain to PL based on timezone (Europe/Warsaw)
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
"nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []" appears several times per second, "fifo: fault 01 [VIRT_WRITE]" once every few seconds.
On the Nvidia card state switch (here I opened the lid) I observe something like this:
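To put numbers on those rates, the messages can be tallied per second from the kernel log. This is only a minimal sketch: the function name `count_nouveau_errors` is made up for illustration, and it assumes every log line carries an `HH:MM:SS` timestamp followed by a space, as in the excerpts here.

```shell
# Tally nouveau SCHED_ERROR and fifo fault messages per second.
# Reads kernel log text on stdin; lines without an "HH:MM:SS " timestamp are skipped.
# (Hypothetical helper, not part of any tool.)
count_nouveau_errors() {
    grep -E 'nouveau .*(SCHED_ERROR|fifo: fault)' |
    awk '{
        # The trailing space keeps the PCI address (0000:01:00.0) from matching.
        if (match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9] /)) {
            ts = substr($0, RSTART, RLENGTH - 1)
            kind = ($0 ~ /SCHED_ERROR/) ? "SCHED_ERROR" : "fault"
            n[ts " " kind]++
        }
    }
    END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}
```

For example, `journalctl -k -b | count_nouveau_errors` prints one line per second and message type with its count.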
> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:27:20 foobar systemd-logind[1719]: Lid opened.
> Nov 1000:27:20 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Allocate new frame buffer 3840x1200 stride
> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
...
> Nov 1000:27:21 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): EDID vendor "SAM", prod id 1415
> Nov 1000:27:21 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Using hsync ranges from config file
...
> Nov 1000:27:41 foobar kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> Nov 1000:27:41 foobar kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> Nov 1000:27:44 foobar tracker-store[2907]: OK
> Nov 1000:27:44 foobar systemd[2370]: tracker-store.service: Succeeded.
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> Nov 1000:27:47 foobar kernel: ------------[ cut here ]------------
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: timeout
> Nov 1000:27:47 foobar kernel: WARNING: CPU: 0 PID: 1085 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
> Nov 1000:27:47 foobar kernel: Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrac>
> Nov 1000:27:47 foobar kernel: libarc4 snd_hda_codec videobuf2_v4l2 btintel videobuf2_common iwlwifi kvm snd_hda_core snd_hwdep snd_seq snd_seq_device irqbypass videodev mei_hdcp bluetooth intel_cstate iTCO_wdt mc iTCO_vendor_support sn>
> Nov 1000:27:47 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G OE 5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:27:47 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:27:47 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:27:47 foobar kernel: RIP: 0010:g84_bar_flush+0xcf/0xe0 [nouveau]
> Nov 1000:27:47 foobar kernel: Code: 8b 40 10 48 8b 78 10 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 33 05 f0 e7 4c 89 ea 48 c7 c7 a4 74 92 c0 48 89 c6 e8 3f b4 96 e7 `<0f>` 0b eb a7 e8 58 b1 96 e7 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> Nov 1000:27:47 foobar kernel: RSP: 0018:ffffaa40c06cb640 EFLAGS: 00010086
> Nov 1000:27:47 foobar kernel: RAX: 0000000000000000 RBX: ffff95f1d47dfc00 RCX: 0000000000000006
> Nov 1000:27:47 foobar kernel: RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff95f1e0217900
> Nov 1000:27:47 foobar kernel: RBP: ffff95f1dd6e6748 R08: 0000000000000001 R09: 00000000000016f2
> Nov 1000:27:47 foobar kernel: R10: 000000000000cc44 R11: 0000000000000003 R12: 0000000000000246
> Nov 1000:27:47 foobar kernel: R13: ffff95f1dcd96050 R14: 0000000000000000 R15: ffff95f18c17a0c0
> Nov 1000:27:47 foobar kernel: FS: 0000000000000000(0000) GS:ffff95f1e0200000(0000) knlGS:0000000000000000
> Nov 1000:27:47 foobar kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov 1000:27:47 foobar kernel: CR2: 00007f792a718e20 CR3: 000000040a60a003 CR4: 00000000003606f0
> Nov 1000:27:47 foobar kernel: Call Trace:
> Nov 1000:27:47 foobar kernel: nv50_instobj_release+0x2f/0xc0 [nouveau]
> Nov 1000:27:47 foobar kernel: nvkm_vmm_iter.constprop.0+0x2bc/0x810 [nouveau]
> Nov 1000:27:47 foobar kernel: ? gp100_vmm_join+0x20/0x20 [nouveau]
> Nov 1000:27:47 foobar kernel: nvkm_vmm_map+0x136/0x360 [nouveau]
> Nov 1000:27:47 foobar kernel: ? gp100_vmm_join+0x20/0x20 [nouveau]
> Nov 1000:27:47 foobar kernel: nvkm_mem_map_dma+0x56/0x80 [nouveau]
> Nov 1000:27:47 foobar kernel: nvkm_uvmm_mthd+0x66a/0x780 [nouveau]
> Nov 1000:27:47 foobar kernel: nvkm_ioctl+0xde/0x180 [nouveau]
> Nov 1000:27:47 foobar kernel: nvif_object_mthd+0x104/0x130 [nouveau]
> Nov 1000:27:47 foobar kernel: nvif_vmm_map+0x115/0x130 [nouveau]
> Nov 1000:27:47 foobar kernel: nouveau_mem_map+0x8d/0x100 [nouveau]
> Nov 1000:27:47 foobar kernel: nouveau_vma_map+0x44/0x70 [nouveau]
> Nov 1000:27:47 foobar kernel: nouveau_bo_move_ntfy+0xcd/0xe0 [nouveau]
> Nov 1000:27:47 foobar kernel: ttm_bo_handle_move_mem+0xd2/0x5a0 [ttm]
> Nov 1000:27:47 foobar kernel: ttm_bo_evict+0x16f/0x1f0 [ttm]
> Nov 1000:27:47 foobar kernel: ? __drm_legacy_pci_free+0x66/0x90 [drm]
> Nov 1000:27:47 foobar kernel: ttm_mem_evict_first+0x273/0x360 [ttm]
> Nov 1000:27:47 foobar kernel: ttm_bo_force_list_clean+0xa4/0x170 [ttm]
> Nov 1000:27:47 foobar kernel: nouveau_do_suspend+0x80/0x170 [nouveau]
> Nov 1000:27:47 foobar kernel: nouveau_pmops_runtime_suspend+0x40/0xa0 [nouveau]
> Nov 1000:27:47 foobar kernel: pci_pm_runtime_suspend+0x58/0x140
> Nov 1000:27:47 foobar kernel: ? __switch_to_asm+0x40/0x70
> Nov 1000:27:47 foobar kernel: ? pci_pm_thaw_noirq+0xa0/0xa0
> Nov 1000:27:47 foobar kernel: __rpm_callback+0xc1/0x140
> Nov 1000:27:47 foobar kernel: ? pci_pm_thaw_noirq+0xa0/0xa0
> Nov 1000:27:47 foobar kernel: rpm_callback+0x1f/0x70
> Nov 1000:27:47 foobar kernel: rpm_suspend+0x10a/0x5a0
> Nov 1000:27:47 foobar kernel: ? __switch_to_asm+0x34/0x70
> Nov 1000:27:47 foobar kernel: pm_runtime_work+0x86/0x90
> Nov 1000:27:47 foobar kernel: process_one_work+0x1b0/0x350
> Nov 1000:27:47 foobar kernel: worker_thread+0x50/0x3b0
> Nov 1000:27:47 foobar kernel: kthread+0xfb/0x130
> Nov 1000:27:47 foobar kernel: ? process_one_work+0x350/0x350
> Nov 1000:27:47 foobar kernel: ? kthread_park+0x90/0x90
> Nov 1000:27:47 foobar kernel: ret_from_fork+0x35/0x40
> Nov 1000:27:47 foobar kernel: ---[ end trace e70ebf987c8ad925 ]---
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> Nov 1000:27:47 foobar kernel: ------------[ cut here ]------------
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: timeout
> Nov 1000:27:47 foobar kernel: WARNING: CPU: 0 PID: 1085 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c:44 tu102_vmm_flush+0x128/0x140 [nouveau]
> Nov 1000:27:47 foobar kernel: Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrac>
> Nov 1000:27:47 foobar kernel: libarc4 snd_hda_codec videobuf2_v4l2 btintel videobuf2_common iwlwifi kvm snd_hda_core snd_hwdep snd_seq snd_seq_device irqbypass videodev mei_hdcp bluetooth intel_cstate iTCO_wdt mc iTCO_vendor_support sn>
> Nov 1000:27:47 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G W OE 5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:27:47 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:27:47 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:27:47 foobar kernel: RIP: 0010:tu102_vmm_flush+0x128/0x140 [nouveau]
> Nov 1000:27:47 foobar kernel: Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 ca 19 eb e7 4c 89 e2 48 c7 c7 dc 8e 92 c0 48 89 c6 e8 d6 c8 91 e7 `<0f>` 0b eb aa e8 ef c5 91 e7 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> Nov 1000:27:47 foobar kernel: RSP: 0018:ffffaa40c06cb678 EFLAGS: 00010286
> Nov 1000:27:47 foobar kernel: RAX: 0000000000000000 RBX: ffff95f1d47dfc00 RCX: 0000000000000006
> Nov 1000:27:47 foobar kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff95f1e0217900
> Nov 1000:27:47 foobar kernel: RBP: ffff95f1dce73220 R08: 0000000000000001 R09: 000000000000172b
> Nov 1000:27:47 foobar kernel: R10: 000000000000e120 R11: 0000000000000003 R12: ffff95f1dcd96050
> Nov 1000:27:47 foobar kernel: R13: ffff95f1d457f200 R14: 0000000000000000 R15: ffff95f18c17a0c0
> Nov 1000:27:47 foobar kernel: FS: 0000000000000000(0000) GS:ffff95f1e0200000(0000) knlGS:0000000000000000
> Nov 1000:27:47 foobar kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov 1000:27:47 foobar kernel: CR2: 00007f792a718e20 CR3: 000000040a60a003 CR4: 00000000003606f0
> Nov 1000:27:47 foobar kernel: Call Trace:
> Nov 1000:27:47 foobar kernel: nvkm_vmm_iter.constprop.0+0x34b/0x810 [nouveau]
> Nov 1000:27:47 foobar kernel: ? gp100_vmm_join+0x20/0x20 [nouveau]
The stack trace is repeated a few times; a more complete log is attached. The "" are no longer visible.
On the subsequent laptop lid close the video hangs (the music is still playing, but Caps Lock doesn't turn on the LED on my keyboard). In the logs, after a call trace, I see "kernel: [TTM] Buffer eviction failed":
> Nov 1000:28:14 foobar systemd-logind[1719]: Lid closed.
...
> Nov 1000:28:15 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Allocate new frame buffer 1920x1200 stride
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): EDID vendor "CMN", prod id 5608
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Printing DDC gathered Modelines:
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Modeline "1920x1080"x0.0 152.84 1920 2000 2054 2250 1080 1086 1094 1132 -hsync -vsync (67.9 kHz eP)
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): EDID vendor "SAM", prod id 1415
...
> Nov 1000:28:18 foobar kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> Nov 1000:28:18 foobar kernel: ------------[ cut here ]------------
> Nov 1000:28:18 foobar kernel: nouveau 0000:01:00.0: timeout
> Nov 1000:28:18 foobar kernel: WARNING: CPU: 0 PID: 1085 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
> Nov 1000:28:18 foobar kernel: Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrac>
> Nov 1000:28:18 foobar kernel: libarc4 snd_hda_codec videobuf2_v4l2 btintel videobuf2_common iwlwifi kvm snd_hda_core snd_hwdep snd_seq snd_seq_device irqbypass videodev mei_hdcp bluetooth intel_cstate iTCO_wdt mc iTCO_vendor_support sn>
> Nov 1000:28:18 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G W OE 5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:28:18 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:28:18 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:28:18 foobar kernel: RIP: 0010:g84_bar_flush+0xcf/0xe0 [nouveau]
...
> Nov 1000:28:18 foobar kernel: ---[ end trace e70ebf987c8ad92c ]---
> Nov 1000:28:18 foobar kernel: [TTM] Buffer eviction failed
> Nov 1000:28:19 foobar abrt-dump-journal-oops[1695]: abrt-dump-journal-oops: Found oopses: 2
> Nov 1000:28:19 foobar abrt-dump-journal-oops[1695]: abrt-dump-journal-oops: Creating problem directories
> Nov 1000:28:19 foobar abrt-server[10758]: Package 'kernel-core' isn't signed with proper key
> Nov 1000:28:19 foobar abrt-server[10758]: 'post-create' on '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-0' exited with 1
> Nov 1000:28:19 foobar abrt-server[10758]: Deleting problem directory '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-0'
> Nov 1000:28:20 foobar abrt-server[10761]: Package 'kernel-core' isn't signed with proper key
> Nov 1000:28:20 foobar abrt-server[10761]: 'post-create' on '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-1' exited with 1
> Nov 1000:28:20 foobar abrt-server[10761]: Deleting problem directory '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-1'
> Nov 1000:28:21 foobar abrt-dump-journal-oops[1695]: Reported 2 kernel oopses to Abrt
> Nov 1000:28:33 foobar kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> Nov 1000:28:33 foobar kernel: BUG: unable to handle page fault for address: ffffaa41c0386ffc
> Nov 1000:28:33 foobar kernel: #PF: supervisor write access in kernel mode
> Nov 1000:28:33 foobar kernel: #PF: error_code(0x0002) - not-present page
> Nov 1000:28:33 foobar kernel: PGD 45e550067 P4D 45e550067 PUD 0
> Nov 1000:28:33 foobar kernel: Oops: 0002 [#1] SMP PTI
> Nov 1000:28:33 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G W OE 5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:28:33 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:28:33 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:28:33 foobar kernel: RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
...
> Nov 1000:28:40 foobar gsd-xsettings[2815]: Failed to get current display configuration state: Timeout was reached
> $ lspci | grep VGA
> 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile] (rev a1)
I've seen a similar issue; however, in this case it is a regression: it worked fine with kernel 5.2 (tested from RC1 through 5.2.18) and it is broken in 5.3 (tested with 5.3.1 to 5.3.8 and 5.4.0-rc6).
I'm not sure which commit broke it (building the kernel takes some time), but given some candidate commits I could check whether the problem occurs before and after each of them.
Btw, I'm looking for potential workarounds (better than sticking to 5.2.18). I don't use the NVidia card to render the output, so I could blacklist nouveau and use bbswitch to keep the card off. However, that would make testing newer kernel versions somewhat harder. Maybe I can disable something in nouveau to keep the card off without suffering from the errors above?
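Without candidate commits, the range could also be narrowed down with `git bisect`, at the cost of one kernel build per step. A rough sketch, assuming a mainline kernel git checkout with the upstream `v5.2` and `v5.3` tags; the function name `bisect_nouveau_regression` is made up for illustration:

```shell
# Sketch: start a bisect over the 5.2..5.3 range.
# Run from the top of a mainline kernel git tree (assumption).
bisect_nouveau_regression() {
    # git bisect start takes the bad revision first, then the good one.
    git bisect start v5.3 v5.2 || return 1
    echo "Build and boot the checked-out revision, open/close the lid twice, then run:"
    echo "  git bisect good   # no hang, no g84_bar_flush timeout in the log"
    echo "  git bisect bad    # video hangs or the timeout warning appears"
    echo "When git names the first bad commit, finish with: git bisect reset"
}
```

The range is deliberately not restricted to `drivers/gpu/drm/nouveau`, since the breaking change may live elsewhere in the tree (for example in the PCI or runtime PM code).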