[NVE6] complete system freeze, PGRAPH engine fault on channel 2, SCHED_ERROR [ CTXSW_TIMEOUT ]
Created attachment 120860
dmesg output after crash
The system randomly freezes. I can still move the mouse but can't do anything else. I cannot switch VT, the three-finger salute does not work and system shut down via ACPI does not work either. The only way of recovery is a hard switch off.
The crash seems to occur more frequently when I play videos (mplayer or Flash), but video playback is not required to trigger it.
The crash does not occur, if I use the blob firmware (nouveau.config="NvGrUseFw=1").
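When comparing runs with and without the blob firmware, it helps to confirm that the option really reached the kernel command line for a given boot. A minimal sketch, assuming the option is passed on the command line as above; the function name `uses_blob_fw` is made up for illustration, and it only inspects the string it is given, it does not prove the firmware was actually loaded.

```shell
# Hypothetical helper: succeeds if the kernel command line passed as $1
# contains NvGrUseFw=1 inside a nouveau.config= option (quoted or not).
uses_blob_fw() {
    case " $1 " in
        *nouveau.config=*NvGrUseFw=1*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   uses_blob_fw "$(cat /proc/cmdline)" && echo "blob firmware requested"
```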
The manifestation of that bug was different though, so ... probably unrelated. Wouldn't hurt to update though.
There have also been slight updates to the ctxsw fw in kernel 4.4, so you could try 4.4-rc8 and see if the issues remain. (Also the driver got a substantial refactor/rewrite in kernel 4.3.)
Either way, please check on latest to see if issues persist.
I also reported bug #93630 at the same time. Both problems occur in turns. I do not know if they are related, because the symptoms are completely different, but they both disappear using the blob firmware.
OK, I will test kernel 4.1.13 and 4.3.3. Both are available in the unstable branch of my distro. I cannot test 4.4.x yet. Let's see what I will find out.
Today, I finally managed to give kernel 4.4.0 a try. When I try to use the proprietary firmware I get the following boot error:
Jan 20 20:01:44 kernel: nouveau 0000:01:00.0: Direct firmware load for nvidia/gk106/fecs_inst.bin failed with error -2
Jan 20 20:01:44 kernel: nouveau 0000:01:00.0: gr: failed to load fecs_inst
Where do I get this firmware from? Until now (<4.4.0) I used the following kernel parameters
drm/nouveau/gr: use NVIDIA-provided external firmwares
NVIDIA will officially start providing GR firmwares through
linux-firmware for GPUs that require it. Change the GR firmware lookup
function to use these files.
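Error -2 in the messages above is -ENOENT, i.e. the driver asked for a file that the firmware loader could not find. A small sketch for listing exactly which files a boot asked for and did not get; the function name `missing_firmware` is made up here, and the usage line assumes firmware lives under the usual /lib/firmware directory.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the
# firmware paths whose direct load failed with error -2 (ENOENT).
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error -2.*/\1/p'
}

# Usage on a running system (assumes /lib/firmware as the search root):
#   dmesg | missing_firmware | while read -r f; do ls -l "/lib/firmware/$f"; done
```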
Ok. Let's restate it in my own words to make sure I understood you correctly.
But first, let me say that I have already tried 4.4.0 without the proprietary firmware, i.e. I omitted nouveau.config="NvGrUseFw=1". I experienced the same freeze while scrolling a LibreOffice Writer document. So, bad news: the bug is still there.
Now, I wanted to check if 4.4.0 with proprietary firmware works without crashing. For this purpose you want me to rename
After re-compilation I should have a new kernel that uses the proprietary fw and I should be able to check if the crash with freeze is still there or not.
Hi, I can confirm the problems with the GTX660 firmware.
I'm using Debian 9.0 Stretch with kernel 4.4. When using the nouveau firmware I got random freezes. After installing the nvidia driver, the freezes disappeared.
I then found this bug report, and the solution of using the nvidia blob firmware is working like a charm.
Kernel 4.4.5, Arch Linux, GeForce GTX 650 Ti Boost (first time using it).
X will freeze almost immediately (typically right after my KDE session has loaded) with the following message:
nouveau 0000:01:00.0: fifo: write fault at 00002a2000 engine 00 [GR] client 0f [GPC1/PROP_0] reason 02 [PTE] on channel 2 [023faf0000 Xorg[666]]
nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...
I have just booted using the NVIDIA firmware (NvGrUseFw=1), and X is working fine so far. I will perform a few more boots and spend some time on it to confirm the issue is solved.
This seems to be a GR firmware issue. Any guidance on how I can help fix it?
I'm experiencing this bug with kernel 4.4.6 and GK106 (GeForce GTX 645 OEM).
For what it's worth, I've been experiencing this same bug for about two years, ever since kernel 3.12 (I was on kernel 3.10 prior to that and never experienced the bug). After each kernel upgrade I've done, I usually test the built-in firmware to see if the problem is still there. For a while, I thought the bug was gone as of the 4.4 kernel because I hadn't experienced the problem for a long time on 4.4 with the built-in firmware. But recently I upgraded my desktop environment to KDE Plasma 5 and now I've experienced the lockup twice in less than a 24-hour period.
I just switched back to the proprietary firmware (after moving/renaming the binary blobs to match the names that the new version of the driver looks for).
I'm willing to test the built-in firmware more, in order to help debug this longstanding problem. Is there something I can do to help gather more information (traces, enable debug logging, etc)?
If you are _sure_ that it never happened with 3.10, it would be really good if you could git bisect the kernel to see which commit broke it. If you wait one or two days for a freeze at each step, you should be through in about a month :/
But maybe there is somebody who might know what it could be if it is really new with 3.12 (or 3.11)
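To put a number on the bisect effort: git bisect needs roughly ceil(log2(N)) build-and-test rounds for a range of N commits. The sketch below computes that; `bisect_steps` is a made-up helper name, and the commit count you feed it is whatever `git rev-list --count v3.10..v3.12` reports in your tree.

```shell
# Approximate number of kernels to build and test when bisecting a
# range of N commits: ceil(log2(N)).
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# The bisect itself, run inside a kernel git checkout (v3.10 and v3.12
# are upstream release tags):
#   git bisect start
#   git bisect bad v3.12
#   git bisect good v3.10
#   ... build, boot, wait for a freeze, then mark the result:
#   git bisect good    # or: git bisect bad
```

For a purely illustrative range of 25000 commits (not a measured figure), `bisect_steps 25000` prints 15, which at one to two days of waiting per step lands in the same ballpark as the month mentioned above.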
For anyone who experiences random hangs using the nouveau firmware and have spotted "SCHED_ERROR [ CTXSW_TIMEOUT ]" in the logs, please build a kernel with the attached patch and run using the nouveau firmware. Minimum kernel required is 4.6.
I've tested this on one machine of my own and have a strong indication that this improves the situation, but unfortunately it is difficult to distinguish random infrequent hangs from no hangs.
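Since the hangs are infrequent, counting occurrences of the error over a longer log window may make before/after comparisons a bit less anecdotal. A minimal sketch; `count_ctxsw_timeouts` is a made-up name, and the pattern is based only on the log lines quoted in this report.

```shell
# Hypothetical helper: read kernel log lines on stdin and print how many
# of them report a SCHED_ERROR with CTXSW_TIMEOUT.
count_ctxsw_timeouts() {
    # grep -c exits non-zero when the count is 0; ignore that status.
    grep -c 'SCHED_ERROR.*CTXSW_TIMEOUT' || true
}

# Usage on a running system:
#   dmesg | count_ctxsw_timeouts
```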
I tried to test both patches on top of 4.6_rc7. I can't tell if this bug is repaired, because I immediately ran into bug #95054 and another new bug I haven't had so far within seconds after I logged into KDE.
After three attempts for each patch I gave up trying. Sorry, I am of no help in this case.
I've been running a 4.4.8 kernel, so had to build a 4.6-rc7 kernel. And, to be sure, I also wanted to first verify that I can reproduce the problem on unpatched 4.6-rc7. It took about 6.5 hours before my system locked up on 4.6-rc7.
Now I'm going to apply the patch (the alternate one from Ben) and see how that goes.
I think I'm getting this same bug with an Nvidia GTX 650 Ti on Arch Linux, using the unpatched released kernel 4.6, xf86-video-nouveau 1.0.12 and mesa 11.2.2.
Sometimes random hangs happen which lock up the GPU.
Note that nouveau has never really worked for me on this PC with this GPU in the more than a year since I installed Linux on it. It has always been getting random hangs like those in bug #89912, so I've been using the proprietary Nvidia driver, which works fine.
Anyway, for this bug, I'm using GNOME Shell and it happened while playing Minecraft (with java-8-openjdk).
Kernel log:
kernel: fb: switching to nouveaufb from EFI VGA
kernel: Console: switching to colour dummy device 80x25
kernel: nouveau 0000:01:00.0: NVIDIA GK106 (0e6060a1)
kernel: nouveau 0000:01:00.0: bios: version 80.06.21.00.37
kernel: nouveau 0000:01:00.0: fb: 1024 MiB GDDR5
kernel: [TTM] Zone kernel: Available graphics memory: 12204692 kiB
kernel: [TTM] Zone dma32: Available graphics memory: 2097152 kiB
kernel: [TTM] Initializing pool allocator
kernel: [TTM] Initializing DMA pool allocator
kernel: nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
kernel: nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
kernel: nouveau 0000:01:00.0: DRM: DCB version 4.0
kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02000f00 00020030
kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 08011f82 0f420030
kernel: nouveau 0000:01:00.0: DRM: DCB outp 03: 02022f62 0f420010
kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 00002131
kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00010263
kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
kernel: [drm] Driver supports precise vblank timestamp query.
kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
kernel: nouveau 0000:01:00.0: DRM: allocated 2560x1440 fb: 0x60000, bo ffff880601a1c400
kernel: fbcon: nouveaufb (fb0) is primary device
kernel: Console: switching to colour frame buffer device 128x48
kernel: nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
...
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 000c data 00000000
kernel: nouveau 0000:01:00.0: gr: TRAP ch 6 [003f7d0000 java[1817]]
kernel: nouveau 0000:01:00.0: gr: DISPATCH 80000002 [CLASS_SUBCH_MISMATCH]
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 02000000 [SEMAPHORE] ch 6 [003f7d0000 java[1817]] subc 0 mthd 001c data 3f800000
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 0100 data 00000000
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 0030 data 20030700
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 0034 data 00001014
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 0038 data 00000000
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 003c data 015c4d10
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 0040 data 200207c0
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 0044 data 00000000
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 0048 data 015d1fff
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00400000 [METHODCRC] ch 6 [003f7d0000 java[1817]] subc 0 mthd 007c data a01108e3
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 004c data 80000704
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 00ec data 00000000
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 00f0 data 00000000
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 00f4 data 00000000
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 00f8 data 3f800000
kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [METHOD] ch 6 [003f7d0000 java[1817]] subc 0 mthd 00fc data 3f800000
... loads of more like these ...
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 0100 data 00000000
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 0104 data 00000000
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 0108 data 00000000
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 010c data 00000000
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 0110 data 00000000
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 0114 data 3f800000
kernel: nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0000 mthd 0118 data 3f800000
... again all log spammed with these like ILLEGAL_CLASS ...
[ 345.614760] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0186 mthd 1b08 data 000014b6
[ 345.614773] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0186 mthd 1b0c data 1000f010
[ 428.465923] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0186 mthd 0100 data 00000000
[ 428.465961] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0186 mthd 1b00 data 00000000
[ 428.465989] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0186 mthd 1b04 data 00238000
[ 428.466010] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0186 mthd 1b08 data 000014b7
[ 428.466037] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 6 [003f7d0000 java[1817]] subc 0 class 0186 mthd 1b0c data 1000f010
[ 6833.360096] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 6833.360109] nouveau 0000:01:00.0: fifo: gr engine fault on channel 6, recovering...
[ 7195.188708] nouveau 0000:01:00.0: gnome-shell[1177]: failed to idle channel 5 [gnome-shell[1177]]
[ 7210.189442] nouveau 0000:01:00.0: gnome-shell[1177]: failed to idle channel 5 [gnome-shell[1177]]
[ 7210.189607] nouveau 0000:01:00.0: fifo: read fault at 0000013000 engine 07 [PBDMA0] client 07 [HOST_CPU] reason 02 [PTE] on channel 5 [003f8aa000 gnome-shell[1177]]
[ 7210.189772] nouveau 0000:01:00.0: fifo: fifo engine fault on channel 5, recovering...
[ 7225.206843] nouveau 0000:01:00.0: java[2577]: failed to idle channel 6 [java[2577]]
[ 7240.207579] nouveau 0000:01:00.0: java[2577]: failed to idle channel 6 [java[2577]]
At this point the GPU locked up, but the kernel itself didn't hang and kept working.
Over SSH I was able to kill Xorg, unbind the VT, unload nouveau and then load it again:
echo 0 > /sys/class/vtconsole/vtcon1/bind
rmmod nouveau
modprobe nouveau
echo 1 > /sys/class/vtconsole/vtcon1/bind
Then I started Xorg and it worked. The kernel log shows:
[ 7354.650288] Console: switching to colour dummy device 80x25
[ 7398.714966] [TTM] Finalizing pool allocator
[ 7398.714972] [TTM] Finalizing DMA pool allocator
[ 7398.715081] [TTM] Zone kernel: Used memory at exit: 0 kiB
[ 7398.715084] [TTM] Zone dma32: Used memory at exit: 0 kiB
[ 7398.715664] [drm] Module unloaded
[ 7449.947953] MXM: GUID detected in BIOS
[ 7449.948021] nouveau 0000:01:00.0: NVIDIA GK106 (0e6060a1)
[ 7450.009294] nouveau 0000:01:00.0: bios: version 80.06.21.00.37
[ 7450.010087] nouveau 0000:01:00.0: fb: 1024 MiB GDDR5
[ 7450.064908] [TTM] Zone kernel: Available graphics memory: 12204692 kiB
[ 7450.064912] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 7450.064914] [TTM] Initializing pool allocator
[ 7450.064921] [TTM] Initializing DMA pool allocator
[ 7450.064939] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
[ 7450.064942] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 7450.064947] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 7450.064949] nouveau 0000:01:00.0: DRM: DCB version 4.0
[ 7450.064952] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
[ 7450.064955] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000f00 00020030
[ 7450.064958] nouveau 0000:01:00.0: DRM: DCB outp 02: 08011f82 0f420030
[ 7450.064960] nouveau 0000:01:00.0: DRM: DCB outp 03: 02022f62 0f420010
[ 7450.064963] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
[ 7450.064965] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002131
[ 7450.064967] nouveau 0000:01:00.0: DRM: DCB conn 02: 00010263
[ 7450.066173] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 7450.066175] [drm] Driver supports precise vblank timestamp query.
[ 7450.122768] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[ 7450.213563] nouveau 0000:01:00.0: DRM: allocated 2560x1440 fb: 0x60000, bo ffff8804405c4800
[ 7450.213720] fbcon: nouveaufb (fb0) is primary device
[ 7450.569182] Console: switching to colour frame buffer device 128x48
[ 7450.569972] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[ 7450.587996] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[ 7479.483820] Console: switching to colour dummy device 80x25
[ 7483.096411] Console: switching to colour frame buffer device 128x48
But after a while it hung again:
[10425.388235] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[10425.388240] nouveau 0000:01:00.0: fifo: gr engine fault on channel 6, recovering...
[11378.573422] nouveau 0000:01:00.0: java[4377]: failed to idle channel 6 [java[4377]]
[11393.574071] nouveau 0000:01:00.0: java[4377]: failed to idle channel 6 [java[4377]]
Just installed kernel 4.6 and now I'm getting the following message:
kernel: nouveau 0000:01:00.0: Direct firmware load for nvidia/gk106/fecs_inst.bin failed with error -2
kernel: nouveau 0000:01:00.0: gr: failed to load fecs_inst
Have the directories changed again?
I also tried the original nouveau firmware with kernel 4.6. I think it's more stable, but when I'm watching a video, totem or vlc sits at 100% CPU.
Some days this lockup occurs multiple times.
I can log in to the machine via ssh but UI is completely locked.
Have tried to forcefully kill gdm and restart it, and successfully got it running, but not for long.
...
Oct 27 10:56:40 martin-m4800 nm-dispatcher: req:1 'dhcp4-change' [eno1]: new request (1 scripts)
Oct 27 10:56:40 martin-m4800 nm-dispatcher: req:1 'dhcp4-change' [eno1]: start running ordered scripts...
Oct 27 10:59:10 martin-m4800 kernel: [12589.373968] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Oct 27 10:59:10 martin-m4800 kernel: [12589.373974] nouveau 0000:01:00.0: fifo: gr engine fault on channel 9, recovering...
...
Oct 27 15:48:30 martin-m4800 kernel: [17226.068134] nouveau 0000:01:00.0: gr: TRAP ch 8 [007f5d4000 gnome-shell[1802]]
Oct 27 15:48:30 martin-m4800 kernel: [17226.068145] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 1672, y = 56, format = 0, storage type = fe
Oct 27 15:48:35 martin-m4800 kernel: [17230.454392] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Oct 27 15:48:35 martin-m4800 kernel: [17230.454398] nouveau 0000:01:00.0: fifo: gr engine fault on channel 9, recovering...
...
Oct 27 16:02:48 martin-m4800 wpa_supplicant[1304]: wlp5s0: Reject scan trigger since one is already pending
Oct 27 16:04:48 martin-m4800 chromium.desktop[2104]: couldn't lock 16384 bytes of memory (secret_session): Cannot allocate memory
Oct 27 16:04:50 martin-m4800 minissdpd[896]: 5 new devices added
Oct 27 16:05:09 martin-m4800 kernel: [ 782.928665] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Oct 27 16:05:09 martin-m4800 kernel: [ 782.928671] nouveau 0000:01:00.0: fifo: gr engine fault on channel 9, recovering...
...
kernel.log shows:
Aug 24 11:05:15 dangelovich-ud kernel: [95310.061112] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Aug 24 11:05:15 dangelovich-ud kernel: [95310.061118] nouveau 0000:01:00.0: fifo: gr engine fault on channel 10, recovering...
Then the machine hangs (frozen, but can still move the cursor) and I have to power off. It's pretty regular at this point, so if more info is required let me know what and how to get it.
I ran into a similar bug with a recent Debian install. It occurs only when I plug a monitor into the DVI port. It works fine with the HDMI port, which I'm using to type this. But it freezes with the following error.
I'm installing the proprietary drivers, but I wanted to provide this confirmation/documentation in case it is helpful to anyone trying to solve this. In an ideal world I would use the free software drivers, but there is evidently some issue with the software that is causing it to crash.
Sep 1 23:07:09 dragonpunk /usr/lib/gdm3/gdm-x-session[787]: (II) systemd-logind: got pause for 226:0
Sep 1 23:07:09 dragonpunk /usr/lib/gdm3/gdm-x-session[787]: (II) systemd-logind: got pause for 13:67
Sep 1 23:07:09 dragonpunk /usr/lib/gdm3/gdm-x-session[787]: (II) systemd-logind: got pause for 13:66
Sep 1 23:07:09 dragonpunk /usr/lib/gdm3/gdm-x-session[787]: (II) systemd-logind: got pause for 13:64
Sep 1 23:07:09 dragonpunk /usr/lib/gdm3/gdm-x-session[787]: (II) systemd-logind: got pause for 13:65
Sep 1 23:07:09 dragonpunk /usr/lib/gdm3/gdm-x-session[787]: (II) systemd-logind: got pause for 13:68
Sep 1 23:07:09 dragonpunk kernel: [74586.320157] nouveau 0000:01:00.0: fifo: write fault at 000031d000 engine 05 [BAR3] client 08 [BAR_WRITE] reason 02 [PAGE_NOT_PRESENT] on channel -1 [003ff35000 unknown]
Sep 1 23:07:09 dragonpunk kernel: [74586.320169] nouveau 0000:01:00.0: fifo: INTR 08000000
Sep 1 23:07:09 dragonpunk kernel: [74586.825813] nouveau 0000:01:00.0: gr: TRAP ch 2 [003fd30000 systemd-logind[448]]
Sep 1 23:07:09 dragonpunk kernel: [74586.825826] nouveau 0000:01:00.0: gr: GPC0/TPC0/TEX: 80000041
Sep 1 23:07:09 dragonpunk kernel: [74586.825834] nouveau 0000:01:00.0: gr: GPC0/TPC1/TEX: 80000041
Sep 1 23:07:09 dragonpunk kernel: [74586.825846] nouveau 0000:01:00.0: gr: GPC1/TPC0/TEX: 80000041
Sep 1 23:07:09 dragonpunk kernel: [74586.825854] nouveau 0000:01:00.0: gr: GPC1/TPC2/TEX: 80000041
Sep 1 23:07:09 dragonpunk kernel: [74586.825866] nouveau 0000:01:00.0: fifo: read fault at 0002f61000 engine 00 [PGRAPH] client 01 [GPC1/TEX] reason 02 [PAGE_NOT_PRESENT] on channel 2 [003fd30000 systemd-logind[448]]
Sep 1 23:07:09 dragonpunk kernel: [74586.825868] nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...
Sep 1 23:07:09 dragonpunk kernel: [74586.881742] nouveau 0000:01:00.0: fifo: write fault at 0000278000 engine 00 [PGRAPH] client 0f [GPC0/PROP] reason 02 [PAGE_NOT_PRESENT] on channel 3 [003fc83000 Xwayland[602]]
Sep 1 23:07:09 dragonpunk kernel: [74586.881746] nouveau 0000:01:00.0: fifo: gr engine fault on channel 3, recovering...
Sep 1 23:07:48 dragonpunk gnome-shell[566]: Failed to apply DRM plane transform 0: Invalid argument
Sep 1 23:08:06 dragonpunk gnome-shell[566]: Failed to apply DRM plane transform 0: Invalid argument
Sep 1 23:08:06 dragonpunk gnome-shell[566]: Failed to apply DRM plane transform 0: Invalid argument
Sep 1 23:08:19 dragonpunk gnome-shell[566]: Failed to apply DRM plane transform 0: Invalid argument
I'm on Fedora 27, and I get the freeze with the following messages in dmesg:
[ 0.000000] Linux version 4.13.12-300.fc27.x86_64 (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Wed Nov 8 16:38:01 UTC 2017
[ 3571.488551] nouveau 0000:01:00.0: gr: TRAP ch 14 [003f543000 Xwayland[3020]]
[ 3571.488560] nouveau 0000:01:00.0: gr: GPC0/TPC0/TEX: 80000049
[ 3571.488564] nouveau 0000:01:00.0: gr: GPC0/TPC1/TEX: 80000049
[ 3571.488568] nouveau 0000:01:00.0: gr: GPC0/TPC2/TEX: 80000049
[ 3571.488572] nouveau 0000:01:00.0: gr: GPC0/TPC3/TEX: 80000049
[ 3571.488581] nouveau 0000:01:00.0: fifo: read fault at 0004bbc000 engine 00 [PGRAPH] client 0a [GPC0/] reason 02 [PAGE_NOT_PRESENT] on channel 14 [003f543000 Xwayland[3020]]
[ 3571.488583] nouveau 0000:01:00.0: fifo: gr engine fault on channel 14, recovering...
[ 3571.488779] nouveau 0000:01:00.0: Xwayland[3020]: channel 14 killed!
Nov 22 00:51:36 kernel: nouveau 0000:01:00.0: gr: TRAP ch 5 [003fb72000 X[12943]]
Nov 22 00:51:36 kernel: nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000400 [RT_LINEAR_MISMATCH] x = 2048, y = 1024, format = 18, storage type = 0
Nov 22 00:51:36 kernel: nouveau 0000:01:00.0: fifo: write fault at 0009612000 engine 00 [PGRAPH] client 0f [GPC0/PROP] reason 02 [PAGE_NOT_PRESENT] on channel 5 [003fb72000 X[12943]]
Nov 22 00:51:36 kernel: nouveau 0000:01:00.0: fifo: gr engine fault on channel 5, recovering...
Nov 22 00:51:36 kernel: nouveau 0000:01:00.0: X[12943]: channel 5 killed!
Nov 22 00:52:49 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 10 [X[12943]]
Nov 22 00:53:04 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 10 [X[12943]]
Nov 22 00:53:04 kernel: nouveau 0000:01:00.0: fifo: read fault at 0000013000 engine 07 [PFIFO] client 07 [BAR_READ] reason 02 [PAGE_NOT_PRESENT] on channel 10 [003f8ae000 X[12943]]
Nov 22 00:53:04 kernel: nouveau 0000:01:00.0: fifo: fifo engine fault on channel 10, recovering...
Nov 22 00:53:04 kernel: nouveau 0000:01:00.0: X[12943]: channel 10 killed!
Nov 22 00:53:19 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 13 [X[12943]]
Nov 22 00:53:34 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 13 [X[12943]]
Nov 22 00:53:49 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 9 [X[12943]]
Nov 22 00:54:04 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 9 [X[12943]]
Nov 22 00:54:19 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 6 [X[12943]]
Nov 22 00:54:34 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 6 [X[12943]]
Nov 22 00:54:49 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 12 [X[12943]]
Nov 22 00:55:04 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 12 [X[12943]]
Nov 22 00:55:19 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 7 [X[12943]]
Nov 22 00:55:34 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 7 [X[12943]]
Nov 22 00:55:49 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 4 [X[12943]]
Nov 22 00:56:04 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 4 [X[12943]]
Nov 22 00:56:19 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 3 [X[12943]]
Nov 22 00:56:34 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 3 [X[12943]]
Nov 22 00:56:49 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 2 [X[12943]]
Nov 22 00:57:04 kernel: nouveau 0000:01:00.0: X[12943]: failed to idle channel 2 [X[12943]]
Chipset: "NVIDIA NVC1"
NVIDIA Corporation GF108 [GeForce GT 430] (rev a1)
Digital interface is DisplayPort
t-IX and Dustin, you are experiencing a different bug: this report is about context switches timing out on GK106/GK107 (Kepler architecture), whereas you are using different chipsets (GF106/GF108, Fermi architecture) and the logs you provided do not mention the CTXSW_TIMEOUT error. Please open a separate bug report in which you also describe what you were doing at the time of the crash/freeze.
Robb, you are most likely hitting a different issue as well, so please open a separate bug report, including the kernel version used and which GPU model you have (you can get it by running lspci -d 10de:).
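Before adding a "me too" here, a quick self-check against the two criteria above (GK106/GK107 chipset, CTXSW_TIMEOUT in the log) can save a round trip. A rough sketch; both function names are made up, and the chipset pattern assumes the "NVIDIA GKxxx (...)" probe line that nouveau prints in the logs quoted earlier in this report.

```shell
# Hypothetical helper: print the first chipset name nouveau reports at
# probe time in a kernel log read from stdin, e.g. "GK106".
nouveau_chipset() {
    sed -n 's/.*nouveau [0-9a-f:.]*: NVIDIA \([A-Z0-9]*\) (.*/\1/p' | head -n 1
}

# Hypothetical triage: succeeds only if the log on stdin shows a GK106 or
# GK107 chipset and also contains a CTXSW_TIMEOUT error.
same_bug() {
    _log=$(cat)
    case "$(printf '%s\n' "$_log" | nouveau_chipset)" in
        GK106|GK107) printf '%s\n' "$_log" | grep -q 'CTXSW_TIMEOUT' ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   dmesg | same_bug && echo "looks like this bug" || echo "probably a different issue"
```

Note that the probe line may have scrolled out of the dmesg ring buffer on a long-running system, in which case the check fails even on matching hardware; feeding it a full boot log avoids that.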
Jan 24 16:07:35 bach kernel: [28972.823404] nouveau 0000:01:00.0: gr: TRAP ch 8 [007f6be000 chrome[2903]]
Jan 24 16:07:35 bach kernel: [28972.823428] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f000d [OOR_REG]
Jan 24 16:07:35 bach kernel: [28972.823437] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f000d [OOR_REG]
Jan 24 16:07:39 bach kernel: [28977.119555] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Jan 24 16:07:39 bach kernel: [28977.119564] nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...
The easiest way to reproduce is to leave the computer alone for a while and more often than not, the KDE desktop will lock up. But it also locked while I was actively using it about 1/2 hour ago.
I used to be able to ssh in and type "init 4 ; init 5" and successfully restart KDE. Unfortunately, someone made SSH access tighter recently and I can no longer do that ("Connection refused").
I've tested the 'NvGrUseFw=1' kernel cmdline argument (as stated in old comments from 2016 above) but it doesn't help anymore (the 4 requested fw files by nouveau during early boot were served properly and the errors about missing firmware files vanished as well).
The system freezes maybe once per week. Mostly hard crash with two keyboard leds blinking and hard reset after some time. Sometimes the kernel survives and dmesg can be read out. It shows this:
[ 7896.354441] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 7896.354452] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[ 7896.354458] nouveau 0000:01:00.0: fifo: channel 2: killed
[ 7896.354464] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[ 7896.354879] nouveau 0000:01:00.0: systemd-logind[473]: channel 2 killed!
If there's something we can do to help debug this let me know.