nouveau GPU locks up under memory pressure
@hramrach
Submitted by Michal Suchánek Assigned to Nouveau Project
Description
Under memory pressure the GPU tends to hang.
This is probably related to system memory pressure (not VRAM), although I have no idea about VRAM utilisation.
The crash usually happens when I start an application that uses the GPU and the system starts to swap, and/or the OOM killer kills something, and/or applications crash due to bad handling of the OOM condition.
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GeForce GT 620 [10de:0f01] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device [1043:83ff]
Flags: bus master, fast devsel, latency 0, IRQ 52
Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Memory at f0000000 (64-bit, prefetchable) [size=128M]
Memory at f8000000 (64-bit, prefetchable) [size=32M]
I/O ports at dc80 [size=128]
Expansion ROM at fde00000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Kernel driver in use: nouveau
Linux 3.15-trunk-amd64 #1 SMP Debian 3.15.5-1~exp1 (2014-07-10) x86_64 GNU/Linux
ii libgl1-mesa-dri:am 10.2.3-1 amd64
[ 2574.171692] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0000011000 [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007edbc000 [unknown]
[ 2664.669780] nouveau E[ DRM] GPU lockup - switching to software fbcon
[ 2679.688012] nouveau E[Xorg[1971]] failed to idle channel 0xcccc0001 [Xorg[1971]]
[ 151.697805] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0000011000 [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007ed88000 [unknown]
[ 168.639601] nouveau E[ DRM] GPU lockup - switching to software fbcon
[ 183.760010] nouveau E[Xorg[2027]] failed to idle channel 0xcccc0001 [Xorg[2027]]
[ 134.917421] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0000011000 [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007ed88000 [unknown]
[ 165.296145] nouveau E[ DRM] GPU lockup - switching to software fbcon
[ 7.563122] nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x0c1080a1
[ 7.569331] nouveau [ DEVICE][0000:01:00.0] Chipset: GF108 (NVC1)
[ 7.575693] nouveau [ DEVICE][0000:01:00.0] Family : NVC0
[ 7.586561] usbcore: registered new interface driver snd-usb-audio
[ 7.644889] nouveau [ VBIOS][0000:01:00.0] checking PRAMIN for image...
[ 7.790243] nouveau [ VBIOS][0000:01:00.0] ... appears to be valid
[ 7.790245] nouveau [ VBIOS][0000:01:00.0] using image from PRAMIN
[ 7.790338] nouveau [ VBIOS][0000:01:00.0] BIT signature found
[ 7.790340] nouveau [ VBIOS][0000:01:00.0] version 70.08.ae.00.02
[ 7.790366] Bluetooth: HCI socket layer initialized
[ 7.790367] Bluetooth: L2CAP socket layer initialized
[ 7.790376] Bluetooth: SCO socket layer initialized
[ 7.797368] nouveau 0000:01:00.0: irq 52 for MSI/MSI-X
[ 7.797377] nouveau [ PMC][0000:01:00.0] MSI interrupts enabled
[ 7.797415] nouveau W[ PFB][0000:01:00.0][0x00000000][ffff88022bbb7800] reclocking of this ram type unsupported
[ 7.797416] nouveau [ PFB][0000:01:00.0] RAM type: DDR3
[ 7.797417] nouveau [ PFB][0000:01:00.0] RAM size: 2048 MiB
[ 7.797418] nouveau [ PFB][0000:01:00.0] ZCOMP: 0 tags
[ 7.801509] nouveau [ VOLT][0000:01:00.0] GPU voltage: 900000uv
[ 9.300033] nouveau [ PTHERM][0000:01:00.0] FAN control: none / external
[ 9.306998] nouveau [ PTHERM][0000:01:00.0] fan management: automatic
[ 9.313701] nouveau [ PTHERM][0000:01:00.0] internal sensor: yes
[ 9.320011] nouveau [ CLK][0000:01:00.0] 03: core 50 MHz memory 324 MHz
[ 9.320113] EXT4-fs (sdd1): mounting ext3 file system using the ext4 subsystem
[ 9.334538] nouveau [ CLK][0000:01:00.0] 07: core 405 MHz memory 324 MHz
[ 9.341863] nouveau [ CLK][0000:01:00.0] 0f: core 700 MHz memory 700 MHz
[ 9.349339] nouveau [ CLK][0000:01:00.0] --: core 405 MHz memory 324 MHz
[ 9.359052] [TTM] Zone kernel: Available graphics memory: 4032366 kiB
[ 9.365668] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 9.372279] [TTM] Initializing pool allocator
[ 9.376741] [TTM] Initializing DMA pool allocator
[ 9.381547] nouveau [ DRM] VRAM: 2048 MiB
[ 9.386087] nouveau [ DRM] GART: 1048576 MiB
[ 9.390892] nouveau [ DRM] TMDS table version 2.0
[ 9.396126] nouveau [ DRM] DCB version 4.0
[ 9.400746] nouveau [ DRM] DCB outp 00: 01000302 00020030
[ 9.406675] nouveau [ DRM] DCB outp 01: 02000300 00000000
[ 9.412608] nouveau [ DRM] DCB outp 02: 08011392 00020020
[ 9.418523] nouveau [ DRM] DCB outp 03: 04022310 00000000
[ 9.421527] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[ 9.433144] nouveau [ DRM] DCB conn 00: 00001030
[ 9.439774] nouveau [ DRM] DCB conn 01: 00002161
[ 9.446363] nouveau [ DRM] DCB conn 02: 00000200
[ 9.452457] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 9.459164] [drm] Driver supports precise vblank timestamp query.
[ 9.470482] nouveau [ DRM] MM: using COPY0 for buffer copies
[ 9.500028] usb 4-2: new full-speed USB device number 4 using uhci_hcd
[ 9.584981] nouveau [ DRM] allocated 1600x1600 fb: 0x60000, bo ffff88022e07c800
[ 9.592947] fbcon: nouveaufb (fb0) is primary device
[ 9.642993] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[ 9.684077] Console: switching to colour frame buffer device 150x75
[ 9.705241] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[ 9.705246] nouveau 0000:01:00.0: registered panic notifier
[ 9.705257] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0
[177491.295050] nouveau E[Wakfu[2020]] fail ttm_validate
[177491.300109] nouveau E[Wakfu[2020]] validate gart_list
[177491.305449] nouveau E[Wakfu[2020]] validate: -12
[177717.658727] usb 8-4: USB disconnect, device number 14
[177803.434648] nouveau E[ PFIFO][0000:01:00.0] write fault at 0x0000218000 [PAGE_NOT_PRESENT] from PGRAPH/DISPATCH on channel 0x007f89c000 [Wakfu[2020]]
[177803.438624] nouveau E[ PFIFO][0000:01:00.0] PGRAPH engine fault on channel 5, recovering...
[177983.108013] nouveau E[Xorg[1899]] failed to idle channel 0xcccc0000 [Xorg[1899]]
[177998.112017] nouveau E[Xorg[1899]] failed to idle channel 0xcccc0000 [Xorg[1899]]
[177998.119751] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000001b000 [PAGE_NOT_PRESENT] from PFIFO/BAR_READ on channel 0x007fb5a000 [unknown]
[178002.857403] nouveau E[ DRM] GPU lockup - switching to software fbcon
[178015.816010] nouveau E[Wakfu[2016]] failed to idle channel 0xcccc0000 [Wakfu[2016]]
An easy way to trigger the issue is to
download the Wakfu game client,
unpack it,
run the launcher script (wakfu/wakfu),
wait for updates to finish,
and press the PLAY button repeatedly until memory runs out.
The client takes about 1.2 GB.
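
When comparing several crashes it can help to pull just the PFIFO fault lines out of a long kernel log. The following is a minimal sketch, assuming log lines in the same format as the excerpts in this report; the function name `parse_nouveau_faults`, the `SAMPLE` text and the regular expression are illustrative only and are not part of nouveau or any existing tool.

```python
import re

# Matches PFIFO fault lines in the format seen in this report, e.g.
# "[ 2574.171692] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0000011000
#  [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007edbc000 [unknown]"
# Assumption: other kernel versions may format these messages differently.
FAULT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau E\[\s*PFIFO\]\[(?P<dev>[0-9a-f:.]+)\]\s+"
    r"(?P<kind>read|write) fault at 0x(?P<addr>[0-9a-f]+)\s+"
    r"\[(?P<reason>[A-Z_]+)\]\s+from\s+(?P<unit>\S+)\s+"
    r"on channel 0x(?P<chan>[0-9a-f]+)\s+\[(?P<owner>.+)\]\s*$"
)


def parse_nouveau_faults(log_text):
    """Return one dict per PFIFO read/write fault line found in log_text."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match is None:
            continue  # not a PFIFO fault line (e.g. "GPU lockup" messages)
        fault = match.groupdict()
        fault["ts"] = float(fault["ts"])        # seconds since boot
        fault["addr"] = int(fault["addr"], 16)  # faulting address
        faults.append(fault)
    return faults


# Three lines copied from the log excerpts in this report.
SAMPLE = """\
[ 2574.171692] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0000011000 [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007edbc000 [unknown]
[ 2664.669780] nouveau E[ DRM] GPU lockup - switching to software fbcon
[177803.434648] nouveau E[ PFIFO][0000:01:00.0] write fault at 0x0000218000 [PAGE_NOT_PRESENT] from PGRAPH/DISPATCH on channel 0x007f89c000 [Wakfu[2020]]
"""

if __name__ == "__main__":
    for fault in parse_nouveau_faults(SAMPLE):
        print(fault["ts"], fault["kind"], hex(fault["addr"]),
              fault["reason"], fault["unit"], fault["owner"])
```

On a live system the input could instead be the saved output of `dmesg`, passed to `parse_nouveau_faults` as a single string.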