Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap; the kernel struggles to cope and we make more noise than most. That failure does not look to be the cause of the later hang, though it may indeed be related to memory pressure (although being snb it is llc so less susceptible to most forms of corruption, you can still hypothesize data not making it to/from swap that leads to context corruption). I would say the memory layout of the batch supports the hypothesis that the context has been swapped out and back in. So I am going to err on the side of assuming this is an invalid context image due to swap.
> Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap;
> the kernel struggles to cope and we make more noise than most.
Interesting. This suggests an incongruence between the 1:1 RAM-to-swap partition sizing typical of most distro installers and use cases where there will be heavy pressure on RAM rather than incidental swap usage. In your view, is this a case of "doctor, it hurts when I do this" and the doctor says, "right, so don't do that", or is there room for improvement?
Note: these examples are unique in that the test system is using swap on ZRAM. So it should be significantly faster than conventional swap on a partition. Also, these examples have /dev/zram0 sized to 1.5X RAM, but it's reproducible at 1:1. In smaller swap cases, I've seen these same call traces far less frequently, and also oom-killer happens more frequently.
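For reference, a minimal sketch of the kind of setup described above: swap on /dev/zram0 sized to 1.5x RAM. The exact tooling is an assumption (plain zramctl from util-linux here); distros usually do this via zram-generator or similar rather than by hand:

```sh
# Hand-rolled zram swap sized to 1.5x RAM, roughly matching the setup above.
modprobe zram
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
zramctl --size "$((ram_kb * 1024 * 3 / 2))" /dev/zram0
mkswap /dev/zram0
swapon --priority 100 /dev/zram0
```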
> That failure
> does not look to be the cause of the later hang, though it may indeed be
> related to memory pressure (although being snb it is llc so less susceptible
> to most forms of corruption, you can still hypothesize data not making it
> to/from swap that leads to context corruption). I would say the memory
> layout of the batch supports the hypothesis that the context has been
> swapped out and back in. So I am going to err on the side of assuming this
> is an invalid context image due to swap.
The narrow goal of this torture test is to find ways of improving system responsiveness under heavy swap use. And also it acts much like an unprivileged fork bomb that can, somewhat non-deterministically I'm finding, take down the system (totally unresponsive for >30 minutes). And in doing that, I'm stumbling over other issues like this one.
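The actual test isn't included in the thread; as a rough stand-in, a stress-ng invocation along these lines produces similar fork-heavy memory pressure (worker counts and the timeout are purely illustrative):

```sh
# Illustrative only -- not the reporter's actual test. Forking workers plus
# anonymous-memory workers dirtying ~90% of RAM will push a system into swap.
stress-ng --fork 32 --vm 8 --vm-bytes 90% --vm-keep --timeout 10m
```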
For desktops, it's a problem to not have swap big enough to support hibernation.
> > Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap;
> > the kernel struggles to cope and we make more noise than most.
>
> Interesting. This suggests an incongruence between the 1:1 RAM-to-swap
> partition sizing typical of most distro installers and use cases where
> there will be heavy pressure on RAM rather than incidental swap usage. In
> your view, is this a case of "doctor, it hurts when I do this" and the
> doctor says, "right, so don't do that", or is there room for improvement?
It's definitely the kernel's problem in mishandling resources: there are plenty still available, we just aren't getting the pages when they are required, as they are required. Aside from that, we are not prioritising interactive workloads very well under these conditions. From our point of view that only increases the memory pressure for graphics resources -- work builds up faster than we can process, write amplification from client to display.
> Note: these examples are unique in that the test system is using swap on
> ZRAM. So it should be significantly faster than conventional swap on a
> partition. Also, these examples have /dev/zram0 sized to 1.5X RAM, but it's
> reproducible at 1:1. In smaller swap cases, I've seen these same call traces
> far less frequently, and also oom-killer happens more frequently.
>
> > That failure
> > does not look to be the cause of the later hang, though it may indeed be
> > related to memory pressure (although being snb it is llc so less susceptible
> > to most forms of corruption, you can still hypothesize data not making it
> > to/from swap that leads to context corruption). I would say the memory
> > layout of the batch supports the hypothesis that the context has been
> > swapped out and back in. So I am going to err on the side of assuming this
> > is an invalid context image due to swap.
>
> The narrow goal of this torture test is to find ways of improving system
> responsiveness under heavy swap use. And also it acts much like an
> unprivileged fork bomb that can, somewhat non-deterministically I'm finding,
> take down the system (totally unresponsive for >30 minutes). And in doing
> that, I'm stumbling over other issues like this one.
Yup. Death-by-swap is an old problem (when the oomkiller doesn't kill you, you can die of old age waiting for a response, wishing it had). Most of our effort is spent trying to minimise the system-wide impact when running at max memory (when the caches are regularly reaped); handling swap well has been an afterthought for a decade.
This is perhaps superfluous: I tested with conventional swap on a plain partition on an SSD, and the same thing happens. So we can say it's not caused by swap on ZRAM.
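For comparison, switching from zram to conventional swap is just something like the following; the 16G size and file path are placeholders, not taken from the report:

```sh
# Replace zram swap with a plain swap file/partition for an A/B comparison.
swapoff /dev/zram0
fallocate -l 16G /swapfile      # or point mkswap at a dedicated partition
chmod 600 /swapfile             # (a swap file on btrfs needs extra steps; a partition avoids that)
mkswap /swapfile
swapon /swapfile
swapon --show                   # confirm what is active
```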
> It's definitely the kernel's problem in mishandling resources: there are
> plenty still available, we just aren't getting the pages when they are
> required, as they are required.
I see this very pronounced in the conventional swap-on-SSD case above, where top reports ~60% wa; free RAM is low, but there's still quite a lot of swap left. Yet there's not a lot of activity compared to the swap-on-ZRAM case.
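The "~60% wa" observation is easy to watch with vmstat; the si/so columns also show whether pages are actually moving to and from swap:

```sh
# Sample every second: "si"/"so" are swap-in/out per second, "wa" is iowait.
vmstat 1
# Raw cumulative counters, if you want to graph them instead:
grep -E '^pswp(in|out) ' /proc/vmstat
```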
> Yup. Death-by-swap is an old problem (when the oomkiller doesn't kill you,
> you can die of old age waiting for a response, wishing it had). Most of our
> effort is spent trying to minimise the system-wide impact when running at
> max memory (when the caches are regularly reaped); handling swap well has
> been an afterthought for a decade.
I've tried quite a lot of variations: different sized swaps, swap on ZRAM, and zswap. And mostly it seems like rearranging deck chairs. I'm not getting enough quality data to have any idea which one is even marginally better; there's always some trade-off. I guess I should focus instead on ways of containing unprivileged fork bombs - better they get mad in their own box than take down the whole system.
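One way to do that "their own box" containment without writing cgroup plumbing by hand is a transient systemd scope with memory and task limits. The property values and script name below are illustrative, and this assumes cgroup v2 with the memory and pids controllers delegated to the user:

```sh
# Run the stressor in its own transient scope so the limits apply to it and
# its children, instead of letting it take the whole session down with it.
systemd-run --user --scope \
    -p MemoryMax=8G -p MemorySwapMax=2G -p TasksMax=512 \
    ./torture-test.sh
```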
Found this by searching "swap". It's pretty much the only available search result.
I've been getting regular hangs on the 5.5-rc series when starting to actively hit swap space. I have 16G RAM and 16G swap configured. While doing whatever work in Firefox, at some point the graphics hang. The latest hang was today on 5.5-rc6, at 6 days uptime.
^^^^ First hung task traceback - does this give us anything useful?
I have now rebooted into today's 5.5.0-rc7-gentoo-drm-tip-2020y-01m-24d-12h-10m-34s+, but since this problem has been present throughout the 5.5-rc cycle and doesn't seem to be improving for 5.5, I wanted to ask for further input regardless.
EDIT
`cat /sys/class/drm/card0/error` said "No error state recorded." It doesn't look like the infamous #673 (closed).
Still, the situation was not recoverable, because many GPU-bound processes were stuck in uninterruptible sleep and could not be killed with anything. After doing some ssh-based debugging, I eventually had to REISUB.
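For reference, the REISUB sequence done from an ssh session rather than the console is just the corresponding SysRq letters written to /proc/sysrq-trigger once SysRq is enabled; the keyboard-only "r" (unraw) step isn't relevant remotely:

```sh
echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
echo e > /proc/sysrq-trigger      # SIGTERM to all tasks
echo i > /proc/sysrq-trigger      # SIGKILL to all tasks
echo s > /proc/sysrq-trigger      # emergency sync
echo u > /proc/sysrq-trigger      # remount filesystems read-only
echo b > /proc/sysrq-trigger      # reboot immediately
```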