Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap; the kernel struggles to cope and we make more noise than most. That failure does not look to be the cause of the later hang, though it may indeed be related to memory pressure (although being snb it is llc so less susceptible to most forms of corruption, you can still hypothesize data not making it to/from swap that leads to context corruption). I would say the memory layout of the batch supports the hypothesis that the context has been swapped out and back in. So I am going to err on the side of assuming this is an invalid context image due to swap.
> Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap;
> the kernel struggles to cope and we make more noise than most.
Interesting. This suggests an incongruence between the 1:1 RAM-to-swap partition sizing typical of most distro installers and use cases where there will be heavy pressure on RAM rather than incidental swap usage. In your view, is this a case of "doctor, it hurts when I do this" and the doctor says, "right, so don't do that", or is there room for improvement?
Note: these examples are unique in that the test system is using swap on ZRAM. So it should be significantly faster than conventional swap on a partition. Also, these examples have /dev/zram0 sized to 1.5X RAM, but it's reproducible at 1:1. In smaller swap cases, I've seen these same call traces far less frequently, and also oom-killer happens more frequently.
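For reference, a minimal sketch of the kind of setup described above: swap on /dev/zram0 sized to 1.5x RAM. The exact tooling is an assumption (plain zramctl from util-linux here); distros usually do this via zram-generator or similar rather than by hand:

```sh
# Hand-rolled zram swap sized to 1.5x RAM, roughly matching the setup above.
modprobe zram
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
zramctl --size "$((ram_kb * 1024 * 3 / 2))" /dev/zram0
mkswap /dev/zram0
swapon --priority 100 /dev/zram0
```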
> That failure
> does not look to be the cause of the later hang, though it may indeed be
> related to memory pressure (although being snb it is llc so less susceptible
> to most forms of corruption, you can still hypothesize data not making it
> to/from swap that leads to context corruption). I would say the memory
> layout of the batch supports the hypothesis that the context has been
> swapped out and back in. So I am going to err on the side of assuming this
> is an invalid context image due to swap.
The narrow goal of this torture test is to find ways of improving system responsiveness under heavy swap use. And also it acts much like an unprivileged fork bomb that can, somewhat non-deterministically I'm finding, take down the system (totally unresponsive for >30 minutes). And in doing that, I'm stumbling over other issues like this one.
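The actual test isn't included in the thread; as a rough stand-in, a stress-ng invocation along these lines produces similar fork-heavy memory pressure (worker counts and the timeout are purely illustrative):

```sh
# Illustrative only -- not the reporter's actual test. Forking workers plus
# anonymous-memory workers dirtying ~90% of RAM will push a system into swap.
stress-ng --fork 32 --vm 8 --vm-bytes 90% --vm-keep --timeout 10m
```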
For desktops, it's a problem to not have swap big enough to support hibernation.
> > Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap;
> > the kernel struggles to cope and we make more noise than most.
>
> Interesting. This suggests an incongruence between the 1:1 RAM-to-swap
> partition sizing typical of most distro installers and use cases where
> there will be heavy pressure on RAM rather than incidental swap usage. In
> your view, is this a case of "doctor, it hurts when I do this" and the
> doctor says, "right, so don't do that", or is there room for improvement?
It's definitely the kernel's problem in mishandling resources: there are plenty still available, we just aren't getting the pages when they are required, as they are required. Aside from that, we are not prioritising interactive workloads very well under these conditions. From our point of view that only increases the memory pressure for graphics resources -- work builds up faster than we can process, write amplification from client to display.
> Note: these examples are unique in that the test system is using swap on
> ZRAM. So it should be significantly faster than conventional swap on a
> partition. Also, these examples have /dev/zram0 sized to 1.5X RAM, but it's
> reproducible at 1:1. In smaller swap cases, I've seen these same call traces
> far less frequently, and also oom-killer happens more frequently.
>
> > That failure
> > does not look to be the cause of the later hang, though it may indeed be
> > related to memory pressure (although being snb it is llc so less susceptible
> > to most forms of corruption, you can still hypothesize data not making it
> > to/from swap that leads to context corruption). I would say the memory
> > layout of the batch supports the hypothesis that the context has been
> > swapped out and back in. So I am going to err on the side of assuming this
> > is an invalid context image due to swap.
>
> The narrow goal of this torture test is to find ways of improving system
> responsiveness under heavy swap use. And also it acts much like an
> unprivileged fork bomb that can, somewhat non-deterministically I'm finding,
> take down the system (totally unresponsive for >30 minutes). And in doing
> that, I'm stumbling over other issues like this one.
Yup. Death-by-swap is an old problem (when the oomkiller doesn't kill you, you can die of old age waiting for a response, wishing it had). Most of our effort is spent trying to minimise the system-wide impact when running at max memory (when the caches are regularly reaped); handling swap well has been an afterthought for a decade.
This is perhaps superfluous: I tested with conventional swap on a plain partition on an SSD, and the same thing happens. So we can say it's not caused by swap on ZRAM.
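For comparison, switching from zram to conventional swap is just something like the following; the 16G size and file path are placeholders, not taken from the report:

```sh
# Replace zram swap with a plain swap file/partition for an A/B comparison.
swapoff /dev/zram0
fallocate -l 16G /swapfile      # or point mkswap at a dedicated partition
chmod 600 /swapfile             # (a swap file on btrfs needs extra steps; a partition avoids that)
mkswap /swapfile
swapon /swapfile
swapon --show                   # confirm what is active
```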
> It's definitely the kernel's problem in mishandling resources: there are
> plenty still available, we just aren't getting the pages when they are
> required, as they are required.
I see this very pronounced in the conventional swap-on-SSD case above, where top reports ~60% wa; free RAM is low, but there's still quite a lot of swap left. Yet there's not a lot of activity compared to the swap-on-ZRAM case.
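The "~60% wa" observation is easy to watch with vmstat; the si/so columns also show whether pages are actually moving to and from swap:

```sh
# Sample every second: "si"/"so" are swap-in/out per second, "wa" is iowait.
vmstat 1
# Raw cumulative counters, if you want to graph them instead:
grep -E '^pswp(in|out) ' /proc/vmstat
```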
> Yup. Death-by-swap is an old problem (when the oomkiller doesn't kill you,
> you can die of old age waiting for a response, wishing it had). Most of our
> effort is spent trying to minimise the system-wide impact when running at
> max memory (when the caches are regularly reaped); handling swap well has
> been an afterthought for a decade.
I've tried quite a lot of variations: different sized swaps, swap on ZRAM, and zswap. And mostly it seems like rearranging deck chairs. I'm not getting enough quality data to have any idea which one is even marginally better; there's always some trade-off. I guess I should focus instead on ways of containing unprivileged fork bombs - better they get mad in their own box than take down the whole system.
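One way to do that "their own box" containment without writing cgroup plumbing by hand is a transient systemd scope with memory and task limits. The property values and script name below are illustrative, and this assumes cgroup v2 with the memory and pids controllers delegated to the user:

```sh
# Run the stressor in its own transient scope so the limits apply to it and
# its children, instead of letting it take the whole session down with it.
systemd-run --user --scope \
    -p MemoryMax=8G -p MemorySwapMax=2G -p TasksMax=512 \
    ./torture-test.sh
```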
Found this by searching "swap". It's pretty much the only available search result.
I've been getting regular hangs on the 5.5-rc series when starting to actively hit swap space. I have 16G RAM and 16G swap configured. While doing whatever work in Firefox, at some point the graphics hang. The latest hang was today on 5.5-rc6, at 6 days uptime.
^^^^ First hung task traceback - does this give us anything useful?
I have now rebooted into today's 5.5.0-rc7-gentoo-drm-tip-2020y-01m-24d-12h-10m-34s+, but since this problem has been present throughout the 5.5-rc cycle and doesn't seem to be improving for 5.5, I wanted to ask for further input regardless.
EDIT
`cat /sys/class/drm/card0/error` said "No error state recorded." It doesn't look like the infamous #673 (closed).
Still, the situation was not recoverable, because many GPU-bound processes were stuck in uninterruptible sleep and could not be killed with anything. After doing some ssh-based debugging, I eventually had to REISUB.
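For reference, the REISUB sequence done from an ssh session rather than the console is just the corresponding SysRq letters written to /proc/sysrq-trigger once SysRq is enabled; the keyboard-only "r" (unraw) step isn't relevant remotely:

```sh
echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
echo e > /proc/sysrq-trigger      # SIGTERM to all tasks
echo i > /proc/sysrq-trigger      # SIGKILL to all tasks
echo s > /proc/sysrq-trigger      # emergency sync
echo u > /proc/sysrq-trigger      # remount filesystems read-only
echo b > /proc/sysrq-trigger      # reboot immediately
```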