I noticed that the "enable_rc6" parameter has been gone since kernel 4.16, and found out that it was removed because there were supposedly no bugs related to it anymore.
Unfortunately, this is not true. Lenovo's V310 and V510 laptops suffer massively from random freezes, and so far the only fully working solution has been to set that parameter to 0. The "enable_dc=0" parameter alone is not enough: since kernel 4.16 the freezes are back.
Is there any way of passing an equivalent parameter, or any other way to turn off RC6? If not, on behalf of all owners of the laptops mentioned above, I kindly beg the developers to consider the non-negligible effect this deprecation will have on us.
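One way to see what the running kernel actually accepts is to compare the i915.* options on the kernel command line with what the driver exposes under /sys/module/i915/parameters. A minimal sketch, assuming the i915 driver is loaded and the script runs as root (several i915 parameter files are root-only); on 4.16 and later, enable_rc6 should simply be absent there:

```python
from pathlib import Path

# The two i915 options discussed here; enable_rc6 no longer exists from 4.16 on.
PARAMS = ("enable_rc6", "enable_dc")


def cmdline_params(cmdline: str, module: str = "i915") -> dict:
    """Return {name: value} for each 'module.name=value' token on a kernel command line."""
    prefix = module + "."
    found = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            found[name] = value
    return found


def module_params(names=PARAMS, sysdir: str = "/sys/module/i915/parameters") -> dict:
    """Map each parameter to its current value, or None if the driver does not expose it."""
    result = {}
    for name in names:
        path = Path(sysdir) / name
        result[name] = path.read_text().strip() if path.exists() else None
    return result
```

For example, printing cmdline_params(Path("/proc/cmdline").read_text()) next to module_params() shows whether an i915.enable_rc6=0 on the command line still reaches anything in the driver.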
using pstore: having EFI- or RAM-based pstore and hard and soft lockup detection enabled in your kconfig, and booting with the "nmi_watchdog=panic panic=5" kernel parameters.
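The pstore setup described above can be checked before waiting for the next freeze. A minimal sketch, assuming the running kernel's config is available as /proc/config.gz or /boot/config-<release>. The Kconfig symbol names other than CONFIG_PSTORE and CONFIG_EFI_VARS_PSTORE (namely CONFIG_PSTORE_RAM, CONFIG_SOFTLOCKUP_DETECTOR, CONFIG_HARDLOCKUP_DETECTOR) are my reading of mainline Kconfig and should be double-checked for your kernel version:

```python
import gzip
import os
from pathlib import Path

# Assumed Kconfig symbols; verify them against your kernel version.
REQUIRED = ("CONFIG_PSTORE", "CONFIG_SOFTLOCKUP_DETECTOR", "CONFIG_HARDLOCKUP_DETECTOR")
BACKENDS = ("CONFIG_EFI_VARS_PSTORE", "CONFIG_PSTORE_RAM")  # EFI- or RAM-based pstore
BOOT_PARAMS = ("nmi_watchdog=panic", "panic=5")


def read_kconfig(text: str) -> dict:
    """Parse 'CONFIG_FOO=value' lines; '# CONFIG_FOO is not set' lines are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            config[key] = value
    return config


def load_running_config() -> str:
    """Read /proc/config.gz if present, else /boot/config-<release>."""
    proc = Path("/proc/config.gz")
    if proc.exists():
        with gzip.open(proc, "rt") as f:
            return f.read()
    return Path(f"/boot/config-{os.uname().release}").read_text()


def missing_setup(config: dict, cmdline: str) -> list:
    """List every required option or boot parameter that is not in place."""
    def enabled(key):
        return config.get(key) in ("y", "m")

    problems = [key for key in REQUIRED if not enabled(key)]
    if not any(enabled(key) for key in BACKENDS):
        problems.append(" or ".join(BACKENDS))
    tokens = cmdline.split()
    problems += [p for p in BOOT_PARAMS if p not in tokens]
    return problems
```

Calling missing_setup(read_kconfig(load_running_config()), Path("/proc/cmdline").read_text()) should return an empty list once everything is in place.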
I am using intel_idle (checked with 'cat /sys/devices/system/cpu/cpuidle/current_driver'), so I just turned off power saving and limited the C-states, and will keep you informed on the results.
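The same sysfs tree can also confirm which idle states remain available after limiting them. A minimal sketch that reads the cpuidle driver and, for one CPU, each registered idle state's name and whether it has been disabled through sysfs. It assumes the standard /sys/devices/system/cpu layout; note that states cut off with a boot parameter such as intel_idle.max_cstate may simply not be listed rather than shown as disabled:

```python
from pathlib import Path

CPU_ROOT = "/sys/devices/system/cpu"


def cpuidle_driver(root: str = CPU_ROOT) -> str:
    """Return the active cpuidle driver name, e.g. 'intel_idle'."""
    return (Path(root) / "cpuidle" / "current_driver").read_text().strip()


def idle_states(cpu: int = 0, root: str = CPU_ROOT) -> list:
    """Return (directory, state name, disabled?) for each idle state registered for one CPU."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    # Sort numerically so that state10 comes after state9.
    dirs = sorted(base.glob("state*"), key=lambda p: int(p.name[len("state"):]))
    states = []
    for d in dirs:
        name = (d / "name").read_text().strip()
        disabled = (d / "disable").read_text().strip() == "1"
        states.append((d.name, name, disabled))
    return states
```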
I'll attach a dmesg output as well and see what I can do about collecting the logs after freezes. Another user in the Lenovo thread mentioned they have already been collecting them via netconsole so I will also ask them to post the logs here if possible.
Unfortunately the other user says there is zero output with netconsole when the freezes occur, but he has added the debug option and will see if something happens.
As for me, I haven't had a freeze so far, but I will keep testing, since the freezes can happen several times a day or only once every few days. I reckon that if none occur within a week or two, that would be confirmation enough that limiting C-states is a workaround.
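How long "a week or two" needs to be depends on how often the freezes came before. As a rough sketch, assuming freezes arrive independently at a constant average rate (a Poisson model, which is my assumption and not something established in this thread), the chance of seeing no freeze at all even though the workaround does nothing is exp(-T / m) for T days of testing and a mean of m days between freezes:

```python
import math


def p_no_freeze(days_tested: float, mean_days_between_freezes: float) -> float:
    """Probability of zero freezes in the test window if the workaround has no effect,
    assuming freezes follow a Poisson process with the given mean spacing."""
    rate = 1.0 / mean_days_between_freezes
    return math.exp(-rate * days_tested)
```

With a hypothetical baseline of one freeze every three days, p_no_freeze(7, 3) is about 0.10 and p_no_freeze(14, 3) about 0.009, so two freeze-free weeks would be fairly strong evidence, while one week still leaves roughly a one-in-ten chance of a false all-clear.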
In the meantime I would like to provide a short summary of what the issue with these laptops has been. I apologize if this is spammy in any way and please ignore if it is, but I realized the thread I linked is too long so maybe this can be helpful.
The freezes we refer to are random in nature and total in their effect, meaning a physical power-off is necessary. They happen in both Windows and Linux. Even though the thread is for the Kaby Lake V510 model, IIRC there have been freezes with the V310 series as well, and the Skylake version was not exempt. The last time I counted, some 30-ish users had reported this issue, but the actual number of affected machines is much higher, since some IT staff reported freezes on whole batches of laptops they had acquired. We believe the issue has something to do with Intel power saving, but it's quite unclear whether it is caused by a driver issue or is the result of a bad Lenovo BIOS or motherboard. Lenovo has been unresponsive, while their service centers have usually been replacing the motherboards, a fix that has helped only one user so far. Windows hacks that worked for some (but not all) users: https://forums.lenovo.com/t5/Lenovo-C-E-K-M-N-and-V-Series/V510-15IKB-Laptop-Freeze/m-p/3852313#M24549. The Linux hack that worked involved disabling DC and RC6. Oh, and there was also an earlier one that involved turning off DRI, but that came with heavy side effects.
Update: a freeze did unfortunately occur with c-states limited. The other user from the forum also mentioned he tested this before and had the same outcome.
Journalctl doesn't show anything out of the ordinary. i915 was just switching DC states from 00 to 02 and vice versa, the last one it switched to being 00. What may be interesting is that i915 had been quiet for 14 seconds prior to the freeze, while it usually does something every two seconds. This also happened before in the session, however, and with no freeze. Due to some obstacles, I had not gotten to setting up something to obtain a log while the machine is frozen, but I will see what I can do.
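The 14-second silence can be looked for systematically in the previous boot's kernel log, for example as exported with 'journalctl -k -b -1 -o short-monotonic' or captured from dmesg. A minimal sketch, assuming each line starts with a bracketed monotonic timestamp and that the DC messages read like "Setting DC state from 00 to 02" (the exact wording is an assumption about this driver version):

```python
import re

# Assumed line shape: "[  123.456789] ... [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02"
LINE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")
DC_CHANGE = re.compile(r"DC state from (?P<src>[0-9a-fA-F]{2}) to (?P<dst>[0-9a-fA-F]{2})")


def i915_events(lines):
    """Yield (timestamp in seconds, message) for every timestamped line mentioning i915."""
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m and "i915" in m.group("msg"):
            yield float(m.group("ts")), m.group("msg")


def quiet_gaps(lines, threshold=10.0):
    """Return (start, end, length) for each silence between consecutive i915 messages
    longer than threshold seconds."""
    gaps = []
    prev = None
    for ts, _ in i915_events(lines):
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


def last_dc_state(lines):
    """Return the target of the last 'DC state from X to Y' message, or None if there is none."""
    last = None
    for _, msg in i915_events(lines):
        m = DC_CHANGE.search(msg)
        if m:
            last = m.group("dst")
    return last
```

Running quiet_gaps over a whole session shows whether the silence right before the freeze stands out from the ones that happened without a freeze.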
The other user's comment on logging, however, is: "You will don't find any logs related that freeze. Even not with kernels netconsole or any debugging parameters. I've spend many time to that issue and find nothing"
> Update: a freeze did unfortunately occur with c-states limited. The other
> user from the forum also mentioned he tested this before and had the same
> outcome.
Ok, thanks for trying.
> Journalctl doesn't show anything out of the ordinary. i915 was just
> switching DC states from 00 to 02 and vice versa, the last one it switched
> to being 00. What may be interesting is that i915 had been quiet for 14
> seconds prior to the freeze, while it usually does something every two
> seconds. This also happened before in the session, however, and with no
> freeze.
Ok, as I understood you already tried booting with i915.enable_dc=0 and that didn't get rid of the problem.
Could you confirm that all display outputs were off when the freeze happened?
Do you see any other pattern in what you do before the freeze?
I'm guessing the DC state toggling is due to GPU activity, probably due to updating the clock in your GUI. Could you try preventing these updates (and any other GPU activity) for instance by switching away to another VT from your GUI and seeing if the freeze still happens? Please also provide a dmesg log booting with drm.debug=0x1f up to the freeze to double-check what causes the DC state toggling.
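drm.debug is a bitmask of debug categories. A small sketch that decodes it, assuming the bit assignments of the DRM_UT_* categories in the kernel's drm_print.h of that era (later kernels added further bits, which the sketch just prints as hex); 0x1f turns on the first five categories:

```python
# Assumed DRM_UT_* bit values from drm_print.h (check your kernel version).
CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
}


def decode_drm_debug(mask: int) -> list:
    """Return the category names enabled by a drm.debug mask, lowest bit first."""
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(CATEGORIES.get(bit, hex(bit)))
        bit <<= 1
    return names
```

On most kernels the same mask can also be written at runtime, as root, to /sys/module/drm/parameters/debug, which avoids a reboot while testing.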
Could you check whether the freeze still happens when booting with nomodeset?
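To make the test boots (drm.debug=0x1f, nomodeset, and later the pstore parameters) persistent, the parameters have to go into the bootloader configuration. A minimal sketch, assuming GRUB with an /etc/default/grub that sets GRUB_CMDLINE_LINUX_DEFAULT. It only rewrites the text; the GRUB config still has to be regenerated afterwards (update-grub on Debian/Ubuntu, or the distribution's grub-mkconfig equivalent):

```python
import re


def add_kernel_params(grub_default: str, params, var: str = "GRUB_CMDLINE_LINUX_DEFAULT") -> str:
    """Append params to var="..." in /etc/default/grub text, skipping ones already present."""
    pattern = re.compile(rf'^{re.escape(var)}="(?P<value>[^"]*)"$', re.MULTILINE)

    def extend(match):
        tokens = match.group("value").split()
        tokens += [p for p in params if p not in tokens]
        return f'{var}="{" ".join(tokens)}"'

    new_text, count = pattern.subn(extend, grub_default)
    if count == 0:
        # No such line yet: append one at the end of the file.
        sep = "" if not grub_default or grub_default.endswith("\n") else "\n"
        new_text = grub_default + sep + f'{var}="{" ".join(params)}"\n'
    return new_text
```

For a one-off test, editing the boot entry at the GRUB menu (press e) avoids touching the file at all.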
> Due to some obstacles, I had not gotten to setting up something to
> obtain a log while the machine is frozen, but I will see what I can do.
>
> The other user's comment on logging, however, is: "You will don't find any
> logs related that freeze. Even not with kernels netconsole or any debugging
> parameters. I've spend many time to that issue and find nothing"
Ok, please still check whether the pstore method provides anything.
> Ok, as I understood you already tried booting with i915.enable_dc=0 and that
> didn't get rid of the problem.
Yes, rc6 needs to be turned off as well.
> Could you confirm that all display outputs were off when the freeze happened?
How can I check this?
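One way to see what the driver thinks the outputs are doing is the connector attributes under /sys/class/drm. A minimal sketch, assuming the usual cardN-<connector> directories with status, enabled and dpms files. Since nothing can run once the machine is frozen, it only shows the state at the moment it is run, e.g. right before leaving the laptop idle:

```python
from pathlib import Path


def connector_states(drm_dir: str = "/sys/class/drm") -> dict:
    """Map connector names such as 'card0-eDP-1' to their status, enabled and dpms values."""
    states = {}
    for conn in sorted(Path(drm_dir).glob("card*-*")):
        attrs = {}
        for attr in ("status", "enabled", "dpms"):
            path = conn / attr
            attrs[attr] = path.read_text().strip() if path.exists() else None
        states[conn.name] = attrs
    return states


def all_outputs_off(states: dict) -> bool:
    """True if no connector is both enabled and in DPMS 'On'."""
    return not any(s["enabled"] == "enabled" and s["dpms"] == "On" for s in states.values())
```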
> Do you see any other pattern in what you do before the freeze?
No, unfortunately that is the thing with these freezes: they are completely random and cannot be straightforwardly reproduced. A stress test, for example, won't help. From everything that has been written on the forum, they do however seem to happen more often when the GPU is doing work.
> Could you try preventing these updates (and any other GPU activity) for
> instance by switching away to another VT from your GUI and seeing if the freeze
> still happens?
How do I go about doing this?
> Please also provide a dmesg log booting with drm.debug=0x1f up to the freeze to
> double-check what causes the DC state toggling.
> Could you check whether the freeze still happens when booting with nomodeset?
> Ok, please still check whether the pstore method provides anything.
These I mostly understand how to do, except the pstore method, but there may be a guide for it somewhere. Unfortunately, I've had to go back to kernel 4.14 with rc6 disabled while I work on essays for university deadlines, so I will try all of this as soon as I'm in the clear. I will also ask the other Linux users from the forum again to contribute here if they can.
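For the pstore method, the records written during the panic show up as files after the next boot. A minimal sketch, assuming pstore is mounted at /sys/fs/pstore (it can be mounted with 'mount -t pstore pstore /sys/fs/pstore' if it isn't) and the script runs as root. It copies the records (named like dmesg-efi-<id> or dmesg-ramoops-<n>, depending on the backend) somewhere safe so they can be attached here. On systems where systemd-pstore is active, the records may already have been moved to /var/lib/systemd/pstore:

```python
import shutil
from pathlib import Path


def collect_pstore(dest: str, pstore_dir: str = "/sys/fs/pstore") -> list:
    """Copy every record in pstore_dir into dest and return the copied file names."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for record in sorted(Path(pstore_dir).iterdir()):
        if record.is_file():
            # copy2 keeps the file timestamps, which roughly date the crash.
            shutil.copy2(record, out / record.name)
            copied.append(record.name)
    return copied
```

The sketch deliberately leaves the originals in place; deleting a file under /sys/fs/pstore erases that record from the backend storage, which frees space for the next crash once the copy is safe.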
> > Update: a freeze did unfortunately occur with c-states limited. The other
> > user from the forum also mentioned he tested this before and had the same
> > outcome.
> Ok, thanks for trying.
Disabling C-states does not help.
> Ok, as I understood you already tried booting with i915.enable_dc=0 and that
> didn't get rid of the problem.
Yes, RC6 has to be disabled.
> Could you confirm that all display outputs were off when the freeze happened?
The screens are not off but frozen. After a long time, the screens go black, if I remember correctly.
> I'm guessing the DC state toggling is due to GPU activity, probably due to
> updating the clock in your GUI. Could you try preventing these updates (and
> any other GPU activity) for instance by switching away to another VT from
> your GUI and seeing if the freeze still happens?
Switching away to another VT is NOT possible; it's the whole PC that freezes!
Even SysRq doesn't work, and the keyboard is also dead.
> Please also provide a dmesg
> log booting with drm.debug=0x1f up to the freeze to double-check what causes
> the DC state toggling.
I will do that.
> Could you check whether the freeze still happens when booting with nomodeset?
I will give it a try.
> Ok, please still check whether the pstore method provides anything.
I will give it a try.
> > I'm guessing the DC state toggling is due to GPU activity, probably due to
> > updating the clock in your GUI. Could you try preventing these updates (and
> > any other GPU activity) for instance by switching away to another VT from
> > your GUI and seeing if the freeze still happens?
> Switching away to another VT is NOT possible; it's the whole PC that freezes!
> Even SysRq doesn't work, and the keyboard is also dead.
I meant here to switch to another VT from the GUI before the freeze to avoid any GPU activity (it looks like it is the periodic clock update based on your later logs) and see if the freeze still happens.
Thanks, it looks like the only activity preceding the freeze is some periodic GPU command, presumably updating the clock in the GUI, but nothing out of the ordinary. You could still check whether enabling pstore provides additional logs after the freeze and reboot. For that you'd need to build your kernel with EFI- or RAM-based PSTORE support (for EFI: CONFIG_PSTORE=y, CONFIG_EFI_VARS_PSTORE=y) and boot with the 'nmi_watchdog=panic panic=5' kernel params. After freeze/rebooting