[KBL] "enable_rc6" parameter deprecation brings back freezing

added Community feature: power/Other platform: KBL priority::high severity::normal + 1 deleted label

Jani Saarinen @jani.saarinen said:

Imre, any comments?

Chris Wilson @ickle said:

It wasn't rc6 you wanted but the side-effect of disabling powersaving.

Imre Deak @ideak said:

Did you try if limiting CPU C states also gets rid of the problem (leaving graphics power saving enabled)? If using intel_idle you can boot with the

intel_idle.max_cstate=1

kernel parameter to do this.

Could you provide a dmesg log booting with drm.debug=0xe?

Did you try to collect logs after the freeze? For example:

- via ssh, if it still works
- via net or serial console, if the machine has a serial/ethernet port (for netconsole see https://wiki.archlinux.org/index.php/Netconsole)
- using pstore: with EFI or RAM based pstore and hard and soft lockup detection enabled in your kconfig, boot with the "nmi_watchdog=panic panic=5" kernel parameters.
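
Before waiting for the next freeze it may be worth confirming that the parameters mentioned above actually reached the running kernel. The following is only a minimal sketch: the helper names has_param and report_params are made up for illustration, and the parameter list is simply the one from this comment.

```shell
#!/bin/sh
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole
# space-separated word in CMDLINE (hypothetical helper name).
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_params CMDLINE: print "set" or "missing" for each parameter
# suggested in this thread.
report_params() {
    for p in intel_idle.max_cstate=1 drm.debug=0xe nmi_watchdog=panic panic=5; do
        if has_param "$1" "$p"; then
            echo "$p: set"
        else
            echo "$p: missing"
        fi
    done
}

# /proc/cmdline holds the command line the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    report_params "$(cat /proc/cmdline)"
fi
```

A parameter reported as missing usually means the bootloader configuration was not regenerated or a different boot entry was selected.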

Filip said:

I am using intel_idle (checked with 'cat /sys/devices/system/cpu/cpuidle/current_driver') so I just turned off power saving and limited the C states and will keep you informed on the results.
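
For anyone repeating this test, the check above can be extended to also show the C-state limit that intel_idle is using. This is a sketch under assumptions: the function name show_cpuidle and its optional root-directory argument are hypothetical (the argument exists only so the function can be tried against a fake tree), and it assumes intel_idle exposes max_cstate under /sys/module/intel_idle/parameters/.

```shell
#!/bin/sh
# show_cpuidle [ROOT]: print the active cpuidle driver and the intel_idle
# max_cstate value. ROOT is an optional prefix (hypothetical, for testing);
# with no argument the real /sys is read.
show_cpuidle() {
    root="${1:-}"
    drv="$root/sys/devices/system/cpu/cpuidle/current_driver"
    max="$root/sys/module/intel_idle/parameters/max_cstate"

    if [ -r "$drv" ]; then
        echo "driver: $(cat "$drv")"
    else
        echo "driver: unknown"
    fi

    if [ -r "$max" ]; then
        echo "max_cstate: $(cat "$max")"
    else
        echo "max_cstate: unknown"
    fi
}

show_cpuidle
```

If the driver is not intel_idle, the intel_idle.max_cstate parameter has no effect and the second line will most likely read "unknown".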

I'll attach a dmesg output as well and see what I can do about collecting the logs after freezes. Another user in the Lenovo thread mentioned they have already been collecting them via netconsole so I will also ask them to post the logs here if possible.

Filip uploaded an attachment:

Attachment 138729, "Dmesg output with drm.debug=0xe":
dmesg-10042018.txt

Filip commented on attachment 138729:

Note: this is just after boot, not after the freeze event

Filip said:

Unfortunately the other user says there is zero output with netconsole when the freezes occur, but he has added the debug option and will see if something happens.

As for me, I haven't had a freeze so far, but will keep testing since the freezes can happen multiple times a day, but also only once in a few days. I reckon if they don't occur after a week or two it would be confirmation enough that limiting c-states is a workaround.

In the meantime I would like to provide a short summary of what the issue with these laptops has been. I apologize if this is spammy in any way and please ignore if it is, but I realized the thread I linked is too long so maybe this can be helpful.

The freezes that we refer to are random in nature and total in their effect - meaning physical power-off is necessary. They happen both in Windows and Linux. Even though the thread is for the Kaby Lake V510 model, IIRC there have been freezes with the V310 series as well, and the Skylake version was not exempt. The last time I counted, some 30-ish users had reported this issue, but the confirmed count is much higher since some IT personnel reported freezes on their whole batches of acquired laptops. We believe the issue has something to do with Intel power-saving, but it's quite unclear if this is caused by a driver issue or is a result of bad Lenovo BIOS or motherboard. Lenovo has been unresponsive, while their service centers have usually been replacing the motherboards, which is a solution that helped only one user so far. Windows hacks that worked for some (but not all) users: https://forums.lenovo.com/t5/Lenovo-C-E-K-M-N-and-V-Series/V510-15IKB-Laptop-Freeze/m-p/3852313#M24549. And the Linux hack that worked involved disabling DC and RC6. And oh yes, there was also a previous one that involved turning off DRI, but that came with heavy side-effects.

Filip said:

Update: a freeze did unfortunately occur with c-states limited. The other user from the forum also mentioned he tested this before and had the same outcome.

Journalctl doesn't show anything out of the ordinary. i915 was just switching DC states from 00 to 02 and vice versa, the last one it switched to being 00. What may be interesting is that i915 had been quiet for 14 seconds prior to the freeze, while it usually does something every two seconds. This also happened before in the session, however, and with no freeze. Due to some obstacles, I had not gotten to setting up something to obtain a log while the machine is frozen, but I will see what I can do.

The other user's comment on logging, however, is: "You will don't find any logs related that freeze. Even not with kernels netconsole or any debugging parameters. I've spend many time to that issue and find nothing"

Filip uploaded an attachment:

Attachment 138823, "Journalctl 5 minutes before the freeze":
journalctl_5minsbeforefreeze_

Jani Saarinen @jani.saarinen said:

Imre, any advice to proceed here?

Imre Deak @ideak said:

(In reply to Filip from comment 8)

> Update: a freeze did unfortunately occur with c-states limited. The other
> user from the forum also mentioned he tested this before and had the same
> outcome.

Ok, thanks for trying.

> Journalctl doesn't show anything out of the ordinary. i915 was just
> switching DC states from 00 to 02 and vice versa, the last one it switched
> to being 00. What may be interesting is that i915 had been quiet for 14
> seconds prior to the freeze, while it usually does something every two
> seconds. This also happened before in the session, however, and with no
> freeze.

Ok, as I understood you already tried booting with i915.enable_dc=0 and that didn't get rid of the problem.

Could you confirm that all display outputs were off when the freeze happened?

Do you see any other pattern in what you do before the freeze?

I'm guessing the DC state toggling is due to GPU activity, probably due to updating the clock in your GUI. Could you try preventing these updates (and any other GPU activity) for instance by switching away to another VT from your GUI and seeing if the freeze still happens? Please also provide a dmesg log booting with drm.debug=0x1f up to the freeze to double-check what causes the DC state toggling.

Could you try if booting with nomodeset the freeze still happens?

> Due to some obstacles, I had not gotten to setting up something to
> obtain a log while the machine is frozen, but I will see what I can do.
>
> The other user's comment on logging, however, is: "You will don't find any
> logs related that freeze. Even not with kernels netconsole or any debugging
> parameters. I've spend many time to that issue and find nothing"

Ok, please still try if the pstore method provides something.

Thanks.

Filip said:

(In reply to Imre Deak from comment 11)

> Ok, as I understood you already tried booting with i915.enable_dc=0 and that
> didn't get rid of the problem.

Yes, rc6 needs to be turned off as well.

> Could you confirm that all display outputs were off when the freeze happened?

How can I check this?

> Do you see any other pattern in what you do before the freeze?

No, unfortunately that is the thing with these freezes - they are completely random and cannot be straightforwardly reproduced. A stress test e.g. won't help. From everything that has been written on the forum, they do however seem to happen more often when the GPU is doing work.

> Could you try preventing these updates (and any other GPU activity) for
> instance by switching away to another VT from your GUI and seeing if the
> freeze still happens?

How do I go about doing this?

> Please also provide a dmesg log booting with drm.debug=0x1f up to the freeze to
> double-check what causes the DC state toggling.

> Could you try if booting with nomodeset the freeze still happens?

> Ok, please still try if the pstore method provides something.

These I mostly understand how to do, except the pstore method, but there may be a guide somewhere. Unfortunately I've had to go back to kernel 4.14 and disabling rc6 due to working on essays for uni deadlines so I will try all this as soon as I'm in the clear, but will also ask again that the other Linux users from the forum contribute here if they can.

paulz said:

(In reply to Imre Deak from comment 11)

> (In reply to Filip from comment 8)
>
> > Update: a freeze did unfortunately occur with c-states limited. The other
> > user from the forum also mentioned he tested this before and had the same
> > outcome.
>
> Ok, thanks for trying.

Disabling C-states does not help.

> Ok, as I understood you already tried booting with i915.enable_dc=0 and that
> didn't get rid of the problem.

Yes, RC6 has to be disabled as well.

> Could you confirm that all display outputs were off when the freeze happened?
The screens are not off but frozen. After a long time the screens go black, if I remember correctly.

> I'm guessing the DC state toggling is due to GPU activity, probably due to
> updating the clock in your GUI. Could you try preventing these updates (and
> any other GPU activity) for instance by switching away to another VT from
> your GUI and seeing if the freeze still happens?
Switching away to another VT is NOT possible, it's the whole PC that freezes!
Even SysRq doesn't work, and the keyboard is dead too.

> Please also provide a dmesg
> log booting with drm.debug=0x1f up to the freeze to double-check what causes
> the DC state toggling.
i will do that.

> Could you try if booting with nomodeset the freeze still happens?
i will give it a try

> Ok, please still try if the pstore method provides something.
i will give it a try

paulz said:

(In reply to paulz from comment 13)

> > Could you try if booting with nomodeset the freeze still happens?
> i will give it a try

With nomodeset I can't log in through the GNOME Display Manager. Other VTs work, but I need a graphical environment, so I removed that option again, sorry.

paulz uploaded an attachment:

Attachment 139135, "Logfile journalctl, freeze without disabling RC6":
without_rc6.log.tar.gz

paulz said:

(In reply to paulz from comment 13)

> > Please also provide a dmesg
> > log booting with drm.debug=0x1f up to the freeze to double-check what causes
> > the DC state toggling.
> i will do that.

Freeze after less than 30 minutes without disabling RC6.

logfile: https://bugs.freedesktop.org/attachment.cgi?id=139135

Imre Deak @ideak said:

(In reply to paulz from comment 13)

[...]

> > I'm guessing the DC state toggling is due to GPU activity, probably due to
> > updating the clock in your GUI. Could you try preventing these updates (and
> > any other GPU activity) for instance by switching away to another VT from
> > your GUI and seeing if the freeze still happens?
> Switching away to another VT is NOT possible, it's the whole PC that freezes!
> Even SysRq doesn't work, and the keyboard is dead too.

I meant here to switch to another VT from the GUI before the freeze to avoid any GPU activity (it looks like it is the periodic clock update based on your later logs) and see if the freeze still happens.

Imre Deak @ideak said:

(In reply to paulz from comment 14)

> (In reply to paulz from comment 13)
>
> > > Could you try if booting with nomodeset the freeze still happens?
> > i will give it a try
>
> With nomodeset I can't log in through the GNOME Display Manager. Other VTs
> work, but I need a graphical environment, so I removed that option again, sorry.

Here again the idea would be to see if without the i915 driver loaded the machine still freezes.

Imre Deak @ideak said:

(In reply to paulz from comment 16)

> (In reply to paulz from comment 13)
>
> > > Please also provide a dmesg
> > > log booting with drm.debug=0x1f up to the freeze to double-check what causes
> > > the DC state toggling.
> > i will do that.
>
> Freeze after less than 30 minutes without disabling RC6.
>
> logfile: https://bugs.freedesktop.org/attachment.cgi?id=139135

Thanks, looks like the only activity preceding the freeze is some periodic GPU command, I suppose to update the clock in GUI, but nothing out of ordinary. You could still check if enabling pstore would provide additional logs after freeze and reboot. For that you'd need to build your kernel with EFI or RAM based PSTORE support (for EFI: CONFIG_PSTORE=y, CONFIG_EFI_VARS_PSTORE=y) and boot with the 'nmi_watchdog=panic panic=5' kernel params. After freeze/rebooting

# mount -t pstore none ``

should put any such logs in ``.
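
Once pstore is mounted, any records it holds can be copied somewhere persistent and attached here. The sketch below is an illustration only: save_pstore is a hypothetical helper, the default source /sys/fs/pstore is the conventional mount point on many distributions rather than something stated in this thread, and the destination directory is arbitrary. It relies on pstore naming kernel log records dmesg-<backend>-<id>.

```shell
#!/bin/sh
# save_pstore [SRC] [DEST]: copy pstore kernel-log records (files named
# dmesg-*) from SRC to DEST and report how many were copied.
# Both defaults are assumptions; adjust them to your mount point.
save_pstore() {
    src="${1:-/sys/fs/pstore}"
    dest="${2:-./pstore-logs}"

    mkdir -p "$dest" || return 1

    count=0
    for f in "$src"/dmesg-*; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -e "$f" ] || continue
        cp "$f" "$dest/" && count=$((count + 1))
    done

    echo "copied $count record(s) to $dest"
}
```

For example, after a freeze and reboot, run `save_pstore /sys/fs/pstore ~/freeze-logs` as root. A count of zero means the watchdog never fired or the backend stored nothing, which is itself worth reporting.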

Submitted by Filip
