Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI string"
Hi,
After updating from a 6.0 kernel to a 6.1 kernel, I started experiencing hard lockups after a few tens of minutes of uptime. On a few occasions I happened to be at the console, and each time I saw the same three-line error message. One such instance:
[ 58.729863] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
[ 58.729904] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
[ 60.083261] vfio-pci 0000:01:00.0 Unable to change power state from D3cold to D0, device inaccessible
I ran git bisect to identify the commit:
(git)-[v6.1-rc1~206^2~4^5~3|bisect] % git bisect bad
24867516f06dabedef3be7eea0ef0846b91538bc is the first bad commit
commit 24867516f06dabedef3be7eea0ef0846b91538bc
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Tue Aug 23 13:51:31 2022 -0500
ACPI: OSI: Remove Linux-Dell-Video _OSI string
This string was introduced because drivers for NVIDIA hardware
had bugs supporting RTD3 in the past.
Before proprietary NVIDIA driver started to support RTD3, Ubuntu had
had a mechanism for switching PRIME on and off, though it had required
to logout/login to make the library switch happen.
When the PRIME had been off, the mechanism had unloaded the NVIDIA
driver and put the device into D3cold, but the GPU had never come back
to D0 again which is why ODMs used the _OSI to expose an old _DSM
method to switch the power on/off.
That has been fixed by commit 5775b843a619 ("PCI: Restore config space
on runtime resume despite being unbound"). so vendors shouldn't be
using this string to modify ASL any more.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
drivers/acpi/osi.c | 9 ---------
1 file changed, 9 deletions(-)
Info about the hardware: Dell XPS 15 7590
% lspci -tvnn
-[0000:00]-+-00.0 Intel Corporation Device [8086:3e20]
           +-01.0-[01]----00.0 NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q] [10de:1f91]
           +-02.0 Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630] [8086:3e9b]
           +-04.0 Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903]
           +-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
           +-12.0 Intel Corporation Cannon Lake PCH Thermal Controller [8086:a379]
           +-14.0 Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller [8086:a36d]
           +-14.2 Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f]
           +-15.0 Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 [8086:a368]
           +-15.1 Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 [8086:a369]
           +-16.0 Intel Corporation Cannon Lake PCH HECI Controller [8086:a360]
           +-17.0 Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller [8086:a353]
           +-1b.0-[02-3a]----00.0-[03-3a]--+-00.0-[04]----00.0 Intel Corporation JHL6340 Thunderbolt 3 NHI (C step) [Alpine Ridge 2C 2016] [8086:15d9]
           |                               +-01.0-[05-39]--
           |                               \-02.0-[3a]----00.0 Intel Corporation JHL6340 Thunderbolt 3 USB 3.1 Controller (C step) [Alpine Ridge 2C 2016] [8086:15db]
           +-1c.0-[3b]----00.0 Intel Corporation Wi-Fi 6 AX200 [8086:2723]
           +-1c.4-[3c]----00.0 Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a]
           +-1d.0-[3d]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
           +-1f.0 Intel Corporation Cannon Lake LPC Controller [8086:a30e]
           +-1f.3 Intel Corporation Cannon Lake PCH cAVS [8086:a348]
           +-1f.4 Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323]
           \-1f.5 Intel Corporation Cannon Lake PCH SPI Controller [8086:a324]
The newest kernel I have tested is 6.4.0-rc4, and the problem still occurs there. For now I have worked around it by blacklisting the nouveau driver and using the Intel GPU.
I first reported this bug to my distribution, and it was then forwarded to various Linux kernel lists and developers; see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036530. In that thread, Mario Limonciello suggested reporting the bug here.
Please let me know if more information is needed.
Regards,
Nick.