intel issues
https://gitlab.freedesktop.org/drm/intel/-/issues
2024-03-27T04:36:20Z

https://gitlab.freedesktop.org/drm/intel/-/issues/10582
GPU hangs twice in a QML application
2024-03-27T04:36:20Z
Moody Liu

I have observed 2 GPU hangs when using a QML application, [Qcm](https://github.com/hypengw/Qcm). Details are listed below:
```
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S UHD Graphics (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation AD107M [GeForce RTX 4050 Max-Q / Mobile] (rev a1)
```
```
Operating System: Arch Linux
KDE Plasma Version: 6.0.2
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.2
Kernel Version: 6.8.1-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 32 × 13th Gen Intel® Core™ i9-13900HX
Memory: 62.5 GiB of RAM
Graphics Processor: Mesa Intel® Graphics
Manufacturer: LENOVO
Product Name: 82WK
System Version: Legion Y9000P IRX8
```
[gpu.dump1](/uploads/938382478fe9d5859ba3459395dc81d7/gpu.dump1)
[gpu.dump2](/uploads/ca3b6f0a109d4eae55bd0378d22dc374/gpu.dump2)
`dmesg` says:
```
[38127.766533] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
[38127.797750] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:0020fffe, in Qcm [1141064]
[38127.797752] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[38127.797753] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[38127.797753] Please see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
[38127.797753] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[38127.797753] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[38127.797754] GPU crash dump saved to /sys/class/drm/card1/error
[38127.798018] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
[38127.901798] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[38127.902514] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[38127.902621] i915 0000:00:02.0: [drm] Qcm[1141064] context reset due to GPU hang
[38127.902676] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[38127.902678] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[38127.905673] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[38127.906393] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[38127.906394] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[38695.598955] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
[38695.629165] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Qcm [1145063]
[38695.629438] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
[38695.732655] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[38695.733365] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[38695.733463] i915 0000:00:02.0: [drm] Qcm[1145063] context reset due to GPU hang
[38695.733506] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[38695.733508] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[38695.736168] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[38695.737166] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[38695.737167] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
```

https://gitlab.freedesktop.org/drm/intel/-/issues/10581
i915 GPU HANG: ecode 12:1:859ffffb, in picom [1603]
2024-03-27T04:38:26Z
Cyannide

[crash.dump](/uploads/45806805e74da437e94f6d6f7d176772/crash.dump)

https://gitlab.freedesktop.org/drm/intel/-/issues/10549
GPU hang - FFXIV in specific instances
2024-03-28T22:16:47Z
Simon Jenkins

While playing FFXIV via [xivlauncher](https://flathub.org/apps/dev.goats.xivlauncher), entering specific instances causes the GPU to hang. The desktop recovers after 5-10 seconds, but the game is permanently frozen and must be killed. `dmesg` displays the following:
```
[ 3317.240481] i915 0000:03:00.0: [drm] GPU HANG: ecode 12:1:85dff5fb, in ffxiv_dx11.exe [311498]
[ 3317.240485] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3317.240486] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[ 3317.240486] Please see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
[ 3317.240487] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3317.240487] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 3317.240488] GPU crash dump saved to /sys/class/drm/card1/error
```
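For triage, the `GPU HANG` lines in a `dmesg` capture like the one above can be pulled out mechanically. The following is a minimal sketch, assuming the line layout shown in this report (`[timestamp] i915 <pci-address>: [drm] GPU HANG: ecode <code>, in <process> [<pid>]`). The names `GpuHang`, `HANG_RE`, and `find_gpu_hangs` are hypothetical helpers for illustration and are not part of any i915 tooling.

```python
import re
from typing import NamedTuple


class GpuHang(NamedTuple):
    """One 'GPU HANG' record extracted from dmesg text (hypothetical helper type)."""
    timestamp: float  # seconds since boot, as printed by dmesg
    device: str       # PCI address of the GPU, e.g. 0000:03:00.0
    ecode: str        # error code string, e.g. 12:1:85dff5fb
    process: str      # name of the process whose context hung
    pid: int


# Assumption: the line format matches the dmesg excerpts in this report.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915\s+(?P<dev>\S+):\s+\[drm\]\s+GPU HANG:\s+"
    r"ecode\s+(?P<ecode>[0-9a-fA-F:]+),\s+in\s+(?P<proc>.+?)\s+\[(?P<pid>\d+)\]"
)


def find_gpu_hangs(dmesg_text: str) -> list[GpuHang]:
    """Return every GPU HANG record found in the given dmesg text."""
    hangs = []
    for line in dmesg_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            hangs.append(
                GpuHang(float(m["ts"]), m["dev"], m["ecode"], m["proc"], int(m["pid"]))
            )
    return hangs


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 find_hangs.py
    for hang in find_gpu_hangs(sys.stdin.read()):
        print(hang)
```

Comparing the extracted `ecode` and process name across runs may help tell whether repeated hangs come from the same workload, though the full crash dump is still what the developers ask for.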
**Reproduction steps**
* 2 screens configured in either 1080p or 4k
* Install FFXIV via [xivlauncher](https://flathub.org/apps/dev.goats.xivlauncher)
* Launch game + login
Either:
* Queue with 3 other players for Alzadaal's Legacy
OR:
* Queue with 7 other players for Anabaseios: The Twelfth Circle
Both of these instances require significant time spent in the game. I'm happy to follow up with testing, but it will be somewhat slow, as I have no prior kernel-building experience and will need to gather other players to take part.
**Frequency:**
* Alzadaal's Legacy
  * in 4k, always instantly hangs upon entering instance
  * in 1080p, rarely
* Anabaseios: The Twelfth Circle
  * in both resolutions, always at least once during the instance
**System Details:**
I'm currently running EndeavourOS with KDE Plasma. Everything is up to date at the time of writing (23/03/2024).
`uname -a`
```
Linux simon-pc 6.8.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 16 Mar 2024 17:15:35 +0000 x86_64 GNU/Linux
```
`lspci -vnn -d ':*:0300'`
<details>
<summary>Click to expand</summary>
```
03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A770] [8086:56a0] (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation DG2 [Arc A770] [1849:6001]
Flags: bus master, fast devsel, latency 0, IRQ 159, IOMMU group 15
Memory at dd000000 (64-bit, non-prefetchable) [size=16M]
Memory at fa00000000 (64-bit, prefetchable) [size=8G]
Expansion ROM at de000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, IntMsgNum 0
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [d0] Power Management version 3
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [420] Physical Resizable BAR
Capabilities: [400] Latency Tolerance Reporting
Kernel driver in use: i915
Kernel modules: i915, xe
6e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c9) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Raphael [1043:8877]
Flags: bus master, fast devsel, latency 0, IRQ 105, IOMMU group 23
Memory at fc70000000 (64-bit, prefetchable) [size=256M]
Memory at fc80000000 (64-bit, prefetchable) [size=2M]
I/O ports at e000 [size=256]
Memory at de800000 (32-bit, non-prefetchable) [size=512K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, IntMsgNum 0
Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [270] Secondary PCI Express
Capabilities: [2a0] Access Control Services
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [410] Physical Layer 16.0 GT/s <?>
Capabilities: [450] Lane Margining at the Receiver
Kernel driver in use: amdgpu
Kernel modules: amdgpu
```
</details>
`dmidecode`:
<details>
<summary>Click to expand</summary>
```
# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.5.0 present.
Table at 0x794B5000.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: 1905
Release Date: 02/05/2024
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 32 MB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 19.5
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: ASUS
Product Name: System Product Name
Version: System Version
Serial Number: System Serial Number
UUID: f5742f51-3ef3-0ce7-d8c6-581122adf086
Wake-up Type: Power Switch
SKU Number: SKU
Family: To be filled by O.E.M.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG CROSSHAIR X670E HERO
Version: Rev 1.xx
Serial Number: 220808412601078
Asset Tag: Default string
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
Manufacturer: Default string
Type: Desktop
Lock: Not Present
Version: Default string
Serial Number: Default string
Asset Tag: Default string
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000000
Height: Unspecified
Number Of Power Cords: 1
Contained Elements: 0
SKU Number: Default string
Handle 0x0004, DMI type 10, 6 bytes
On Board Device Information
Type: Video
Status: Enabled
Description: To Be Filled By O.E.M.
Handle 0x0005, DMI type 11, 5 bytes
OEM Strings
String 1: Default string
String 2: Default string
String 3: BOURBON
String 4: Default string
String 5: Default string
String 6: Default string
Handle 0x0006, DMI type 12, 5 bytes
System Configuration Options
Option 1: Default string
Handle 0x0007, DMI type 32, 20 bytes
System Boot Information
Status: No errors detected
Handle 0x0008, DMI type 34, 11 bytes
Management Device
Description: Nuvoton NCT6799D-R
Type: Other
Address: 0x00000295
Address Type: I/O Port
Handle 0x0009, DMI type 40, 59 bytes
Additional Information 1
Referenced Handle: 0x0004
Referenced Offset: 0x01
String: ROG
Value: 0x00000001
Additional Information 2
Referenced Handle: 0x0004
Referenced Offset: 0x0f
String: YEAR
Value: 0x000007e6
Additional Information 3
Referenced Handle: 0x0001
Referenced Offset: 0x1a
String: Mordor 1.11
Value: 0x00000000
Additional Information 4
Referenced Handle: 0x0001
Referenced Offset: 0x1a
String: PRODUCT_LINE
Value: 0x00000000
Additional Information 5
Referenced Handle: 0x0001
Referenced Offset: 0x19
String: PRODUCT_SKU
Value: 0x00000001
Additional Information 6
Referenced Handle: 0x0001
Referenced Offset: 0x19
String: FEATURES
Value: 0x00000001
Handle 0x000A, DMI type 44, 9 bytes
Unknown Type
Header and Data:
2C 09 0A 00 FF FF 01 01 00
Handle 0x000B, DMI type 43, 31 bytes
TPM Device
Vendor ID: AMD
Specification Version: 2.0
Firmware Revision: 6.31
Description: AMD
Characteristics:
Family configurable via platform software support
OEM-specific Information: 0x00000000
Handle 0x000C, DMI type 7, 27 bytes
Cache Information
Socket Designation: L1 - Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 1 MB
Maximum Size: 1 MB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x000D, DMI type 7, 27 bytes
Cache Information
Socket Designation: L2 - Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 16 MB
Maximum Size: 16 MB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x000E, DMI type 7, 27 bytes
Cache Information
Socket Designation: L3 - Cache
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 128 MB
Maximum Size: 128 MB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 16-way Set-associative
Handle 0x000F, DMI type 4, 48 bytes
Processor Information
Socket Designation: AM5
Type: Central Processor
Family: Zen
Manufacturer: Advanced Micro Devices, Inc.
ID: 12 0F A6 00 FF FB 8B 17
Signature: Family 25, Model 97, Stepping 2
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
HTT (Multi-threading)
Version: AMD Ryzen 9 7950X3D 16-Core Processor
Voltage: 1.3 V
External Clock: 100 MHz
Max Speed: 5750 MHz
Current Speed: 4200 MHz
Status: Populated, Enabled
Upgrade: <OUT OF SPEC>
L1 Cache Handle: 0x000C
L2 Cache Handle: 0x000D
L3 Cache Handle: 0x000E
Serial Number: Unknown
Asset Tag: Unknown
Part Number: Unknown
Core Count: 16
Core Enabled: 16
Thread Count: 32
Characteristics:
64-bit capable
Multi-Core
Hardware Thread
Execute Protection
Enhanced Virtualization
Power/Performance Control
Handle 0x0010, DMI type 44, 9 bytes
Unknown Type
Header and Data:
2C 09 10 00 0F 00 01 02 00
Handle 0x0011, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x0012, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 128 GB
Error Information Handle: 0x0011
Number Of Devices: 4
Handle 0x0013, DMI type 19, 31 bytes
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x00FFFFFFFFF
Range Size: 64 GB
Physical Array Handle: 0x0012
Partition Width: 2
Handle 0x0014, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x0015, DMI type 17, 92 bytes
Memory Device
Array Handle: 0x0012
Error Information Handle: 0x0014
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMM 0
Bank Locator: P0 CHANNEL A
Type: Unknown
Type Detail: Unknown
Handle 0x0016, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x0017, DMI type 17, 92 bytes
Memory Device
Array Handle: 0x0012
Error Information Handle: 0x0016
Total Width: 64 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMM 1
Bank Locator: P0 CHANNEL A
Type: DDR5
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 4800 MT/s
Manufacturer: Corsair
Serial Number: 00000000
Asset Tag: Not Specified
Part Number: CMT64GX5M2B5600Z40
Rank: 2
Configured Memory Speed: 4800 MT/s
Minimum Voltage: 1.1 V
Maximum Voltage: 1.1 V
Configured Voltage: 1.1 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Bank 3, Hex 0x9E
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x0018, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x007FFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x0017
Memory Array Mapped Address Handle: 0x0013
Partition Row Position: Unknown
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x0019, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x001A, DMI type 17, 92 bytes
Memory Device
Array Handle: 0x0012
Error Information Handle: 0x0019
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMM 0
Bank Locator: P0 CHANNEL B
Type: Unknown
Type Detail: Unknown
Handle 0x001B, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x001C, DMI type 17, 92 bytes
Memory Device
Array Handle: 0x0012
Error Information Handle: 0x001B
Total Width: 64 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: DIMM 1
Bank Locator: P0 CHANNEL B
Type: DDR5
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 4800 MT/s
Manufacturer: Corsair
Serial Number: 00000000
Asset Tag: Not Specified
Part Number: CMT64GX5M2B5600Z40
Rank: 2
Configured Memory Speed: 4800 MT/s
Minimum Voltage: 1.1 V
Maximum Voltage: 1.1 V
Configured Voltage: 1.1 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Bank 3, Hex 0x9E
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Handle 0x001D, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00800000000
Ending Address: 0x00FFFFFFFFF
Range Size: 32 GB
Physical Device Handle: 0x001C
Memory Array Mapped Address Handle: 0x0013
Partition Row Position: Unknown
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x001E, DMI type 40, 14 bytes
Additional Information 1
Referenced Handle: 0x0000
Referenced Offset: 0x05
String: AGESA!V9 ComboAM5PI 1.1.0.2b
Value: 0x00000000
Handle 0x001F, DMI type 60, 165 bytes
Unknown Type
Header and Data:
3C A5 1F 00 05 1F 67 97 B3 63 E5 4A 5B F0 BA 0A
3F 61 59 0B F4 AD 54 4A 44 80 31 1D 30 99 56 1A
4C D0 E4 57 F4 1A 36 7E D1 F6 45 17 BC 57 75 CC
CF 64 6B 6D D4 A9 19 91 88 7F 58 DF 31 00 DF 32
B0 6A 9B 3A C1 92 EA 8D AA FD DB E8 A2 49 66 B2
18 D5 FE B3 A8 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00
Handle 0x0020, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G2_C3
Internal Connector Type: None
External Reference Designator: U32G2_C3
External Connector Type: USB Type-C Receptacle
Port Type: USB
Handle 0x0021, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G2_23
Internal Connector Type: None
External Reference Designator: U32G2_23
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0022, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G2_4
Internal Connector Type: None
External Reference Designator: U32G2_4
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0023, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G2_8910
Internal Connector Type: None
External Reference Designator: U32G2_8910
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0024, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G2X2_C18
Internal Connector Type: None
External Reference Designator: U32G2X2_C18
External Connector Type: USB Type-C Receptacle
Port Type: USB
Handle 0x0025, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G2_20-22
Internal Connector Type: None
External Reference Designator: U32G2_20-22
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0026, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: USB4_EC1
Internal Connector Type: None
External Reference Designator: USB4_EC1
External Connector Type: USB Type-C Receptacle
Port Type: USB
Handle 0x0027, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: USB4_EC2
Internal Connector Type: None
External Reference Designator: USB4_EC2
External Connector Type: USB Type-C Receptacle
Port Type: USB
Handle 0x0028, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: LAN
Internal Connector Type: None
External Reference Designator: LAN
External Connector Type: RJ-45
Port Type: Network Port
Handle 0x0029, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: M.2(WIFI)
Internal Connector Type: None
External Reference Designator: M.2(WIFI)
External Connector Type: Other
Port Type: Network Port
Handle 0x002A, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: HDMI
Internal Connector Type: None
External Reference Designator: HDMI port
External Connector Type: Other
Port Type: Video Port
Handle 0x002B, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: AUDIO
Internal Connector Type: None
External Reference Designator: Audio Jack
External Connector Type: Mini Jack (headphones)
Port Type: Audio Port
Handle 0x002C, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: SATA6G_12
Internal Connector Type: SAS/SATA Plug Receptacle
External Reference Designator: Not Specified
External Connector Type: None
Port Type: SATA
Handle 0x002D, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: SATA6G_34
Internal Connector Type: SAS/SATA Plug Receptacle
External Reference Designator: Not Specified
External Connector Type: None
Port Type: SATA
Handle 0x002E, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: SATA6G_E12
Internal Connector Type: SAS/SATA Plug Receptacle
External Reference Designator: Not Specified
External Connector Type: None
Port Type: SATA
Handle 0x002F, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: M.2_1(SOCKET3)
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0030, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: M.2_2(SOCKET3)
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0031, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: M.2_3(SOCKET3)
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0032, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: M.2_4(SOCKET3)
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0033, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G2X2_C6
Internal Connector Type: USB Type-C Receptacle
External Reference Designator: Not Specified
External Connector Type: None
Port Type: USB
Handle 0x0034, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: USB_P12_13
Internal Connector Type: Access Bus (USB)
External Reference Designator: Not Specified
External Connector Type: None
Port Type: USB
Handle 0x0035, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: USB_P14_15
Internal Connector Type: Access Bus (USB)
External Reference Designator: Not Specified
External Connector Type: None
Port Type: USB
Handle 0x0036, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: USB1617
Internal Connector Type: Access Bus (USB)
External Reference Designator: Not Specified
External Connector Type: None
Port Type: USB
Handle 0x0037, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G1_E12
Internal Connector Type: Access Bus (USB)
External Reference Designator: Not Specified
External Connector Type: None
Port Type: USB
Handle 0x0038, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: U32G1_E34
Internal Connector Type: Access Bus (USB)
External Reference Designator: Not Specified
External Connector Type: None
Port Type: USB
Handle 0x0039, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: AAFP
Internal Connector Type: Mini Jack (headphones)
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Audio Port
Handle 0x003A, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: CPU_FAN
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x003B, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: CPU_OPT
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x003C, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: CHA_FAN1P
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x003D, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: CHA_FAN2P
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x003E, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: CHA_FAN3P
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x003F, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: CHA_FAN4
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0040, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: W_PUMP+
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0041, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: AIO_PUMP
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0042, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: W_IN
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0043, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: W_OUT
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0044, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: W_FLOW
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0045, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: T_SENSOR
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0046, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: F_PANEL
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0047, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: RGB_HEADER
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0048, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: ADD_GEN2_1
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x0049, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: ADD_GEN2_2
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x004A, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: ADD_GEN2_3
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x004B, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: LED_CON1
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x004C, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: OSC_SENSE
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x004D, DMI type 9, 17 bytes
System Slot Information
Designation: PCIEX16_1
Type: x16 PCI Express 5 x16
Current Usage: In Use
Length: Long
ID: 0
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:00:01.1
Handle 0x004E, DMI type 9, 17 bytes
System Slot Information
Designation: PCIEX16_2
Type: x8 PCI Express 5 x16
Current Usage: Available
Length: Long
ID: 1
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:00:1f.7
Handle 0x004F, DMI type 9, 17 bytes
System Slot Information
Designation: PCIEX1
Type: x1 PCI Express 4 x1
Current Usage: Available
Length: Other
ID: 2
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:00:1f.7
Handle 0x0050, DMI type 9, 17 bytes
System Slot Information
Designation: M.2_1(SOCKET3)
Type: x4 M.2 Socket 3
Current Usage: In Use
Length: Long
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:05:00.0
Handle 0x0051, DMI type 9, 17 bytes
System Slot Information
Designation: M.2_2(SOCKET3)
Type: x4 M.2 Socket 3
Current Usage: Available
Length: Long
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:00:1f.7
Handle 0x0052, DMI type 9, 17 bytes
System Slot Information
Designation: M.2_3(SOCKET3)
Type: x4 M.2 Socket 3
Current Usage: Available
Length: Long
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:00:1f.7
Handle 0x0053, DMI type 9, 17 bytes
System Slot Information
Designation: M.2_4(SOCKET3)
Type: x4 M.2 Socket 3
Current Usage: Available
Length: Long
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:00:1f.7
Handle 0x0054, DMI type 9, 17 bytes
System Slot Information
Designation: M.2(WIFI)
Type: x1 M.2 Socket 1-SD
Current Usage: In Use
Length: Other
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:0b:00.0
Handle 0x0055, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Intel I225 2.5G LAN
Type: Ethernet
Status: Enabled
Type Instance: 1
Bus Address: 0000:0c:00.0
Handle 0x0056, DMI type 41, 11 bytes
Onboard Device
Reference Designation: PROM21 SATA AHCI Controller
Type: SATA Controller
Status: Enabled
Type Instance: 1
Bus Address: 0000:6d:00.0
Handle 0x0057, DMI type 41, 11 bytes
Onboard Device
Reference Designation: ASM1061 SATA AHCI Controller
Type: SATA Controller
Status: Enabled
Type Instance: 2
Bus Address: 0000:0e:00.0
Handle 0x0058, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 9
en|US|iso8859-1
fr|FR|iso8859-1
zh|TW|unicode
zh|CN|unicode
ja|JP|unicode
de|DE|iso8859-1
es|ES|iso8859-1
ru|RU|iso8859-5
ko|KR|unicode
Currently Installed Language: en|US|iso8859-1
Handle 0x0059, DMI type 127, 4 bytes
End Of Table
```
</details>
[Full dmesg output since boot](/uploads/7af824e6bdb8f2d5f2499df1f8ca2e32/dmesg)
[/sys/class/drm/card1/error](/uploads/19e007837e25b62d85bf6244144db986/error.bz2)
EDIT: clarification on os

https://gitlab.freedesktop.org/drm/intel/-/issues/10548
Engine reset failed on 0:0 -> GPU hang
2024-03-29T04:57:16Z
LogN

I can reproduce this issue by playing Minecraft: since I updated my system, it occurs about 5 minutes into every session.
This is an Acer Swift 3
uname -a: `Linux spelling-is-fun 6.8.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 16 Mar 2024 17:15:35 +0000 x86_64 GNU/Linux`
`lspci -vnn -d :*:0300`:
```
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] [8086:46a6] (rev 0c) (prog-if 00 [VGA controller])
Subsystem: Acer Incorporated [ALI] Alder Lake-P GT2 [Iris Xe Graphics] [1025:1612]
Flags: bus master, fast devsel, latency 0, IRQ 161, IOMMU group 1
Memory at 601f000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=256M]
I/O ports at 3000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915, xe
```
Kernel logs show:
```
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Render thread [3775]
Mar 23 09:12:30 spelling-is-fun kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 23 09:12:30 spelling-is-fun kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Mar 23 09:12:30 spelling-is-fun kernel: Please see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
Mar 23 09:12:30 spelling-is-fun kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 23 09:12:30 spelling-is-fun kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Mar 23 09:12:30 spelling-is-fun kernel: GPU crash dump saved to /sys/class/drm/card1/error
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] Render thread[3775] context reset due to GPU hang
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.20.0
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
Mar 23 09:12:30 spelling-is-fun plasmashell[1438]: QRhiGles2: Context is lost.
Mar 23 09:12:30 spelling-is-fun plasmashell[1438]: Graphics device lost, cleaning up scenegraph and releasing RHI
```
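The `GPU HANG` line in logs like the one above follows a fixed layout, so it can be pulled apart mechanically when comparing reports. The following is a minimal sketch, not part of the original report: the regular expression is inferred from the lines quoted in this tracker, and the field names `gen`, `engines`, and `code` are assumptions about what the three colon-separated `ecode` values mean.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the log lines quoted in these reports; the trailing
# ", in <process> [<pid>]" part is optional because some hang lines omit it.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engines>[0-9a-f]+):(?P<code>[0-9a-f]{8})"
    r"(?:, in (?P<process>.+?) \[(?P<pid>\d+)\])?\s*$"
)


class GpuHang(NamedTuple):
    gen: int                 # assumed meaning: graphics generation
    engines: str             # assumed meaning: engine mask, kept as hex text
    code: str                # assumed meaning: hang signature, kept as hex text
    process: Optional[str]   # process name, if the line reports one
    pid: Optional[int]       # process id, if the line reports one


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed fields of a 'GPU HANG' log line, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    pid = match.group("pid")
    return GpuHang(
        gen=int(match.group("gen")),
        engines=match.group("engines"),
        code=match.group("code"),
        process=match.group("process"),
        pid=int(pid) if pid is not None else None,
    )


line = ("Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] "
        "GPU HANG: ecode 12:1:84dffffb, in Render thread [3775]")
print(parse_gpu_hang(line))
```

Keeping the second and third values as text avoids committing to an interpretation of them; only the first value is converted to an integer.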
Reading the error dump after a reboot simply shows "No error state collected." If my system crashes, how am I supposed to read it?

https://gitlab.freedesktop.org/drm/intel/-/issues/10547
GPU Hang - Desktop freezes in Kwin + Wayland + Intel GPU
2024-03-26T15:07:56Z
Higor Silva

I'm running Arch Linux + Linux 6.8.1-arch1-1 + KDE Plasma + Wayland. My desktop completely freezes for about 5 to 10 seconds, with no response from the mouse or keyboard.
The GPU crash dump shows me this [error](/uploads/f18ce8e738e192dc06825f694051d325/error.txt)
When I run the `journalctl` command, I see this:
```
mar 23 10:34:26 arch-laptop kwin_wayland[3696]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
mar 23 10:34:29 arch-laptop PackageKit[3492]: daemon quit
mar 23 10:34:29 arch-laptop systemd[1]: packagekit.service: Deactivated successfully.
mar 23 10:34:29 arch-laptop wpa_supplicant[617]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-52 noise=-115 txrate=0
mar 23 10:34:31 arch-laptop kwin_wayland[3696]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
mar 23 10:34:31 arch-laptop kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 8:0:00000000
mar 23 10:34:31 arch-laptop kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
mar 23 10:34:31 arch-laptop kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
mar 23 10:34:31 arch-laptop kernel: Please see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
mar 23 10:34:31 arch-laptop kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
mar 23 10:34:31 arch-laptop kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
mar 23 10:34:31 arch-laptop kernel: GPU crash dump saved to /sys/class/drm/card1/error
mar 23 10:34:31 arch-laptop kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
```
Running `uname -a`
```
Linux arch-laptop 6.8.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 16 Mar 2024 17:15:35 +0000 x86_64 GNU/Linux
```
PCI device information:
```
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller [8086:22b1] (rev 35) (prog-if 00 [VGA controller])
Subsystem: Lenovo Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller [17aa:3809]
Flags: bus master, fast devsel, latency 0, IRQ 124
Memory at 90000000 (64-bit, non-prefetchable) [size=16M]
Memory at 80000000 (64-bit, prefetchable) [size=256M]
I/O ports at 2000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [d0] Power Management version 2
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [b0] Vendor Specific Information: Len=07 <?>
Kernel driver in use: i915
Kernel modules: i915
```

https://gitlab.freedesktop.org/drm/intel/-/issues/10536
GPU hang when playing Terraria: ecode 12:1:84dffffb, in Main Thread [129278]
2024-03-29T04:56:38Z
MMK21

Playing Terraria (launched through Steam) often causes my system to freeze after a few minutes (or sometimes more like 30 mins) of playing, with a GPU HANG message in the system log. Here's the relevant part:
```
Mar 23 08:45:03 mish-arch kwin_wayland[836]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Mar 23 08:45:06 mish-arch kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Mar 23 08:45:06 mish-arch kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 08:45:06 mish-arch kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Main Thread [129278]
Mar 23 08:45:06 mish-arch kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 23 08:45:06 mish-arch kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Mar 23 08:45:06 mish-arch kernel: Please see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
Mar 23 08:45:06 mish-arch kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 23 08:45:06 mish-arch kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Mar 23 08:45:06 mish-arch kernel: GPU crash dump saved to /sys/class/drm/card1/error
Mar 23 08:45:08 mish-arch kwin_wayland[836]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:Main Thread[129278]:6d516!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:kwin_wayland[836]:833298!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:Main Thread[129278]:6d518!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:QSGRenderThread[128275]:58d4!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:QSGRenderThread[128275]:58d6!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:Main Thread[129278]:6d51c!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:Main Thread[129278]:6d51a!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:steamwebhelper[2134]:8afed2!
Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:steamwebhelper[2134]:8afed4!
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Main Thread [129278]
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] GT0: Resetting chip for stopped heartbeat on rcs0
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] Main Thread[129278] context reset due to GPU hang
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
Mar 23 08:45:27 mish-arch kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Mar 23 08:45:28 mish-arch kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
Mar 23 08:45:28 mish-arch kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission disabled
Mar 23 08:45:28 mish-arch kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC disabled
Mar 23 08:45:29 mish-arch kernel: sched: RT throttling activated
```
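The `Fence expiration time out` lines above name several processes besides the game itself. A small tally per process can make that easier to see in longer logs. This is a rough sketch under assumptions: the pattern is inferred only from the lines quoted above, and the trailing hexadecimal value is matched but not interpreted.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern inferred from the quoted log lines:
#   Fence expiration time out i915-<pci address>:<process>[<pid>]:<hex>!
FENCE_RE = re.compile(
    r"Fence expiration time out i915-"
    r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]:"   # PCI address
    r"(?P<process>[^\[\]]+)\[(?P<pid>\d+)\]:[0-9a-f]+!"
)


def fence_timeouts_by_process(lines: Iterable[str]) -> Counter:
    """Count fence-timeout lines per (process name, pid) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = FENCE_RE.search(line)
        if match:
            counts[(match.group("process"), int(match.group("pid")))] += 1
    return counts


sample = [
    "Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:Main Thread[129278]:6d516!",
    "Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:kwin_wayland[836]:833298!",
    "Mar 23 08:45:18 mish-arch kernel: Fence expiration time out i915-0000:00:02.0:Main Thread[129278]:6d518!",
]
for (process, pid), count in fence_timeouts_by_process(sample).most_common():
    print(f"{process}[{pid}]: {count}")
```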
I've uploaded the GPU crash dump to [card1-error.txt](/uploads/4dde0ac5ea086655524512f8709cbc6d/card1-error.txt)
### System info
- `uname`: Linux 6.8.1-arch1-1 \#1 SMP PREEMPT_DYNAMIC Sat, 16 Mar 2024 17:15:35 +0000 x86_64 GNU/Linux
- Motherboard: MSI PRO H610M-G DDR4 (MS-7D46)
- `dmesg` debug output: [dmesg.log](/uploads/a14b5d0842a4beb34d79d677cc839d6a/dmesg.log)
- Note that not all log files are from the same occurrence of the issue (hopefully that won't be a problem)
### Additional info
- I can reproduce this issue, as it now happens whenever I play Terraria, and usually doesn't take long to be triggered. However, I have previously been able to play the game without issues, ~~so a recent kernel update may be the cause~~.
- I can also reproduce this with kernel version `6.7.0-arch3-1`

https://gitlab.freedesktop.org/drm/intel/-/issues/10484
GPU HANG: ecode 12:1:859ffffb on Video playback
2024-03-26T15:09:07Z
Alexeyan

On my Dell Latitude 5560, running Arch Linux with Sway, the Intel GPU crashes and freezes my display when playing back a video with hardware decoding in mpv.
Crashdump and dmesg attached
[gpu_crashdump.txt](/uploads/0af2c219ddf16fd543e3d1da5f3812c6/gpu_crashdump.txt)
[gpu_crash_log.txt](/uploads/8bef2969616d2813fbab8142ba87bb4e/gpu_crash_log.txt)

https://gitlab.freedesktop.org/drm/intel/-/issues/10395
GPU Hang; Asynchronous wait on fence; Pageflip timed out
2024-03-27T00:51:29Z
jonwilts

Apologies, not sure if this is a duplicate or not. Running Arch Linux, 6.7.8 Zen kernel, and KDE 6.0.1 on Wayland.
Entire laptop froze for about 30 seconds (unable to move pointer with trackpad; keyboard entirely unresponsive; display frozen). When it was back up I did a journalctl --since "2 min ago" and saw the following. I've attached the GPU crash dump [error](/uploads/d295741c35210b4aa8341b25fac7e95d/error). Please let me know if you need anything else.
```
Mar 07 16:53:21 air kwin_wayland[3476]: kwin_core: Could not find window with uuid "{df5ab8c4-1134-4964-bd3e-13ca8034c193}"
Mar 07 16:53:31 air kwin_wayland[3476]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Mar 07 16:53:32 air kernel: Asynchronous wait on fence 0000:00:02.0:kwin_wayland[3476]:c475e timed out (hint:intel_atomic_commit_ready [i915])
Mar 07 16:53:36 air kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 8:0:00000000
Mar 07 16:53:36 air kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 07 16:53:36 air kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Mar 07 16:53:36 air kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Mar 07 16:53:36 air kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 07 16:53:36 air kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Mar 07 16:53:36 air kernel: GPU crash dump saved to /sys/class/drm/card1/error
Mar 07 16:53:36 air kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
```
System Details:
`Linux air 6.7.8-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Sun, 03 Mar 2024 00:30:23 +0000 x86_64 GNU/Linux`
GPU Details
```
% sudo lspci -vnn -d ":*:0300"
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 6000 [8086:1626] (rev 09) (prog-if 00 [VGA controller])
Subsystem: Apple Inc. HD Graphics 6000 [106b:011b]
Flags: bus master, fast devsel, latency 0, IRQ 54
Memory at c0000000 (64-bit, non-prefetchable) [size=16M]
Memory at b0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 3000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [a4] PCI Advanced Features
Kernel driver in use: i915
Kernel modules: i915
```

https://gitlab.freedesktop.org/drm/intel/-/issues/10365
Kernel 6.7 hang in gnome-shell, kernel NULL pointer dereference
2024-03-16T18:00:48Z
Jonas Otto

My system seemed to hang, and I found this to be the last information in the kernel logs, so I'm filing this bug as suggested. The hang occurred while watching a YouTube video in Google Chrome, although I'm certain this wasn't using GPU-accelerated video decoding.
This occurred multiple times today while I was still on kernel 6.6, so I updated to 6.7 in the hope of alleviating the issue.
I have not yet been able to obtain the crash dump, as the system was not usable after the error occurred, but I will try SSH-ing in the next time it happens.
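Since the dump is exposed through sysfs (`/sys/class/drm/card1/error` in the log below) and another report in this tracker found only "No error state collected." after a reboot, it has to be copied to disk while the system is still up. A minimal sketch of that copy step follows; the function name, the output file naming, and the idea of skipping the placeholder text are illustrative assumptions, not an official tool.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Placeholder text quoted in another report when no dump is available.
NO_ERROR_MARKER = b"No error state collected"


def save_error_state(source: Path, dest_dir: Path) -> Optional[Path]:
    """Copy a GPU error state file into dest_dir under a timestamped name.

    Returns the path written, or None if the source holds no dump.
    """
    data = source.read_bytes()
    if not data.strip() or data.lstrip().startswith(NO_ERROR_MARKER):
        return None
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = dest_dir / f"i915-error-{stamp}.txt"  # hypothetical naming scheme
    target.write_bytes(data)
    return target
```

Over SSH this could be called as `save_error_state(Path("/sys/class/drm/card1/error"), Path("/var/tmp/gpu-dumps"))`. The card index varies between systems (one report here shows `card0`), the destination directory is only an example, and reading the file typically needs root.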
```
Mär 03 17:57:09.187207 risa kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Mär 03 17:57:09.189001 risa kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mär 03 17:57:09.198683 risa kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffa, in gnome-shell [1310]
Mär 03 17:57:09.198996 risa kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mär 03 17:57:09.199024 risa kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Mär 03 17:57:09.199047 risa kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Mär 03 17:57:09.199070 risa kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mär 03 17:57:09.199092 risa kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Mär 03 17:57:09.199121 risa kernel: GPU crash dump saved to /sys/class/drm/card1/error
Mär 03 17:57:18.655309 risa kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[1310]:137cc timed out (hint:intel_atomic_commit_ready [i915])
Mär 03 17:57:23.565309 risa kernel: BUG: kernel NULL pointer dereference, address: 0000000000000270
```
```
uname -a
Linux risa 6.7.4-2-MANJARO #1 SMP PREEMPT_DYNAMIC Sat Feb 10 09:41:20 UTC 2024 x86_64 GNU/Linux
```
```
lspci -vnn -d ":*:0300"
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
Subsystem: Fujitsu Limited. HD Graphics 620 [10cf:1959]
Flags: bus master, fast devsel, latency 0, IRQ 132
Memory at c0000000 (64-bit, non-prefetchable) [size=16M]
Memory at b0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 3000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Kernel driver in use: i915
Kernel modules: i915
```

https://gitlab.freedesktop.org/drm/intel/-/issues/10362
GPU Hang Blender4.0 when Render in 4k with Eevee
2024-03-04T03:18:15Z
achimfrase

Hi,
when I try to render the following Blender file in 4k with Eevee, I get a GPU hang.
It must have something to do with 4k, because 2k (2048x2048 px) works.
https://gitlab.gnome.org/Teams/Design/wallpaper-assets/-/blob/master/46/experiments/geometric4.blend?ref_type=heads
uname -a
Linux 6.7.6-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb 23 18:27:29 UTC 2024 x86_64 GNU/Linux
[error.bz2](/uploads/2eb52e06e39dbfb7ac70c45ab0bb0f43/error.bz2)
[lspci.txt](/uploads/dd01d8c570341e3d67b0801652b8040b/lspci.txt)
I guess this should be easy to reproduce.
Let me know if you need further information.
Regards, Achim

https://gitlab.freedesktop.org/drm/intel/-/issues/10279
GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
2024-03-27T16:26:03Z
Leah Neukirchen

<!--
Please read this first: https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html
Please do not use the confidential checkbox below.
-->
This GPU crash happened after ~42h of uptime on a ThinkPad T14 Gen 4 (Intel). Backstory: The GPU previously hung multiple times with "GuC firmware i915/adlp_guc_70.bin version 70.13.1" (after between 1 and 3 days of running), but the GPU didn't crash and was just stuck (blocked task in intel_pipe_update_end); the rest of the system kept running fine. I downgraded to "GuC firmware i915/adlp_guc_70.bin version 70.5.1" and the system ran fine for over 60 days (then the GPU locked up and the whole system hung).
I now rebooted with 70.13.1 and could finally gather a crash dump on Kernel 6.7.4.
I did not do anything special, xscreensaver was running a GL demo, so OpenGL was actively used.
The machine is a ThinkPad T14 Gen 4 PF4NDRMJ (Intel) running a stock Void Linux x86_64 kernel (equivalent to vanilla kernel.org):
% uname -a
Linux hera 6.7.4_1 #1 SMP PREEMPT_DYNAMIC Wed Feb 7 19:24:35 UTC 2024 x86_64 GNU/Linux
% lspci -vnn -d :*:0300
00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-P [Iris Xe Graphics] [8086:a7a1] (rev 04) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device [17aa:230e]
Flags: bus master, fast devsel, latency 0, IRQ 141, IOMMU group 0
Memory at 603c000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=256M]
I/O ports at 2000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
Kernel driver in use: i915
Kernel modules: i915
dmesg of crash part:
```
[151055.398471] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
[151055.435102] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gibson:gdrv0 [18384]
[151055.435106] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[151055.435112] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[151055.435112] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[151055.435112] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[151055.435113] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[151055.435113] GPU crash dump saved to /sys/class/drm/card0/error
[151055.435248] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
[151055.537453] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[151055.538165] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[151055.538306] i915 0000:00:02.0: [drm] gibson:gdrv0[18384] context reset due to GPU hang
[151055.538366] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.13.1
[151055.538369] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[151055.556102] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[151055.556521] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[151055.556522] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[151069.225354] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
[151069.267220] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gibson:gdrv0 [18384]
[151069.267304] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
[151069.370417] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[151069.371161] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[151069.371317] i915 0000:00:02.0: [drm] gibson:gdrv0[18384] context reset due to GPU hang
[151069.371417] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.13.1
[151069.371425] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[151069.389993] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[151069.391004] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[151069.391016] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[151078.588765] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
[151078.625076] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gibson:gdrv0 [18384]
[151078.625335] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
[151078.727973] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[151078.728701] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[151078.728893] i915 0000:00:02.0: [drm] gibson:gdrv0[18384] context reset due to GPU hang
[151078.729013] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.13.1
[151078.729026] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[151078.747763] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[151078.748189] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[151078.748193] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
```
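The excerpt above contains three hangs in quick succession. The spacing between them can be computed from the bracketed dmesg timestamps, which here comes to roughly 13.8 s and 9.4 s. A short sketch, assuming the timestamp is the leading bracketed seconds value as shown in this excerpt (the function name is illustrative):

```python
import re
from typing import Iterable, List

# Leading "[seconds.microseconds]" dmesg timestamp on a GPU HANG line.
TS_HANG_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\].*GPU HANG: ecode")


def hang_intervals(lines: Iterable[str]) -> List[float]:
    """Return the seconds elapsed between consecutive GPU HANG lines."""
    stamps = [
        float(match.group("ts"))
        for line in lines
        if (match := TS_HANG_RE.match(line))
    ]
    # Round to microseconds, the resolution of the printed timestamps.
    return [round(later - earlier, 6) for earlier, later in zip(stamps, stamps[1:])]


excerpt = [
    "[151055.435102] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gibson:gdrv0 [18384]",
    "[151055.435248] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1",
    "[151069.267220] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gibson:gdrv0 [18384]",
    "[151078.625076] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gibson:gdrv0 [18384]",
]
print(hang_intervals(excerpt))
```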
full dmesg is attached: [dmesg.gz](/uploads/eaa1bf36bd3bbc0345e017067e2cd491/dmesg.gz)
GPU error log is attached: [error.gz](/uploads/10478e96846b5ef96cc9f0c87ab9b241/error.gz)

https://gitlab.freedesktop.org/drm/intel/-/issues/10106
GPU HANG: ecode 12:1:84dffffb when playing Terraria after a couple of hours
2024-01-23T16:37:20Z
DebianProgrammer

After playing Terraria for a couple of hours, I get a GPU HANG: ecode 12:1:84dffffb. The timing seems random, but it does happen whenever I play for a couple of hours. When it happens, everything (even the mouse) freezes for a couple of seconds and then resumes. However, it can happen again shortly afterwards unless I restart the game, and sometimes the game goes blank while everything else is unfrozen. There have been times when the system froze and did not unfreeze, requiring a hard reset.
Model: Framework Laptop 13, 12th Gen Intel (CPU: i5-1240P, iGPU: Intel Alder Lake-P GT2 [Iris Xe Graphics])
WM: Sway, Wayland. However I have tried KDE/Wayland and got the same issue.
OS: Arch Linux
I also have Intel Turbo Boost disabled (`/sys/devices/system/cpu/intel_pstate/no_turbo` is set to `1`), as sometimes the CPU gets too hot when turbo boosting on other games.
`uname -srvmo`: `Linux 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux`
Full dmesg is attached; scroll down to the GPU HANG to see when it happened. `/sys/class/drm/card1/error` is in `drmerror.txt`, and `sudo lspci -vnn -d ":*:0300"` is in `pciinfo.txt`.
[drmerror.txt](/uploads/70725d3ebbc4137a497e03bf2518054b/drmerror.txt)
[dmesg_send.txt](/uploads/6fb1697d36cce0822376c6431dded995/dmesg_send.txt)
[pciinfo.txt](/uploads/56e5a37f5f6a4314dc1410675fa06659/pciinfo.txt)

https://gitlab.freedesktop.org/drm/intel/-/issues/10091
Broadwell: Asynchronous wait on fence 0000:00:02.0:Xorg[707]:4c90 timed out (hint:intel_atomic_commit_ready [i915]), GPU HANG ecode 8:0:00000000
2024-01-15T17:30:34Z
Paw Licker

Rewriting the issue documented in #8687 to make it clearer: On laptops with the Broadwell IGP, the video chipset is subject to random freezing for around 10 seconds followed by a quick recovery, logging a dmesg error and GPU_HANG. In some cases the entire video output on the CF-RZ4 would freeze up. I have confirmed this issue on a Lenovo Thinkpad Yoga 12 (i5-5200u) and a Panasonic Let's Note CF-RZ4 (m5-5y70), with both kernels 6.1 and 6.6. Attached are files from both the Yoga 12 and CF-RZ4.
```
[ 168.077810] Asynchronous wait on fence 0000:00:02.0:Xorg[707]:4c90 timed out (hint:intel_atomic_commit_ready [i915])
[ 172.174339] i915 0000:00:02.0: [drm] GPU HANG: ecode 8:0:00000000
[ 172.175347] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
```
To reproduce this error, [download GPUTest 0.4.0](https://www.geeks3d.com/gputest/download/) and run the FurMark test in windowed mode over and over. Depending on the hardware, it might take more attempts to hang the GPU with a black window before it recovers and runs FurMark. One of the users in #8687 made a script to automate this; I'm attaching it as an sh file because the paste function of GitLab is kind of buggy.
[gpuhangtest.sh](/uploads/cbeb9db77e635c8cd982cb62a82b2bc5/gpuhangtest.sh)
[xrandr](/uploads/404e96ac8b9ff6a1936e42d6081a5eb8/xrandr)
[lspcivnn](/uploads/353796a6d9b2ba9d5cc62416eceb66eb/lspcivnn)
[dmidecode](/uploads/0dc9356f4dabb6f780f212f6bb17e783/dmidecode)
[gpuerrordmesg](/uploads/da44c49d6aea6910a43f0f0e43358293/gpuerrordmesg)
[gpuerroruname](/uploads/907c5b80062d0d5a73dd4cc989054e9f/gpuerroruname)
[gpuerror](/uploads/bc71aa799ed3d436abb8e3e36e93c7a3/gpuerror)
[dmesgerryoga](/uploads/689a8728607458f4406acdbd7436d4e2/dmesgerryoga)
[gpuerroryoga](/uploads/87a4f9f486ffb3e1e2e6a88818dca78b/gpuerroryoga)
[xrandryoga](/uploads/0d01f6444a03a15849a205de39d58d4d/xrandryoga)
[yogadmidecode](/uploads/af2b2a452494da2f0563d4d73860da00/yogadmidecode)
[yogalspci](/uploads/e8c2af3c7ce7971f9be5614a550fcad2/yogalspci)
[yogauname](/uploads/749a649c453c7c74a2f4a87eb3d0b773/yogauname)

https://gitlab.freedesktop.org/drm/intel/-/issues/9777
Blank screen or vertical stripes instead of login screen on boot
2024-01-16T15:19:10Z
s0600204

I recently updated my computer from Debian 11 ("Bullseye") to 12 ("Bookworm"). This included a change of kernel version. Ever since, when booting normally (i.e. without altering the boot commands in Grub) into the "new" kernel I either get [vertical striping](/uploads/ba1276329bdeb03d7532f76e68b3131a/preview_photo.jpg) or an empty greenish or pinkish off-white screen. Also, the keyboard becomes unresponsive. (Fortunately pressing the power button on the front of the machine is still interpreted as a request for a controlled shutdown.)
This happens whether I connect the screen via HDMI or VGA. I lack DVI and DisplayPort cables, so am unable to test those outputs at this time.
Known working kernel:
* 5.10.0-26-rt-amd64 (the latest kernel available from Bullseye's repos, left over from the upgrade and not uninstalled yet)
  - `uname -a` : `Linux delta 5.10.0-26-rt-amd64 #1 SMP PREEMPT_RT Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux`
Known not-working kernels:
* 6.1.0-10-rt-amd64 (the oldest kernel available from Bookworm's repos)
* 6.1.0-13-rt-amd64 (the latest kernel available from Bookworm's repos)
  - `uname -a` : `Linux delta 6.1.0-13-rt-amd64 #1 SMP PREEMPT_RT Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux`
* 6.5.0-0.deb12.1-rt-amd64 (the latest kernel available from Bookworm-backport's repos)
  - `uname -a` : `Linux delta 6.5.0-0.deb12.1-rt-amd64 #1 SMP PREEMPT_RT Debian 6.5.3-1~bpo12+1 (2023-10-08) x86_64 GNU/Linux`
Output of journalctl with dmesg included (both with monitor plugged into HDMI output):
* [journalctl (kernel 6.1.0)](/uploads/83513432747c109cff6d6e0663a874ba/log-journalctl-6.1.0)
* [journalctl (kernel 6.5.0)](/uploads/5ca1529d193091b6fc6327846f10aa1e/log-journalctl-6.5)
Output of [xrandr --verbose](/uploads/324d601750322489aa3108f023a42574/log-xrandr-6.1.0)
Output of [lspci -vnn -d :*:0300](/uploads/4d848460a9634d43709575c4c48bf8d7/log-lspci-6.1.0)
Output of [dmidecode](/uploads/e6e8bb3c1c77b5f00b9f4c2fd3e59c91/log-dmidecode-6.1.0)
It is possible to get round the problem - for now - by booting the system and passing either `single` or `nomodeset` to the boot line in Grub. (It's how I was able to get the files above.) Or, alternatively, by booting into the older 5.10.0 kernel.

https://gitlab.freedesktop.org/drm/intel/-/issues/9396
GPU HANG: ecode 12:0:00000000 on Arc A770 FE 16GB
2023-10-24T21:13:47Z
Christopher Snowhill

I managed to hang my GPU in the middle of a video call, and it caused the compositor to freeze and terminate most applications. It seems to be somewhat random, as nothing I know of can reproduce it consistently. I neglected to grab the GPU error dump, and only collected dmesg output from the session. I will try to collect an error dump again another time, if I can reproduce the error.
This seems similar to #8556 except on DG2 instead of ADL.
* System architecture: x86_64
* Kernel version: 6.5.4-1-cachyos
* Linux distribution: Arch Linux
* Machine or mother board model: MSI B450 Tomahawk
* Display connector: 2x DP
* A full dmesg with debug information and/or a GPU crash dump:
  * I only have a partial dmesg, since I didn't think I needed debug information when I was running what should have been a stable system. Here's the normal log: [dmesg.12.txt](/uploads/ec580bc95348c851f000057f2563e42e/dmesg.12.txt)
  * The dmesg made no mention of a crash dump being exposed anywhere, either.
Excerpts:
```
[59801.916277] perf: interrupt took too long (3271 > 3252), lowering kernel.perf_event_max_sample_rate to 61000
[60811.428283] i915 0000:28:00.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 3:0 (bcs0) because 0x00000000
[60812.028962] i915 0000:28:00.0: [drm] *ERROR* render: timed out waiting for forcewake ack request.
[60812.028970] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60812.629882] i915 0000:28:00.0: [drm] *ERROR* render: timed out waiting for forcewake ack request.
[60812.629891] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60813.230897] i915 0000:28:00.0: [drm] *ERROR* vdbox0: timed out waiting for forcewake ack request.
[60813.230905] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60813.831885] i915 0000:28:00.0: [drm] *ERROR* vdbox2: timed out waiting for forcewake ack request.
[60813.831893] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60814.452900] i915 0000:28:00.0: [drm] *ERROR* render: timed out waiting for forcewake ack request.
[60814.452908] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60814.646502] Asynchronous wait on fence 0000:28:00.0:wayfire[6240]:163766 timed out (hint:intel_atomic_commit_ready [i915])
[60814.646675] Asynchronous wait on fence 0000:28:00.0:wayfire[6240]:163764 timed out (hint:intel_atomic_commit_ready [i915])
```
```
[60819.862895] i915 0000:28:00.0: [drm] GPU HANG: ecode 12:0:00000000
[60819.863271] i915 0000:28:00.0: [drm] Resetting chip for GuC failed to reset engine mask=0x2
[60819.865472] i915 0000:28:00.0: [drm] *ERROR* GT0: Failed to reset GuC, ret = -110
[60820.466130] i915 0000:28:00.0: [drm] *ERROR* render: timed out waiting for forcewake ack request.
[60820.466138] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60821.067005] i915 0000:28:00.0: [drm] *ERROR* vdbox0: timed out waiting for forcewake ack request.
[60821.067013] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60821.667877] i915 0000:28:00.0: [drm] *ERROR* vdbox2: timed out waiting for forcewake ack request.
[60821.667884] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60822.268753] i915 0000:28:00.0: [drm] *ERROR* vebox0: timed out waiting for forcewake ack request.
[60822.268760] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60822.869625] i915 0000:28:00.0: [drm] *ERROR* vebox1: timed out waiting for forcewake ack request.
[60822.869633] i915 0000:28:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f9/0x250 [i915]
[60822.871000] i915 0000:28:00.0: [drm] GPU HANG: ecode 12:0:00000000
```
I'll try to reproduce it, but I don't have any idea what caused it, other than a potential overheat. Even then, the card was only hot to the touch when I went to swap out the GPU; I did not observe high temperatures in my sensor logging.
The last thing I was doing before the problem was playing A Hat in Time; then I stopped for a while. I also had dust clogging the intake vents on my case, which I have since cleaned.

https://gitlab.freedesktop.org/drm/intel/-/issues/9193
[drm] GPU HANG: ecode 12:1:85dffffb, in telegram-deskto [218965]
2024-02-22T03:37:01Z
Jean-Louis Dupond

When opening media (for example, a photo) in Telegram Desktop, I always get GPU hangs and the whole of Gnome freezes.
This happens only when I open the media on a (fractionally) scaled display, not on an external display which is not scaled.
Kernel logs the following:
```
aug 23 11:34:31 lt-jeanlouis kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in telegram-deskto [218965]
aug 23 11:34:31 lt-jeanlouis kernel: i915 0000:00:02.0: [drm] telegram-deskto[218965] context reset due to GPU hang
aug 23 11:34:39 lt-jeanlouis kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in telegram-deskto [218965]
aug 23 11:34:39 lt-jeanlouis kernel: i915 0000:00:02.0: [drm] telegram-deskto[218965] context reset due to GPU hang
aug 23 11:34:46 lt-jeanlouis kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in telegram-deskto [218965]
aug 23 11:34:46 lt-jeanlouis kernel: i915 0000:00:02.0: [drm] telegram-deskto[218965] context reset due to GPU hang
```
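For triage, the repeated hang lines above can be pulled out of a log mechanically. The following is a minimal sketch, not part of the original report: the function names `parse_hangs` and `count_by_process` are hypothetical, and the regular expression only assumes the line shape visible in the excerpts (`GPU HANG: ecode A:B:C`, optionally followed by `, in NAME [PID]`). The meaning of the three ecode fields is not stated in the report, so they are kept as an opaque string.

```python
import re
from collections import Counter

# Matches e.g. "GPU HANG: ecode 12:1:85dffffb, in telegram-deskto [218965]".
# The ", in NAME [PID]" tail is optional because some hang lines omit it,
# and older kernels print the last field with a "0x" prefix.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<a>\d+):(?P<b>[0-9a-fA-F]+):(?:0x)?(?P<c>[0-9a-fA-F]+)"
    r"(?:, in (?P<comm>.+?) \[(?P<pid>\d+)\])?"
)


def parse_hangs(lines):
    """Return one dict per GPU HANG line found in an iterable of log lines."""
    hangs = []
    for line in lines:
        m = HANG_RE.search(line)
        if m:
            hangs.append({
                "ecode": f"{m['a']}:{m['b']}:{m['c']}",  # kept opaque (assumption)
                "comm": m["comm"],                        # None if not logged
                "pid": int(m["pid"]) if m["pid"] else None,
            })
    return hangs


def count_by_process(hangs):
    """Count hangs per (process name, pid) pair."""
    return Counter((h["comm"], h["pid"]) for h in hangs)
```

Feeding it the output of `journalctl -k` or `dmesg` line by line would show at a glance whether every hang is attributed to the same process, as it is in the excerpt above.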
My system is running Arch with Gnome Shell 44.3.
Kernel: 6.4.11-arch2-1
Dell XPS 15 9530
VGA:
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
My system also has an additional VGA:
01:00.0 3D controller: NVIDIA Corporation AD107M [GeForce RTX 4050 Max-Q / Mobile] (rev a1)
Now if I run telegram-desktop with prime-run, the issue does NOT occur anymore.
So it seems that something in Telegram triggers a bug in the Intel VGA driver code.
There is also a bug report on the Telegram side: https://github.com/telegramdesktop/tdesktop/issues/26393
Let me know if there is anything I can do to help debug this :smile:
But as it can be reproduced quite easily, it should be easy to debug.

https://gitlab.freedesktop.org/drm/intel/-/issues/9150
GPU Hang Error with OpenVINO's "object_detection_demo.py" on Kernel 6.2 and 6.5.0 using i915 Driver
2023-08-20T03:16:55Z
CarlosML

**Steps to Reproduce:**
1. Set up an environment with Ubuntu 22.04.
2. Install the kernel version 6.2 HWE or 6.5.0-060500rc2drmintelnext20230817-generic. Note that with kernel 5.15, the problem does not manifest.
3. Use the official OpenVINO 2023.0.1 Docker image.
4. Run the demo "object_detection_demo.py" from Open Model Zoo:
```
object_detection_demo.py -m yolox-tiny/FP16/yolox-tiny.xml -at yolox -i test.jpg --no_show -d GPU
```
5. Observe the GPU hang error.
---
**Frequency of Issue:**
The error consistently occurs every time the demo is executed using the specified kernel versions.
---
**Additional Information:**
Error log:
```
[ 80.761763] i915 0000:00:02.0: [drm:i915_gem_open [i915]]
[ 92.302537] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 92.302702] i915 0000:00:02.0: [drm] python3[2103] context reset due to GPU hang
[ 92.302745] i915 0000:00:02.0: [drm:mark_guilty [i915]] context python3[2103]: guilty 1, banned
[ 92.303647] i915 0000:00:02.0: [drm:mark_guilty [i915]] client python3[2103]: gained 4 ban score, now 4
[ 92.314159] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:e757fefe, in python3 [2103]
```
**System information:**[error.bz2](/uploads/0a8ae6a8b7b13e4ae49e386f379e0d11/error.bz2)
System architecture: x86_64
Kernel version: 6.5.0-060500rc2drmintelnext20230817-generic
Linux distribution: Ubuntu Server 22.04
DMI: Intel(R) Client Systems NUC7CJYHN/NUC7JYB
**Attempted Solution:**
Tried to extend the timeout by using:
```
echo 10000 | sudo tee /sys/class/drm/card0/engine/rcs0/preempt_timeout_ms
```
However, this did not resolve the error.
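To confirm that the new value actually took effect before re-running the demo, the per-engine setting can be read back. This is a minimal sketch under stated assumptions: the helper name `read_preempt_timeouts` is hypothetical, and the only sysfs path used is the one from the command above (`/sys/class/drm/card0/engine/<engine>/preempt_timeout_ms`); the card index may differ on other systems.

```python
from pathlib import Path


def read_preempt_timeouts(engine_dir="/sys/class/drm/card0/engine"):
    """Map engine name -> preempt_timeout_ms as an int.

    Engines whose attribute is missing or unreadable are skipped, and an
    empty dict is returned if the directory itself does not exist.
    """
    timeouts = {}
    base = Path(engine_dir)
    if not base.is_dir():
        return timeouts
    for engine in sorted(base.iterdir()):
        attr = engine / "preempt_timeout_ms"
        try:
            timeouts[engine.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Attribute absent, not readable, or not a plain integer.
            continue
    return timeouts


if __name__ == "__main__":
    for name, value in read_preempt_timeouts().items():
        print(f"{name}: {value} ms")
```

If `rcs0` still reports the old value after the `tee` command, the write did not land on the engine that is being reset.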
---
[error.bz2](/uploads/037a025d0204f8118114e22b020f5220/error.bz2)

https://gitlab.freedesktop.org/drm/intel/-/issues/9130
Regression: 100% reproducible GPU hangs in GfxBench Car Chase & Aztec Ruins benchmarks
2023-10-03T10:59:11Z
Eero Tamminen

Between the following drm-tip versions:
* 8f73cd99e6: 2023y-07m-25d-16h-32m-02s UTC integration manifest
* 50f130ab30: 2023y-07m-26d-14h-37m-59s UTC integration manifest
The kernel started to hit 3 GPU hangs during each GfxBench Car Chase and Aztec Ruins run:
```
[ 5181.934825] Iteration 1/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_4
[ 5196.997717] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 5196.997828] i915 0000:00:02.0: [drm] testfw_app[9811] context reset due to GPU hang
[ 5197.020016] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9fff2, in testfw_app [9811]
[ 5210.821053] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 5210.821177] i915 0000:00:02.0: [drm] testfw_app[9812] context reset due to GPU hang
[ 5210.842677] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9eff2, in testfw_app [9812]
[ 5224.644442] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 5224.644565] i915 0000:00:02.0: [drm] testfw_app[9812] context reset due to GPU hang
[ 5224.665198] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9eff2, in testfw_app [9812]
...
[ 6060.130093] Iteration 1/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_5_normal
[ 6075.036781] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6075.036913] i915 0000:00:02.0: [drm] testfw_app[6413] context reset due to GPU hang
[ 6075.120293] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8fd8ffff, in testfw_app [6413]
[ 6089.372194] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6089.372323] i915 0000:00:02.0: [drm] testfw_app[6414] context reset due to GPU hang
[ 6089.453649] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8fd8ffff, in testfw_app [6414]
[ 6099.611724] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6099.611859] i915 0000:00:02.0: [drm] testfw_app[6414] context reset due to GPU hang
[ 6099.662211] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8fdaffff, in testfw_app [6414]
```
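The "3 hangs per run" pattern in this log can be checked automatically by attributing each hang line to the most recent test marker. The sketch below is an illustration only: `hangs_per_test` is a hypothetical helper, and it assumes the automation writes an `Iteration N/M: ... --test_id <id>` marker into the kernel log before each test, as seen in the excerpt.

```python
import re

# Marker written by the test automation before each benchmark run.
ITER_RE = re.compile(r"Iteration \d+/\d+: .*--test_id (?P<test>\S+)")
# The ecode is captured as an opaque string of hex digits and colons.
HANG_RE = re.compile(r"GPU HANG: ecode (?P<ecode>[0-9a-fA-Fx:]+)")


def hangs_per_test(lines):
    """Map test_id -> list of ecodes logged after that test's marker."""
    result = {}
    current = None
    for line in lines:
        marker = ITER_RE.search(line)
        if marker:
            current = marker["test"]
            result.setdefault(current, [])
            continue
        hang = HANG_RE.search(line)
        if hang and current is not None:
            result[current].append(hang["ecode"])
    return result
```

Comparing the resulting per-test lists between two drm-tip builds would make the regression window visible without reading the log by eye.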
Error state for yesterday's (2023-08-15) drm-tip Git head: [i915_error_state.txt](/uploads/e78a25d78372c1940294ed66149a903f/i915_error_state.txt)
Other notes:
* This is 100% reproducible, there are 3x hangs on every run
* Of the 3 machine types on which I'm running this, I see this only on GEN12 TGL, not on GEN9 BXT / GLK => It could be either GEN12+ or TGL specific
* Because this does not happen on BXT / GLK, which are significantly slower than TGL, this should not be an issue of a shader just being very slow
* Not sure whether it's at all related, but TGL boots have also started to slow down (so that they hit automation timeouts)

https://gitlab.freedesktop.org/drm/intel/-/issues/8973
[ 6863.838048] [drm] GPU HANG: ecode 9:0:0xfffffffe, reason: Hang on rcs0, action: reset
2023-08-02T06:20:57Z
syoung

How can I solve this problem?
kernel: [ 6278.383059] [drm:gen9_set_dc_state [i915]] *ERROR* DC state mismatch (0x0 -> 0x2)
kernel: [ 6863.838048] [drm] GPU HANG: ecode 9:0:0xfffffffe, reason: Hang on rcs0, action: reset
kernel: [ 6863.838109] i915 0000:00:02.0: Resetting rcs0 after gpu hang
kernel: [ 6863.838109] i915 0000:00:02.0: Resetting rcs0 after gpu hang
kernel: [ 6863.838109] i915 0000:00:02.0: Resetting rcs0 after gpu hang
The error log is [error.gz](/uploads/6106681b8ed4040786f70aca846d5166/error.gz), captured using
`cat /sys/class/drm/card0/error | gzip > error.gz`
Thanks.

https://gitlab.freedesktop.org/drm/intel/-/issues/8778
i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb with yandex maps
2023-07-05T15:14:56Z
K Y

On my new MSI Modern 15 B12M laptop with an Intel Alder Lake-UP3 GT2 Iris Xe Graphics GPU (Intel i7-1255U CPU) running archlinux, if I open https://yandex.com/maps, enter a source and a target location, and then move or scale the map, within a few seconds to one minute the system freezes for around 15-20 seconds (twice the system stayed completely unresponsive for much longer, and only a hard reset helped).
In the system logs I get:
```
Jul 03 18:58:15 sinx kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Jul 03 18:58:15 sinx kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jul 03 18:58:15 sinx kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in chrome [1958]
Jul 03 18:58:19 sinx kernel: Asynchronous wait on fence 0000:00:02.0:kwin_wayland[1041]:71a4 timed out (hint:intel_atomic_commit_ready [i915])
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:5390!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:kwin_wayland[1041]:71a4!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:5392!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:5394!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:5396!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:5398!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:539a!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:539c!
Jul 03 18:58:28 sinx kernel: Fence expiration time out i915-0000:00:02.0:chrome[1958]:539e!
Jul 03 18:58:29 sinx kernel: Fence expiration time out i915-0000:00:02.0:kwin_wayland[1041]:71aa!
Jul 03 18:58:29 sinx kernel: Fence expiration time out i915-0000:00:02.0:kwin_wayland[1041]:71a8!
Jul 03 18:58:29 sinx kernel: Fence expiration time out i915-0000:00:02.0:kwin_wayland[1041]:71a6!
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in chrome [1958]
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] chrome[1958] context reset due to GPU hang
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.5.1
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated!
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission disabled
Jul 03 18:58:37 sinx kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC disabled
Jul 03 18:58:37 sinx plasmashell[1907]: [59:59:0703/185837.466674:ERROR:shared_context_state.cc(898)] SharedContextState context lost via ARB/EXT_robustness. Reset status = GL_GUILTY_CONTE>
Jul 03 18:58:37 sinx plasmashell[1907]: [59:59:0703/185837.466900:ERROR:gpu_service_impl.cc(1010)] Exiting GPU process because some drivers can't recover from errors. GPU process will rest>
Jul 03 18:58:37 sinx plasmashell[1907]: [4:4:0703/185837.483934:ERROR:gpu_process_host.cc(953)] GPU process exited unexpectedly: exit_code=8704
```
My operating system is archlinux with all the latest updates. I have tried different kernel versions (6.1, 6.3, 6.4) and different browsers (google chrome, chromium, firefox), and I have also tried turning the enable_psr/enable_guc kernel module options off and on. Kernel version, browser and driver parameters have no effect on the problem.
My current system environment:
archlinux with 6.1-lts/6.4 kernels
mesa 23.1.3
kde 5.27.6 plasmashell
wayland 1.22.0
There are no problems with other websites or applications, whether 2D or 3D.