Oland [Radeon HD 8570 / R7 240/340 OEM]: GPU hang
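On a Dell OptiPlex 5055 with an Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] (rev 87) GPU, running Linux 5.10.24, a user got a frozen display caused by a GPU hang. The kernel log shows a burst of IOMMU page faults followed, about 49 minutes later, by a gfx ring timeout: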
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff0c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff7c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffe0c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffec0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffe5c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffd0c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffecc0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffd3c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffc0c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffdac0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffc1c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffc8c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffb0c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffcfc0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffb6c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffa0c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffbdc0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffa4c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffff90c0 flags=0x0020]
[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffabc0 flags=0x0020]
[Fri Oct 22 16:11:37 2021] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=295073040, emitted seq=295073042
[Fri Oct 22 16:11:37 2021] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 625 thread Xorg:cs0 pid 661
[Fri Oct 22 16:11:37 2021] amdgpu 0000:05:00.0: amdgpu: GPU recovery disabled.
Any ideas what could cause the page faults and the hang?
Linux 5.10.24 messages (dmesg)
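For anyone triaging the full dmesg attachment, the following is a minimal sketch (not part of the original report) that extracts the AMD-Vi IO_PAGE_FAULT events and summarizes the faulting addresses. The function names `parse_faults` and `summarize` are hypothetical helpers, and the regular expression only assumes the two message variants quoted above.

```python
import re
from collections import Counter

# Matches both AMD-Vi variants seen in the log above: one prefixed with
# "amdgpu 0000:05:00.0:" and one carrying "device=05:00.0" inside the brackets.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT\s+(?:device=(?P<device>\S+)\s+)?"
    r"domain=(?P<domain>0x[0-9a-fA-F]+)\s+"
    r"address=(?P<address>0x[0-9a-fA-F]+)\s+"
    r"flags=(?P<flags>0x[0-9a-fA-F]+)"
)


def parse_faults(lines):
    """Return one dict per IO_PAGE_FAULT line; other lines are ignored."""
    faults = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults.append({
                "device": m.group("device"),  # None for the amdgpu-prefixed variant
                "domain": int(m.group("domain"), 16),
                "address": int(m.group("address"), 16),
                "flags": int(m.group("flags"), 16),
            })
    return faults


def summarize(faults):
    """Count the faults and report the address range and flag values."""
    addrs = [f["address"] for f in faults]
    return {
        "count": len(faults),
        "min_address": min(addrs) if addrs else None,
        "max_address": max(addrs) if addrs else None,
        "flags": Counter(f["flags"] for f in faults),
    }


# Two lines copied from the log above, one of each variant.
SAMPLE = [
    "[Fri Oct 22 15:22:29 2021] amdgpu 0000:05:00.0: AMD-Vi: Event logged "
    "[IO_PAGE_FAULT domain=0x000c address=0xfffffff0c0 flags=0x0020]",
    "[Fri Oct 22 15:22:29 2021] AMD-Vi: Event logged "
    "[IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffc1c0 flags=0x0020]",
]

if __name__ == "__main__":
    summary = summarize(parse_faults(SAMPLE))
    print(summary["count"], hex(summary["min_address"]), hex(summary["max_address"]))
```

Feeding it the complete log (for example the lines of a saved dmesg file) instead of `SAMPLE` would show whether all faults fall in the same high address range and share the same flags value.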
PS:
$ lspci -vvxxx -s 05:00.0
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] (rev 87) (prog-if 00 [VGA controller])
Subsystem: Dell Oland [Radeon HD 8570 / R7 240/340 OEM]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 47
Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at f0500000 (64-bit, non-prefetchable) [size=256K]
Region 4: I/O ports at 2000 [size=256]
Expansion ROM at f0560000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x8 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00000 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [200 v1] Resizable BAR <?>
Capabilities: [270 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Kernel driver in use: amdgpu
Kernel modules: amdgpu
00: 02 10 11 66 07 04 10 10 87 00 00 03 00 00 80 00
10: 0c 00 00 e0 00 00 00 00 04 00 50 f0 00 00 00 00
20: 01 20 00 00 00 00 00 00 00 00 00 00 28 10 02 10
30: 00 00 fe ff 48 00 00 00 00 00 00 00 ff 01 00 00
40: 00 00 00 00 00 00 00 00 09 50 08 00 28 10 02 10
50: 01 58 03 76 00 00 00 00 10 a0 12 00 a1 8f 00 00
60: 30 29 09 00 83 0c 40 00 40 00 83 10 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 05 00 81 00 00 00 e0 fe 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
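As a cross-check of the hex dump against the decoded lspci output, here is a small sketch that reads the standard PCI configuration header fields from the first rows of the dump. The helper names `parse_hexdump` and `decode_header` are hypothetical. The offsets used (vendor ID at 0x00, device ID at 0x02, revision at 0x08, class code at 0x0b, subsystem vendor at 0x2c) are those of the standard type 0 header.

```python
import struct


def parse_hexdump(text):
    """Turn 'offset: bb bb ...' rows, as printed by lspci -x, into bytes."""
    data = bytearray()
    for line in text.strip().splitlines():
        offset, _, rest = line.partition(":")
        if int(offset, 16) != len(data):
            raise ValueError(f"unexpected offset {offset!r}")
        data.extend(bytes.fromhex(rest))  # fromhex ignores the spaces
    return bytes(data)


def decode_header(cfg):
    """Decode a few little-endian fields of a type 0 PCI config header."""
    (vendor, device, command, status,
     revision, prog_if, subclass, base_class) = struct.unpack_from("<HHHHBBBB", cfg, 0x00)
    subsys_vendor, subsys_id = struct.unpack_from("<HH", cfg, 0x2C)
    return {
        "vendor": vendor,
        "device": device,
        "command": command,
        "status": status,
        "revision": revision,
        "prog_if": prog_if,
        "subclass": subclass,
        "base_class": base_class,
        "subsystem_vendor": subsys_vendor,
        "subsystem_id": subsys_id,
    }


# First three rows of the dump above (offsets 0x00 to 0x2f).
DUMP = """
00: 02 10 11 66 07 04 10 10 87 00 00 03 00 00 80 00
10: 0c 00 00 e0 00 00 00 00 04 00 50 f0 00 00 00 00
20: 01 20 00 00 00 00 00 00 00 00 00 00 28 10 02 10
"""

if __name__ == "__main__":
    h = decode_header(parse_hexdump(DUMP))
    print(f"{h['vendor']:04x}:{h['device']:04x} (rev {h['revision']:02x})")
```

The decoded vendor, device, and revision values agree with the `[1002:6611] (rev 87)` reported at the top, and the base class 0x03 with subclass 0x00 corresponds to the "VGA compatible controller" line.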
Edited by Paul Menzel