amdgpu locks up occasionally when running 3D applications
@hramrach
Submitted by Michal Suchánek Assigned to Default DRI bug account
Link to original bug (#104527)
Description
Created attachment 136597
kernel messages
Linux 4.14.0, libdrm 2.4.89, Mesa 17.3.1 on Debian
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460] [1002:67ef] (rev cf)
After a lockup I see this message:
[150509.194713] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
[150509.194718] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[150509.194720] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
[150509.194722] amdgpu 0000:01:00.0: VM fault (0x02, vmid 5) at page 0, read from 'TC0' (0x54433000) (72)
but a similar message earlier did not cause a lockup:
[112552.659698] amdgpu 0000:01:00.0: GPU fault detected: 147 0x07f04802
[112552.659702] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0003F8FE
[112552.659704] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
[112552.659706] amdgpu 0000:01:00.0: VM fault (0x02, vmid 5) at page 260350, read from 'TC0' (0x54433000) (72)
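For anyone comparing the two faults, here is a minimal sketch of decoding the register values from the log lines. The bit positions used for the STATUS fields are an assumption, chosen because they reproduce the values the kernel prints in the "VM fault" line (0x02, vmid 5, read, client 72). They are not taken from register documentation. The helper names `decode_fault` and `client_name` are made up for this example.

```python
import re

# Two register lines copied verbatim from the non-fatal fault above.
LOG = """\
[112552.659702] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0003F8FE
[112552.659704] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048002
"""

REG = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_(ADDR|STATUS)\s+0x([0-9A-Fa-f]+)")


def decode_fault(text):
    """Extract the ADDR and STATUS registers and split STATUS into fields."""
    regs = {name: int(value, 16) for name, value in REG.findall(text)}
    status = regs["STATUS"]
    return {
        "page": regs["ADDR"],                # printed in decimal as "at page N"
        "protections": status & 0xFF,        # assumed bits 0-7
        "client_id": (status >> 12) & 0xFF,  # assumed bits 12-19
        "write": bool((status >> 24) & 1),   # assumed bit 24, 0 means read
        "vmid": (status >> 25) & 0xF,        # assumed bits 25-28
    }


def client_name(word):
    """Interpret the 32-bit client word as big-endian ASCII, e.g. 0x54433000."""
    return word.to_bytes(4, "big").rstrip(b"\0").decode("ascii")


print(decode_fault(LOG))
print(client_name(0x54433000))
```

With this reading, both faults have the same STATUS value (0x0A048002) and differ only in the faulting page: 0 for the one that locked up, 260350 (0x0003F8FE) for the one that did not.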
Earlier kernel and Mesa versions would occasionally lock up at random times while displaying garbage. I have not seen that for a while, but the card still occasionally locks up when running a 3D application. After a lockup the card keeps showing a static image of something the application rendered, along with a movable cursor. It seems to happen most often when some setup is in progress, such as loading a new scene.
Attachment 136597, "kernel messages":
dmesg.txt