RX6800XT Crash, Black Screen, Reboot, No VGA Initialization with ROCm workloads
Brief summary of the problem:
Running a Stable Diffusion workload on my GPU via ROCm 5.6 randomly crashes my whole system to an unresponsive black screen, after which it reboots itself. The reboot succeeds, but the GPU stays stuck in an unrecoverable state (no VGA initialization) until I force a power-off. This ONLY occurs with ROCm workloads. It has never happened during long, demanding gaming sessions.
Hardware description:
- CPU: Ryzen 7 7700X
- GPU: XFX Radeon RX 6800XT
- System Memory: G.Skill Trident Z5 NEO DDR5 6000 CL36-36-36-96 1.35V (EXPO is enabled)
- Display(s): LG 32GN650-B
- Type of Display Connection: DP
System information:
- Distro name and Version: Manjaro Linux 23.0.0
- Kernel version: 6.3.13-2-MANJARO
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
Run Stable Diffusion through the Automatic1111 WebUI. The crash can happen on the first generation or the 50th (or even later); the timing is very random.
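Since the crash point is random, a scripted loop may make it easier to reproduce. The following is only a rough sketch, not something taken from my actual setup: it assumes the WebUI was started with the `--api` flag and is listening on the default `127.0.0.1:7860`, and the marker file name, prompt, and step count are placeholders. Each marker is flushed to disk before and after a generation, so the last line in the file shows which generation was running when the system went down.

```python
import json
import os
import urllib.request

# Assumption: WebUI launched with --api on the default host and port.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
# Hypothetical file name for the progress markers.
MARKER_FILE = "sd_repro_progress.log"


def append_marker(path, text):
    """Append one line and force it to disk so it survives a hard crash."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
        f.flush()
        os.fsync(f.fileno())


def generate_once(prompt, steps=20):
    """Send a single txt2img request and wait until the server responds."""
    body = json.dumps({"prompt": prompt, "steps": steps}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # discard the image payload; only completion matters


def run_until_crash(max_runs=200, prompt="test image"):
    """Generate repeatedly, writing a marker before and after each run."""
    for i in range(1, max_runs + 1):
        append_marker(MARKER_FILE, f"start {i}")
        generate_once(prompt)
        append_marker(MARKER_FILE, f"done {i}")


if __name__ == "__main__":
    run_until_crash()
```

After a crash, a final `start N` line with no matching `done N` would indicate that generation N was in flight.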
Log files (for system lockups / game freezes / crashes):
The journalctl log from one crash is provided as a Gist for brevity. Logs only appear sometimes after a crash; other times the journal is completely silent and shows no sign of anything being wrong. Luckily, this capture did show some SMU errors at the end of the log.
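For anyone trying to pull the same kind of lines from their own journal, here is a minimal sketch. It assumes the crashed session is the previous boot (`-b -1`); if the machine was power-cycled more than once since the crash, the offset would need adjusting. The match pattern (`amdgpu` or `SMU`) is my own guess at what is relevant, not an official filter.

```python
import re
import subprocess

# Assumption: lines of interest mention the amdgpu driver or the SMU.
GPU_PATTERN = re.compile(r"amdgpu|\bSMU\b", re.IGNORECASE)


def filter_gpu_lines(log_text):
    """Return only the log lines that mention amdgpu or SMU."""
    return [line for line in log_text.splitlines() if GPU_PATTERN.search(line)]


def previous_boot_kernel_log():
    """Read kernel messages from the previous boot via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,  # an empty or missing journal should not raise
    )
    return result.stdout


if __name__ == "__main__":
    for line in filter_gpu_lines(previous_boot_kernel_log()):
        print(line)
```

If this prints nothing, that matches the "silent" crashes described above, where the journal holds no trace of the failure.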
Any help would be greatly appreciated. I do understand that my card is not technically supported by ROCm, despite being identified as GFX1030, but this issue still seems concerning and is likely not normal regardless.
Thank you for your time,
Walaryne