[CRASH][ROCm5.6][OpenCL][RDNA3] Machine Learning workload crashes the GPU
Brief summary of the problem:
The system crashes (black screen / no TTY with REISUB) when running the OpenCL build of https://github.com/oobabooga/text-generation-webui, and also when running https://github.com/easydiffusion/easydiffusion with Python ROCm.
Hardware & system description:
System:
Host: _HOST_ Kernel: 6.5.4-zen2-1-zen arch: x86_64 bits: 64 compiler: gcc
v: 13.2.1 Desktop: KDE Plasma v: 5.27.8 tk: Qt v: 5.15.10 wm: kwin_wayland
dm: 1: LightDM note: stopped 2: SDDM Distro: Arch Linux
CPU:
Info: 8-core model: AMD Ryzen 7 5800X bits: 64 type: MT MCP arch: Zen 3+
rev: 0 cache: L1: 512 KiB L2: 4 MiB L3: 32 MiB
Speed (MHz): avg: 3128 high: 3631 min/max: 2200/4850 boost: enabled cores:
1: 3531 2: 3586 3: 2879 4: 2200 5: 2880 6: 3173 7: 2879 8: 2879 9: 3631
10: 3319 11: 2896 12: 3618 13: 2879 14: 3591 15: 2880 16: 3233
bogomips: 121595
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 31 [Radeon RX 7900 XT/7900 XTX] vendor: XFX RX-79XMERCB9
driver: amdgpu v: kernel arch: RDNA-3 pcie: speed: 16 GT/s lanes: 16 ports:
active: DP-1 empty: DP-2,DP-3,HDMI-A-1 bus-ID: 2b:00.0 chip-ID: 1002:744c
Display: wayland server: X.org v: 1.21.1.8 with: Xwayland v: 23.2.1
compositor: kwin_wayland driver: X: loaded: amdgpu dri: radeonsi gpu: amdgpu
display-ID: 0
Monitor-1: DP-1 res: 2560x1440 size: N/A
API: EGL v: 1.5 platforms: device: 0 drv: radeonsi device: 1 drv: swrast
surfaceless: drv: radeonsi wayland: drv: radeonsi x11: drv: radeonsi
inactive: gbm
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd v: N/A glx-v: 1.4
direct-render: yes renderer: AMD Radeon RX 7900 XTX (navi31 LLVM 18.0.0 DRM
3.54 6.5.4-zen2-1-zen) device-ID: 1002:744c display-ID: :1.0
API: Vulkan v: 1.3.264 surfaces: xcb,xlib,wayland device: 0
type: discrete-gpu driver: mesa radv device-ID: 1002:744c
How to reproduce the issue:
Run the aforementioned workloads and generate output with model layers loaded in VRAM.
- Stable Diffusion model: SDXL (Hugging Face)
- Text-web-ui model: Mistral 7b
Attached files:
Log files (for system lockups / game freezes / crashes)
- Dmesg log (full log): crash_amdgpu_6.5.4-zen2-1-zen.tar.gz