Blender Cycles+HIP - using rendered viewport crashes system/GPU on Linux with RX 6900 XT GPU
Brief summary of the problem:
When using HIP as the Cycles GPU rendering backend on an RX 6900 XT, my entire system sometimes freezes while using the rendered viewport (see the steps to reproduce below), and I have to hard-reset the PC. Ctrl+Alt+F3 does not bring up a login prompt; it only shows a black screen with a "_" in the top left, so the system appears to have crashed with no way to recover.
Kernel messages from journalctl suggest issues in amdgpu (the kernel module?); see the attached log. The crash is not always immediate: it may only happen after using the rendered viewport for a while, or after switching back and forth between textured and rendered mode a few times. Rendering the scene with F12 works fine, and I have observed no other crashes as long as I don't use the rendered viewport.
I have created a ticket on the Blender bug tracker, where it appears to be a kernel/driver issue rather than a Blender issue: https://developer.blender.org/T100353
Hardware description:
- CPU: AMD Ryzen 7 5800X (16) @ 3.800GHz
- GPU: 0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0) (=> RX 6900 XT)
- System Memory: 3200MHz 32GB
- Display(s): 1) Acer monitor with 2560x1440 resolution and 75 Hz refresh rate; 2) Samsung monitor with 2560x1440 resolution and 144 Hz refresh rate
- Type of Display Connection: Both displays are connected to the GPU via DisplayPort
System information:
- Distro name and Version: Fedora Release 36 Workstation
- Kernel version: Linux linuxjoni02 5.18.19-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Aug 21 15:52:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Custom kernel: N/A (Official Fedora kernel without modifications)
- AMD official driver version: AMD Website version 22.20.50200 RHEL/CentOS 9
- Relevant installed packages:
amdgpu-install.noarch 22.20.50200-1438747.el9
hip-runtime-amd.x86_64 5.2.21153.50203-109.el7
libdrm-amdgpu.x86_64 1:2.4.110.50200-1438747.el9
libdrm-amdgpu-common.noarch 1.0.0.50200-1438747.el9
xorg-x11-drv-amdgpu.x86_64 22.0.0-1.fc36
amdgpu.x86_64 22.20.50200-1438747.el9
amdgpu-core.noarch 22.20.50200-1438747.el9
amdgpu-dkms.noarch 1:5.16.9.22.20.50200-1438747.el9
amdgpu-dkms-firmware.noarch 1:5.16.9.22.20.50200-1438747.el9
amdgpu-doc.noarch 22.20-1438747.el9
amdgpu-lib.x86_64 22.20.50200-1438747.el9
libdrm-amdgpu-devel.x86_64 1:2.4.110.50200-1438747.el9
libwayland-amdgpu-client.x86_64 1.20.0.50200-1438747.el9
libwayland-amdgpu-cursor.x86_64 1.20.0.50200-1438747.el9
libwayland-amdgpu-egl.x86_64 1.20.0.50200-1438747.el9
libwayland-amdgpu-server.x86_64 1.20.0.50200-1438747.el9
llvm-amdgpu.x86_64 1:14.0.50200-1438747.el9
llvm-amdgpu-devel.x86_64 1:14.0.50200-1438747.el9
llvm-amdgpu-libs.x86_64 1:14.0.50200-1438747.el9
llvm-amdgpu-static.x86_64 1:14.0.50200-1438747.el9
llvm140-amdgpu.x86_64 1:14.0.50200-1438747.el9
llvm140-amdgpu-devel.x86_64 1:14.0.50200-1438747.el9
mesa-amdgpu-dri-drivers.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-filesystem.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libEGL.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libEGL-devel.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libGL.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libGL-devel.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libgbm.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libgbm-devel.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libglapi.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libxatracker.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-libxatracker-devel.x86_64 1:22.1.0.50200-1438747.el9
mesa-amdgpu-vdpau-drivers.x86_64 1:22.1.0.50200-1438747.el9
smi-lib-amdgpu.x86_64 22.20-1438747.el9
smi-lib-amdgpu-devel.x86_64 22.20-1438747.el9
vulkan-amdgpu.x86_64 22.20-1438747.el9
wayland-amdgpu-devel.x86_64 1.20.0.50200-1438747.el9
wayland-amdgpu-doc.noarch 1.20.0.50200-1438747.el9
wayland-protocols-amdgpu-devel.noarch 1.25.50200-1438747.el9
xorg-x11-amdgpu-drv-amdgpu.x86_64 1:24.1.0-1438747.el9
- Blender version: 3.2.2
How to reproduce the issue:
I can reliably reproduce the issue by following these steps:
- Open two Blender instances with the attached .blend file loaded, and switch the viewport in each to rendered mode.
- Move the viewport in the first instance to make it refresh (re-render).
- While the first instance is still rendering, move the viewport in the second instance. For me, the system/GPU freeze happens right there.
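Opening the two instances can be scripted. The following is a minimal sketch and not part of the original report: it assumes the Blender executable is reachable as `blender` and that test_v2.blend is in the current directory. The `launch_instances` helper and the `BLENDER_BIN` variable are hypothetical names introduced here. Switching the viewports to rendered mode and moving them still has to be done by hand.

```shell
#!/bin/sh
# Assumption: the Blender executable is reachable as "blender";
# set BLENDER_BIN to override this if it is installed elsewhere.
BLENDER_BIN="${BLENDER_BIN:-blender}"

# launch_instances FILE [COUNT]
# Start COUNT (default 2) background Blender processes, each opening FILE.
launch_instances() {
    blend_file="$1"
    count="${2:-2}"
    i=1
    while [ "$i" -le "$count" ]; do
        "$BLENDER_BIN" "$blend_file" &
        i=$((i + 1))
    done
}

# Usage: launch_instances test_v2.blend 2
```

Starting both processes from one shell only saves the manual file-open step; it does not change how the instances share the GPU.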
Attached files:
Test .blend file to reproduce the issue: test_v2.blend
Screenshots/video files
Screenshot right before step 3 of the reproduction steps:
Log files (for system lockups / game freezes / crashes)
- Dmesg log (full log): journalctl log from boot to crash: dmesg.txt, generated via: #> journalctl --no-hostname -k -b -1 > dmesg.txt
- Xorg log: Xorg.0.log (grabbed from /var/log/; I'm using KDE Wayland, so I'm not sure this is the right log)
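To find the amdgpu messages in the kernel log quickly, the log can be filtered. This is a minimal sketch under the assumption that the relevant lines contain the string "amdgpu"; the `filter_amdgpu` helper is a hypothetical name, and other driver messages relevant to the crash may not match this pattern.

```shell
#!/bin/sh
# filter_amdgpu: read kernel log text on stdin and print only the lines
# that mention amdgpu (case-insensitive), prefixed with their line number.
filter_amdgpu() {
    grep -in 'amdgpu'
}

# Usage on the log file named in the report:
#   filter_amdgpu < dmesg.txt
# Or directly on the previous boot, with the same journalctl flags as above:
#   journalctl --no-hostname -k -b -1 | filter_amdgpu
```

The line numbers make it easier to point at a specific message in the attached dmesg.txt.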