XFX RX 7900GRE GCVM_L2_PROTECTION_FAULT_STATUS when raytracing (RADV)
Brief summary of the problem:
Any time I try to use RADV for ray tracing, I get the following errors (journalctl | grep amdgpu):
amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32779, for process RayTracer pid 8822 thread RayTracer pid 8822)
amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x0000800102dab000 from client 10
amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101031
amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0x3
amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: RW: 0x0
amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32779, for process RayTracer pid 8822 thread RayTracer pid 8822)
amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x0000800102da0000 from client 10
amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00141051
amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0x5
amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32779, for process RayTracer pid 8822 thread RayTracer pid 8822)
amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x0000800102dac000 from client 10
amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00141051
amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0x5
amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[drm] amdgpu kernel modesetting enabled.
These errors repeat until the program crashes or gives up. This particular log came from RayTracingInVulkan (https://github.com/GPSnoopy/RayTracingInVulkan), but the same behavior occurs in other programs (mainly tested with Cyberpunk 2077 and The Finals). It only happens when using RADV.
I have also tested the newest version of mesa-git, which seems to fix part of the problem, most likely due to this commit: mesa/mesa@b588cb29. With mesa-git I can launch RayTracingInVulkan without issues, but Cyberpunk 2077 does not load its benchmark (it doesn't crash, but it keeps loading at ~1 fps until I quit it).
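For anyone cross-checking the log, here is a minimal sketch of how the raw GCVM_L2_PROTECTION_FAULT_STATUS words break down into the fields the kernel prints. The bit positions are an assumption: they are taken to match the gfx11 register layout in the amdgpu kernel headers, and they do reproduce the decoded values shown in the log above, but they should be verified against the headers for the kernel in use. The names FIELDS and decode_fault_status are made up for this example.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS on gfx11:
# field name -> (shift, width in bits). Verify against the kernel headers.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # UTCL2 client ID (0x8 is reported as TCP in the log)
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a raw status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


# The two distinct status values that appear in the log above.
for raw in (0x00101031, 0x00141051):
    fields = decode_fault_status(raw)
    print(f"{raw:#010x}: " + ", ".join(f"{k}={v:#x}" for k, v in fields.items()))
```

Under this layout, both values decode to client ID 0x8 and VMID 1, and they differ only in PERMISSION_FAULTS (0x3 vs 0x5) and RW (0x0 vs 0x1), which matches what the kernel printed.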
Hardware description:
- CPU:
Ryzen 9 5900X
- GPU:
Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M]
The actual card is an XFX Radeon RX 7900 GRE.
- System Memory:
32GB
- Display(s):
Display 1 (M27Q P): 2560x1440 @ 165Hz
Display 2 (HP M24f FHD): 1920x1080 @ 75Hz
Display 3 (HP VH240a): 1080x1920 @ 60Hz
- Type of Display Connection:
Display 1: DP
Display 2: HDMI
Display 3: HDMI to DP Adapter
System information:
- Distro name and Version:
EndeavourOS x86_64
- Kernel version:
6.7.6-zen1-2-zen
- AMD official driver version:
N/A
How to reproduce the issue:
- Ensure that RADV is the current default Vulkan driver
- Start any program that uses hardware ray tracing
- Observe everything be sad and freeze
- Hope that it unfreezes so you can force quit the program, or that the program force quits itself
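The first step can be sketched as a small launcher that forces RADV for a single program, rather than relying on the system default. This is only a sketch under assumptions: it uses the Vulkan loader's VK_ICD_FILENAMES environment variable, and the ICD manifest path below is a guess at the usual location on Arch-based systems and may differ elsewhere. The helper names radv_env and run_with_radv are made up for this example.

```python
import os
import subprocess

# Assumed location of the RADV ICD manifest; adjust for your distro.
RADV_ICD = "/usr/share/vulkan/icd.d/radeon_icd.x86_64.json"


def radv_env(base=None, icd_path=RADV_ICD):
    """Return a copy of the environment with the Vulkan loader pointed at RADV."""
    env = dict(os.environ if base is None else base)
    env["VK_ICD_FILENAMES"] = icd_path
    return env


def run_with_radv(command):
    """Run `command` (a list of arguments) with RADV forced; return its exit code."""
    return subprocess.run(command, env=radv_env()).returncode


# Example (hypothetical binary path):
# run_with_radv(["./RayTracer"])
```

Running the program this way while watching the kernel log in another terminal should show whether the page faults appear as soon as ray tracing work is submitted.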