`[gfxhub0] no-retry page fault` triggered by `AMD_TEST=testdmaperf` on gfx90c APU
Running `AMD_TEST=testdmaperf glxinfo -B` on gfx90c (Renoir iGPU on Ryzen 5700G) triggers a page fault on kernel 6.7.9.
The page fault is always triggered at exactly the same step in the test, i.e. `VRAM->VRAM ,CS x2`, and always at the 4096KB size. The returned test value is always low (<100) before it dies.
Sample output (with other tests removed for brevity):
name of display: :0
DMA rate is in MB/s for each size. Slow cases are skipped and print 0.
Heap ,Method ,L2p,Wa, 512B, 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB, 2048KB, 4096KB, 8192KB, 16384KB, 32768KB, 65536KB,131072KB,
-----------,--------,---,--,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,--------,
VRAM->VRAM ,CP MC , , , 560 , 1100 , 2175 , 4200 , 7766 , 11583 , 15501 , 19623 , 21226 , 20851 , 21706 , 22168 , 22221 , 22110 , 22269 , 22275 , 22232 , 22302 , 22272 ,
VRAM->VRAM ,CP L2 ,Str, , 537 , 1053 , 2035 , 3792 , 7181 , 12591 , 16667 , 19873 , 21026 , 21581 , 21734 , 21125 , 21856 , 21910 , 21869 , 21948 , 21981 , 21986 , 21998 ,
VRAM->VRAM ,CP L2 ,LRU, , 977 , 1704 , 3421 , 6554 , 12038 , 20187 , 30941 , 42517 , 52632 , 59837 , 27343 , 21681 , 21543 , 21557 , 21802 , 21813 , 21602 , 21850 , 21739 ,
VRAM->VRAM ,CS x64 , , , 402 , 762 , 1542 , 3035 , 5740 , 10866 , 21522 , 39308 , 72801 , 103306 , 130719 , 142248 , 171086 , 182315 , 195265 , 211224 , 218299 , 222235 , 223389 ,
VRAM->VRAM ,CS x32 , , , 512 , 916 , 1846 , 3421 , 6660 , 13706 , 24548 , 45721 , 69329 , 96006 , 111932 , 126422 , 132961 , 140169 , 132831 , 130333 , 127769 , 126788 , 126387 ,
VRAM->VRAM ,CS x16 , , , 503 , 981 , 1959 , 3785 , 6994 , 13754 , 25510 , 39632 , 57000 , 68966 , 76628 , 81853 , 83181 , 71003 , 60683 , 55878 , 53099 , 52255 , 53820 ,
VRAM->VRAM ,CS x8 , , , 523 , 996 , 1989 , 3887 , 7505 , 13730 , 20253 , 32051 , 37707 , 40876 , 43156 , 43307 , 40243 , 37431 , 35967 , 35602 , 35126 , 35312 , 35273 ,
VRAM->VRAM ,CS x4 , , , 501 , 990 , 1981 , 3807 , 6975 , 10466 , 16067 , 19513 , 21320 , 21945 , 22272 , 22241 , 21660 , 21729 , 21653 , 21659 , 21679 , 21682 , 21664 ,
VRAM->VRAM ,CS x2 , , , 521 , 998 , 1969 , 3781 , 6741 , 11748 , 15960 , 19260 , 20761 , 22557 , 21433 , 21835 , 21829 , 50 ,
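To check across repeated runs that the test always stops at the same step, the table can be parsed mechanically. The following is a minimal Python sketch, not part of Mesa: the helper names `parse_dma_table` and `truncated_rows` are hypothetical, and it assumes the comma-separated layout shown above (four leading columns, then one rate per size). The sample rows are copied from the output above.

```python
SAMPLE = """\
Heap ,Method ,L2p,Wa, 512B, 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB, 2048KB, 4096KB, 8192KB, 16384KB, 32768KB, 65536KB,131072KB,
VRAM->VRAM ,CS x4 , , , 501 , 990 , 1981 , 3807 , 6975 , 10466 , 16067 , 19513 , 21320 , 21945 , 22272 , 22241 , 21660 , 21729 , 21653 , 21659 , 21679 , 21682 , 21664 ,
VRAM->VRAM ,CS x2 , , , 521 , 998 , 1969 , 3781 , 6741 , 11748 , 15960 , 19260 , 20761 , 22557 , 21433 , 21835 , 21829 , 50 ,
"""


def parse_dma_table(text):
    """Return (sizes, rows); each row is (heap, method, [rates in MB/s])."""
    sizes, rows = [], []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split(",")]
        if cells[0] == "Heap":
            # Header: the size labels start after Heap, Method, L2p, Wa.
            sizes = [c for c in cells[4:] if c]
        elif "->" in cells[0] and sizes:
            # Data row such as "VRAM->VRAM ,CS x2 , , , 521 , ...".
            rates = [int(c) for c in cells[4:] if c]
            rows.append((cells[0], cells[1], rates))
    return sizes, rows


def truncated_rows(sizes, rows):
    """Rows that stop before the last size, with the last size they reached."""
    result = []
    for heap, method, rates in rows:
        if 0 < len(rates) < len(sizes):
            result.append((heap, method, sizes[len(rates) - 1], rates[-1]))
    return result


sizes, rows = parse_dma_table(SAMPLE)
print(truncated_rows(sizes, rows))
# prints [('VRAM->VRAM', 'CS x2', '4096KB', 50)]
```

Feeding the full output of several runs through the same helpers should show whether the truncation point ever moves away from `CS x2` at 4096KB.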
`dmesg`:
[ 228.477070] amdgpu 0000:09:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2426 thread glxinfo:cs0 pid 2427)
[ 228.477076] amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000800101800000 from IH client 0x1b (UTCL2)
[ 228.477080] amdgpu 0000:09:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301030
[ 228.477081] amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 228.477082] amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x0
[ 228.477083] amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0
[ 228.477084] amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 228.477085] amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 228.477085] amdgpu 0000:09:00.0: amdgpu: RW: 0x0
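As a cross-check, the fields printed by the kernel can be recovered from the raw `VM_L2_PROTECTION_FAULT_STATUS` value. The Python sketch below is only an illustration: the bit positions in `FIELDS` are an assumption about the gfx9 register layout (they are not stated in the log), chosen because they reproduce the values printed above, and `decode_fault_status` is a hypothetical helper name.

```python
# Assumed (lsb, width) of each field in VM_L2_PROTECTION_FAULT_STATUS on gfx9.
# These positions are an assumption; they are consistent with the dmesg
# output above but should be verified against the kernel register headers.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # UTCL2 client ID
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a raw status word into named bit fields."""
    return {
        name: (status >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in FIELDS.items()
    }


decoded = decode_fault_status(0x00301030)
for name, value in decoded.items():
    print(f"{name}: {value:#x}")
```

Under that assumed layout, `0x00301030` decodes to `PERMISSION_FAULTS` 0x3, `CID` 0x8 (reported as TCP) and `VMID` 3, with the remaining fields zero, which matches the lines the driver printed.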
On IRC, @mareko said I should mention that this is a trivial memcpy compute shader and that it works fine on other gfx9 chips.
Tested on kernels 6.7, 6.7.8 and 6.7.9 with linux-firmware-20240220 and Mesa 24.0.2.
`glxinfo -B`:
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: AMD (0x1002)
Device: AMD Radeon Graphics (radeonsi, renoir, LLVM 17.0.6, DRM 3.57, 6.7.9RMOD) (0x1638)
Version: 24.0.2
Accelerated: yes
Video memory: 512MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.6
Max compat profile version: 4.6
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
VBO free memory - total: 92 MB, largest block: 92 MB
VBO free aux. memory - total: 15150 MB, largest block: 15150 MB
Texture free memory - total: 92 MB, largest block: 92 MB
Texture free aux. memory - total: 15150 MB, largest block: 15150 MB
Renderbuffer free memory - total: 92 MB, largest block: 92 MB
Renderbuffer free aux. memory - total: 15150 MB, largest block: 15150 MB
Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 512 MB
Total available memory: 16209 MB
Currently available dedicated video memory: 92 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon Graphics (radeonsi, renoir, LLVM 17.0.6, DRM 3.57, 6.7.9RMOD)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 24.0.2
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.0.2
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 24.0.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
$ sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 54, firmware version: 0x000000a7
PFP feature version: 54, firmware version: 0x000000c3
CE feature version: 54, firmware version: 0x00000050
RLC feature version: 1, firmware version: 0x0000003c
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
RLCP feature version: 0, firmware version: 0x00000000
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 54, firmware version: 0x000001d7
IMU feature version: 0, firmware version: 0x00000000
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x210000c7
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700003c
TA DTM feature version: 0x00000000, firmware version: 0x12000016
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x00403d00 (64.61.0)
SDMA0 feature version: 41, firmware version: 0x00000028
VCN feature version: 0, firmware version: 0x06115000
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x01010028
TOC feature version: 0, firmware version: 0x00000000
MES_KIQ feature version: 0, firmware version: 0x00000000
MES feature version: 0, firmware version: 0x00000000
VPE feature version: 0, firmware version: 0x00000000
VBIOS version: 113-CEZANNE-018