repeatable [gfxhub] page faults when starting LM Studio
When starting lmstudio v0.3.5 (build 9) I get a series of:
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B52
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x1
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
Dec 21 15:44:36 grover kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
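For anyone comparing fault records, the decoded fields above can be recomputed from the raw GCVM_L2_PROTECTION_FAULT_STATUS value. The following is a minimal sketch, not the driver's code: the bit positions in `FIELDS` are an assumption inferred from the two status values and their decoded fields in this log, and the names `FIELDS` and `decode_fault_status` are hypothetical.

```python
# Assumed bit layout (shift, width) of GCVM_L2_PROTECTION_FAULT_STATUS.
# Inferred from the decoded fields printed in the log above, not taken
# from the amdgpu register headers.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # printed as "Faulty UTCL2 client ID" in the log
    "RW": (18, 1),
}


def decode_fault_status(status):
    """Split a raw status value into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # The two status values reported in the log above.
    for raw in (0x00040B52, 0x00000B33):
        fields = decode_fault_status(raw)
        print(f"0x{raw:08X}:", {k: hex(v) for k, v in fields.items()})
```

Under this assumed layout, both values decode to the same fields the kernel printed, including client ID 0x5 (CPC) in each case.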
These errors should not be occurring. The good news is that the application still works despite them.
The errors started appearing with kernel 6.12.6 and seem to be caused by the following commit (cc @jesse.zhang):
commit 438b39ac74e2a9dc0a5c9d653b7d8066877e86b1
Author: Jesse.zhang@amd.com <Jesse.zhang@amd.com>
Date: Thu Dec 5 17:41:26 2024 +0800
drm/amdkfd: pause autosuspend when creating pdd
When using MES creating a pdd will require talking to the GPU to
setup the relevant context. The code here forgot to wake up the GPU
in case it was in suspend, this causes KVM to EFAULT for passthrough
GPU for example. This issue can be masked if the GPU was woken up by
other things (e.g. opening the KMS node) first and have not yet gone to sleep.
v4: do the allocation of proc_ctx_bo in a lazy fashion
when the first queue is created in a process (Felix)
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Yunxiang Li <Yunxiang.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
- CPU: 7700 on an ASRock B650M PG Riptide motherboard
- GPU: lshw -C display -numeric
*-display
description: VGA compatible controller
product: Navi 31 [Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M] [1002:744C]
vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
physical id: 0
bus info: pci@0000:03:00.0
logical name: /dev/fb0
version: cc
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom fb
configuration: depth=32 driver=amdgpu latency=0 mode=1920x1080 resolution=2560,1440 visual=truecolor xres=1920 yres=1080
resources: iomemory:f00-eff iomemory:f80-f7f irq:83 memory:f000000000-f7ffffffff memory:f800000000-f80fffffff ioport:f000(size=256) memory:f6a00000-f6afffff memory:f6b00000-f6b1ffff
- System Memory: 64G
- Display(s): kscreen-doctor -o
Output: 1 DP-1
enabled
connected
priority 1
DisplayPort
Modes: 1:2560x1440@60! 2:2560x1440@165* 3:2560x1440@120 4:1920x1200@60 5:1920x1080@165 6:1920x1080@120 7:1920x1080@120 8:1920x1080@60 9:1920x1080@60 10:1920x1080@60 11:1600x1200@60 12:1680x1050@60 13:1280x1024@75 14:1280x1024@60 15:1440x900@60 16:1280x800@60 17:1280x720@120 18:1280x720@120 19:1280x720@60 20:1280x720@60 21:1024x768@75 22:1024x768@70 23:1024x768@60 24:800x600@75 25:800x600@72 26:800x600@60 27:800x600@56 28:720x576@50 29:720x576@50 30:720x480@60 31:720x480@60 32:720x480@60 33:720x480@60 34:640x480@75 35:640x480@73 36:640x480@67 37:640x480@60 38:640x480@60 39:1600x1200@60 40:1280x1024@60 41:1024x768@60 42:1920x1200@60 43:1280x800@60 44:2560x1440@60 45:1920x1080@60 46:1600x900@60 47:1368x768@60 48:1280x720@60
Geometry: 0,0 2560x1440
Scale: 1
Rotation: 1
Overscan: 0
Vrr: Automatic
RgbRange: unknown
HDR: enabled
SDR brightness: 300 nits
SDR gamut wideness: 80%
Peak brightness: 436 nits
Max average brightness: 436 nits
Min brightness: 0.124 nits
Wide Color Gamut: enabled
ICC profile: none
Color profile source: EDID
Brightness control: supported, set to 80%
Output: 2 HDMI-A-1
enabled
connected
priority 2
HDMI
Modes: 49:1920x1080@60*! 50:1920x1080@60 51:1920x1080@60 52:1920x1080@50 53:1680x1050@60 54:1280x1024@75 55:1280x1024@60 56:1440x900@60 57:1280x960@60 58:1280x800@60 59:1152x864@75 60:1280x720@60 61:1280x720@60 62:1280x720@60 63:1280x720@50 64:1280x720@50 65:1440x576@50 66:1440x576@50 67:1024x768@75 68:1024x768@70 69:1024x768@60 70:1440x480@60 71:1440x480@60 72:1440x480@60 73:1440x480@60 74:832x624@75 75:800x600@75 76:800x600@72 77:800x600@60 78:800x600@56 79:720x576@50 80:720x576@50 81:720x576@50 82:720x480@60 83:720x480@60 84:720x480@60 85:720x480@60 86:720x480@60 87:640x480@75 88:640x480@73 89:640x480@67 90:640x480@60 91:640x480@60 92:640x480@60 93:720x400@70 94:1280x1024@60 95:1024x768@60 96:1280x800@60 97:1920x1080@60 98:1600x900@60 99:1368x768@60 100:1280x720@60
Geometry: 2560,261 1920x1080
Scale: 1
Rotation: 1
Overscan: 0
Vrr: incapable
RgbRange: unknown
HDR: incapable
Wide Color Gamut: incapable
ICC profile: none
Color profile source: EDID
Brightness control: supported, set to 80%
System information:
- Arch up to date as of 21 Dec 2024
- Kernel version: Linux grover 6.12.6-1-stable-git #1 SMP PREEMPT_DYNAMIC Sat, 21 Dec 2024 20:47:12 +0000 x86_64 GNU/Linux
- mesa: 24.3.1-3
How to reproduce the issue:
Build and install kernel 6.12.6 using an Arch 6.12 config
Install lmstudio-beta 0.3.5 (Build 9) from the AUR
Start lmstudio
Check the kernel log for the page fault errors
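
The log check in the last step can be scripted. The following is a rough sketch under stated assumptions: the regular expression is modelled only on the fault header lines quoted in this report, and the names `FAULT_RE` and `find_page_faults` are hypothetical.

```python
import re

# Matches the fault header line as it appears in this report, e.g.
# "amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)".
# Other kernel versions may format the line differently.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)\)"
)


def find_page_faults(log_text):
    """Return one dict per page fault header found in the log text."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            faults.append(
                {
                    "hub": match["hub"],
                    "src_id": int(match["src_id"]),
                    "ring": int(match["ring"]),
                    "vmid": int(match["vmid"]),
                    "pasid": int(match["pasid"]),
                }
            )
    return faults


if __name__ == "__main__":
    import sys

    # Reads kernel log text from stdin and lists any faults found.
    for fault in find_page_faults(sys.stdin.read()):
        print(fault)
```

One way to use it is to pipe the current boot's kernel messages into the script right after starting lmstudio, for example from `journalctl -k -b`; an empty result would suggest the faults did not occur on that run.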
Edited by Ed Tomlinson