Navi 23 6700S pp_od_clk_voltage no output, mclk capped at 875 MHz, gpu fan speed always 255
Brief summary of the problem:
I have a number of issues with my 6700S GPU on Linux.
- Even with amdgpu.ppfeaturemask=0xffffffff in the kernel parameters, the output of pp_od_clk_voltage is always blank. (perhaps related: #913 (closed))
- The GPU fan speed is always 0 according to apps like psensor.
cat /sys/class/drm/card0/device/hwmon/hwmon2/pwm1
always gives an output of 255. (related: #1925 (closed))
- The memory clock cannot go above 875 MHz, when it should be capable of 2 GHz according to here (perhaps related: #1301 (closed), #2152 (closed))
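For reference, the raw pwm1 value can be turned into a duty-cycle percentage with a few lines of Python. This is only a sketch: it assumes the standard hwmon convention that pwm values range from 0 to 255, and it reuses the hwmon2 path from this report, which may be numbered differently on other boots or machines. The helper names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Path taken from this report; the hwmon index (hwmon2) is an assumption
# and can differ between boots and machines.
PWM1 = Path("/sys/class/drm/card0/device/hwmon/hwmon2/pwm1")


def pwm_to_percent(raw: int) -> float:
    """Convert a hwmon pwm value (0-255) to a duty-cycle percentage."""
    if not 0 <= raw <= 255:
        raise ValueError(f"pwm value out of range: {raw}")
    return raw * 100 / 255


def read_pwm(path: Path = PWM1) -> Optional[int]:
    """Return the raw pwm value, or None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


raw = read_pwm()
if raw is None:
    print("pwm1 not readable on this system")
else:
    print(f"pwm1 = {raw} ({pwm_to_percent(raw):.0f}%)")
```

With the value seen here (255), this gives 100%, which matches the "Fan Level: 255 (100%)" line that rocm-smi prints further down while reporting 0 RPM.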
Hardware description:
- CPU: AMD Ryzen 9 6900HS
- GPU:
lspci -nn | grep VGA
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6650 XT] [1002:73ef] (rev c2)
07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev c7)
- System Memory: 40 GB
- Display(s): Laptop display, 1440p, 120 Hz
- Type of Display Connection: n/a
System information:
- Distro name and Version: openSUSE Tumbleweed 20221028
- Kernel version: 6.0.5-1
- Custom kernel: n/a
- AMD official driver version:
sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 41, firmware version: 0x00000040
PFP feature version: 41, firmware version: 0x0000005b
CE feature version: 41, firmware version: 0x00000024
RLC feature version: 1, firmware version: 0x00000056
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
MEC feature version: 41, firmware version: 0x0000005e
MEC2 feature version: 41, firmware version: 0x0000005e
SOS feature version: 0, firmware version: 0x00230009
ASD feature version: 553648242, firmware version: 0x21000072
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x17000028
TA DTM feature version: 0x00000000, firmware version: 0x1200000e
TA RAP feature version: 0x00000000, firmware version: 0x0700000e
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x003b2800 (59.40.0)
SDMA0 feature version: 52, firmware version: 0x0000004b
SDMA1 feature version: 52, firmware version: 0x0000004b
VCN feature version: 0, firmware version: 0x0211200e
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x0202000c
TOC feature version: 0, firmware version: 0x00000000
VBIOS version: SWBRT91917.001
How to reproduce the issue:
Run sudo rocm-smi -d 0 -a
to see the supported memory clock frequencies.
Run watch -n 1 cat /sys/class/drm/card0/device/hwmon/hwmon2/pwm1
to see whether the reported fan speed varies.
Run watch -n 1 cat /sys/class/drm/card0/device/pp_dpm_mclk
to watch the memory clock vary.
The laptop is the Asus Zephyrus G14 2022, BIOS 315.
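The memory clock check can also be scripted rather than watched by eye. The sketch below parses text in the "index: NNNMhz *" layout that pp_dpm_mclk uses and reports the highest and the currently active level. The sample text is copied from the mclk levels in the rocm-smi output in this report; the class and function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class ClockLevel(NamedTuple):
    index: int
    mhz: int
    active: bool


# One DPM level per line, e.g. "3: 875Mhz" or "0: 96Mhz *" (the "*" marks the active level).
_LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*Mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text: str) -> List[ClockLevel]:
    """Parse pp_dpm_* style text into a list of clock levels; non-matching lines are skipped."""
    levels = []
    for line in text.splitlines():
        m = _LEVEL_RE.match(line)
        if m:
            levels.append(
                ClockLevel(int(m.group(1)), int(m.group(2)), m.group(3) is not None)
            )
    return levels


# Sample copied from the mclk levels reported in this issue.
SAMPLE = """0: 96Mhz *
1: 541Mhz
2: 675Mhz
3: 875Mhz
"""

levels = parse_dpm_levels(SAMPLE)
top = max(levels, key=lambda lv: lv.mhz)
active = [lv.index for lv in levels if lv.active]
print(f"highest level: {top.index} ({top.mhz} MHz)")
print(f"active level(s): {active}")
```

On a live system the same function could be fed the contents of /sys/class/drm/card0/device/pp_dpm_mclk instead of SAMPLE.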
Attached files:
Log files
Here is the output of sudo rocm-smi -d 0 -a:
======================= ROCm System Management Interface =======================
========================= Version of System Component ==========================
Driver version: 6.0.5-1-default
================================================================================
====================================== ID ======================================
GPU[0] : GPU ID: 0x73ef
================================================================================
================================== Unique ID ===================================
GPU[0] : Unique ID: N/A
================================================================================
==================================== VBIOS =====================================
GPU[0] : VBIOS version: SWBRT91917.001
================================================================================
================================= Temperature ==================================
GPU[0] : Temperature (Sensor edge) (C): 48.0
GPU[0] : Temperature (Sensor junction) (C): 48.0
GPU[0] : Temperature (Sensor memory) (C): 46.0
================================================================================
========================== Current clock frequencies ===========================
GPU[0] : dcefclk clock level: 1: (480Mhz)
GPU[0] : fclk clock level: 1: (1148Mhz)
GPU[0] : mclk clock level: 0: (96Mhz)
GPU[0] : sclk clock level: 0: (0Mhz)
GPU[0] : socclk clock level: 1: (738Mhz)
GPU[0] : pcie clock level: 1 (16.0GT/s x8)
================================================================================
============================== Current Fan Metric ==============================
GPU[0] : Fan Level: 255 (100%)
GPU[0] : Fan RPM: 0
================================================================================
============================ Show Performance Level ============================
GPU[0] : Performance Level: auto
================================================================================
=============================== OverDrive Level ================================
GPU[0] : GPU OverDrive value (%): 0
================================================================================
=============================== OverDrive Level ================================
GPU[0] : GPU Memory OverDrive value (%): 0
================================================================================
================================== Power Cap ===================================
GPU[0] : Max Graphics Package Power (W): 100.0
================================================================================
============================= Show Power Profiles ==============================
GPU[0] : 1. Available power profile (#1 of 7): CUSTOM
GPU[0] : 2. Available power profile (#2 of 7): VIDEO
GPU[0] : 3. Available power profile (#3 of 7): POWER SAVING
GPU[0] : 4. Available power profile (#4 of 7): COMPUTE
GPU[0] : 5. Available power profile (#5 of 7): VR
GPU[0] : 6. Available power profile (#6 of 7): 3D FULL SCREEN
GPU[0] : 7. Available power profile (#7 of 7): BOOTUP DEFAULT*
================================================================================
============================== Power Consumption ===============================
GPU[0] : Average Graphics Package Power (W): 4.0
================================================================================
========================= Supported clock frequencies ==========================
GPU[0] : Supported dcefclk frequencies on GPU0
GPU[0] : 0: 417Mhz
GPU[0] : 1: 480Mhz *
GPU[0] : 2: 1200Mhz
GPU[0] :
GPU[0] : Supported fclk frequencies on GPU0
GPU[0] : 0: 500Mhz
GPU[0] : 1: 1148Mhz *
GPU[0] : 2: 1801Mhz
GPU[0] :
GPU[0] : Supported mclk frequencies on GPU0
GPU[0] : 0: 96Mhz *
GPU[0] : 1: 541Mhz
GPU[0] : 2: 675Mhz
GPU[0] : 3: 875Mhz
GPU[0] :
GPU[0] : Supported sclk frequencies on GPU0
GPU[0] : 0: 0Mhz *
GPU[0] : 1: 0Mhz
GPU[0] :
GPU[0] : Supported socclk frequencies on GPU0
GPU[0] : 0: 417Mhz
GPU[0] : 1: 738Mhz *
GPU[0] : 2: 1200Mhz
GPU[0] :
GPU[0] : Supported PCIe frequencies on GPU0
GPU[0] : 0: 2.5GT/s x1
GPU[0] : 1: 16.0GT/s x8 *
GPU[0] :
--------------------------------------------------------------------------------
================================================================================
============================== % time GPU is busy ==============================
GPU[0] : GPU use (%): 0
================================================================================
============================== Current Memory Use ==============================
GPU[0] : GPU memory use (%): 0
GPU[0] : Memory Activity: N/A
================================================================================
================================ Memory Vendor =================================
GPU[0] : GPU memory vendor: samsung
================================================================================
============================= PCIe Replay Counter ==============================
GPU[0] : PCIe Replay Count: 0
================================================================================
================================ Serial Number =================================
GPU[0] : Serial Number: N/A
================================================================================
================================ KFD Processes =================================
KFD process information:
PID PROCESS NAME GPU(s) VRAM USED SDMA USED CU OCCUPANCY
21288 AppRun.wrapped 0 0 0 0
================================================================================
============================= GPUs Indexed by PID ==============================
PID 21288 is using 0 DRM device(s)
================================================================================
================== GPU Memory clock frequencies and voltages ===================
GPU[0] : Requested function is not implemented on this setup
================================================================================
=============================== Current voltage ================================
GPU[0] : Voltage (mV): 6
================================================================================
================================== PCI Bus ID ==================================
GPU[0] : PCI Bus: 0000:03:00.0
================================================================================
============================= Firmware Information =============================
GPU[0] : ASD firmware version: 0x21000072
GPU[0] : CE firmware version: 36
GPU[0] : DMCU firmware version: 0
GPU[0] : MC firmware version: 0
GPU[0] : ME firmware version: 64
GPU[0] : MEC firmware version: 94
GPU[0] : MEC2 firmware version: 94
GPU[0] : PFP firmware version: 91
GPU[0] : RLC firmware version: 86
GPU[0] : RLC SRLC firmware version: 0
GPU[0] : RLC SRLG firmware version: 0
GPU[0] : RLC SRLS firmware version: 0
GPU[0] : SDMA firmware version: 75
GPU[0] : SDMA2 firmware version: 75
GPU[0] : SMC firmware version: 00.59.40.00
GPU[0] : SOS firmware version: 0x00230009
GPU[0] : TA RAS firmware version: 00.00.00.00
GPU[0] : TA XGMI firmware version: 00.00.00.00
GPU[0] : UVD firmware version: 0x00000000
GPU[0] : VCE firmware version: 0x00000000
GPU[0] : VCN firmware version: 0x0211200e
================================================================================
================================= Product Info =================================
GPU[0] : Card series: Navi 23 [Radeon RX 6650 XT]
GPU[0] : Card model: 0x1dec
GPU[0] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
Traceback (most recent call last):
File "/usr/bin/rocm-smi", line 3234, in <module>
showProductName(deviceList)
File "/usr/bin/rocm-smi", line 2056, in showProductName
device_sku = vbios.value.decode().split('-')[1][:6]
IndexError: list index out of range
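The traceback appears to come from the VBIOS string on this laptop (SWBRT91917.001) containing no '-' character, so split('-') returns a single element and indexing [1] fails. The sketch below reproduces that and shows one defensive way to handle it. It is not the actual rocm-smi fix; the function name and the second, dash-separated string are made up for illustration.

```python
from typing import Optional


def sku_from_vbios(vbios: str) -> Optional[str]:
    """Return the first six characters of the second '-'-separated field, or None if absent."""
    parts = vbios.split("-")
    if len(parts) < 2:
        # No '-' in the string: this is the case that raises IndexError in the traceback.
        return None
    return parts[1][:6]


# VBIOS string from this report: no '-' present.
print(sku_from_vbios("SWBRT91917.001"))
# Made-up dash-separated string, only to show the other branch.
print(sku_from_vbios("113-EXAMPLE1-100"))
```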
There may also be an issue with my use/implementation of amdgpu.ppfeaturemask=0xffffffff, as sudo sysctl -a does not show it alongside other kernel parameters. However,
printf "0x%08x\n" $(cat /sys/module/amdgpu/parameters/ppfeaturemask)
returns 0xffffffff as described here. Would I have to re-run the kernel parameters update after each kernel update?
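The same check can be done in Python, including a test of the OverDrive bit that pp_od_clk_voltage depends on. This is a sketch under assumptions: the PP_OVERDRIVE_MASK value of 0x4000 is taken from my reading of amd_shared.h in the kernel source and should be verified against the running kernel, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

PARAM = Path("/sys/module/amdgpu/parameters/ppfeaturemask")

# Assumption: OverDrive bit as defined in the kernel's amd_shared.h; verify for your kernel.
PP_OVERDRIVE_MASK = 0x4000


def parse_mask(raw: str) -> int:
    """Parse the parameter text; base 0 accepts both decimal and 0x-prefixed hex."""
    return int(raw.strip(), 0)


def format_mask(mask: int) -> str:
    """Format like printf "0x%08x"."""
    return f"0x{mask:08x}"


def overdrive_enabled(mask: int) -> bool:
    """True if the (assumed) OverDrive bit is set in the mask."""
    return bool(mask & PP_OVERDRIVE_MASK)


def read_mask(path: Path = PARAM) -> Optional[int]:
    """Return the active mask, or None if the parameter file is missing or unreadable."""
    try:
        return parse_mask(path.read_text())
    except (OSError, ValueError):
        return None


mask = read_mask()
if mask is None:
    print("ppfeaturemask not readable (amdgpu module not loaded?)")
else:
    print(f"{format_mask(mask)} overdrive bit set: {overdrive_enabled(mask)}")
```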
Link to related issue at the ROCm GitHub.