Commits · c38fe1c3c73c3bd2b8e43f4a6dd28f41a1b98bae · Alex Deucher / linux

Mar 08, 2025

drm/amd/display: Implement PCON regulated autonomous mode handling · c38fe1c3

George Shen authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why/How]
DP spec has been updated recently to make regulated autonomous mode more
well-defined. In case any PCON vendors choose to implement regulated
autonomous mode in the future, pre-emptively add handling for the
regulated autonomous mode based on current spec.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

c38fe1c3

drm/amd/display: not abort link train when bw is low · e53db51c

Peichen Huang authored 1 month ago and

Tom Chung committed 3 weeks ago


[WHY]
DP tunneling should not abort link train even bandwidth become
too low after downgrade. Otherwise, it would fail compliance test.

[HOW}
Do link train with downgrade settings even bandwidth is not enough

Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

e53db51c

drm/amd/display: Do not enable replay when vtotal update is pending. · d4365abc

Wang Danny authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why&How]
Vtotal is not applied to HW when handling vsync interrupt.
Make sure vtotal is aligned before enable replay.

Reviewed-by: Anthony Koo <anthony.koo@amd.com>
Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Danny Wang <danny.wang@amd.com>
Signed-off-by: Zhongwei Zhang <Zhongwei.Zhang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

d4365abc

drm/amd/display: Add and use new dm_prepare_suspend() callback · d5a80abb

Mario Limonciello authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why]
The displays currently don't get turned off until after other IP blocks
have been suspended.  However turning off the displays first gives a
very visible response that the system is on it's way down.

[How]
Turn off displays in a prepare_suspend() callback instead when possible.
This will help for suspend and hibernate sequences.
The shutdown sequence however will not call prepare() so check whether
the state has been already saved to decide what to do.

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

d5a80abb

drm/amd/display: Restore correct backlight brightness after a GPU reset · 9ef8d342

Mario Limonciello authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why]
GPU reset will attempt to restore cached state, but brightness doesn't
get restored. It will come back at 100% brightness, but userspace thinks
it's the previous value.

[How]
When running resume sequence if GPU is in reset restore brightness
to previous value.

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

9ef8d342

drm/amd/display: fix default brightness · 5032eea8

Mario Limonciello authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why]
To avoid flickering during boot default brightness level set by BIOS
should be maintained for as much of the boot as feasible.
commit 2fe87f54 ("drm/amd/display: Set default brightness according
to ACPI") attempted to set the right levels for AC vs DC, but brightness
still got reset to maximum level in initialization code for
setup_backlight_device().

[How]
Remove the hardcoded initialization in setup_backlight_device() and
instead program brightness value to match BIOS (AC or DC).  This avoids a
brightness flicker from kernel changing the value.  Userspace may however
still change it during boot.

Fixes: 2fe87f54 ("drm/amd/display: Set default brightness according to ACPI")

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

5032eea8

drm/amd/display: Add more debug data to dmub_srv · 2d312202

Joshua Aberback authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why]
When analyzing some crash dumps, not all of the expected DMUB info was
available, so we want to add in-object storage for this data.

[How]
 - dmub_srv_debug (renamed to dmub_timeout_info) is already a member of
dmub_diagnostic_data, therefore keep a dmub_diagnostic_data directly in
dmub_srv
 - use dmub_srv->debug when collecting diagnostic info instead of stack
object to allow for easy inspection in crash dumps

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

2d312202

drm/amd/display: Disable unneeded hpd interrupts during dm_init · c279ebb6

Leo Li authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why]

It seems HPD interrupts are enabled by default for all connectors, even
if the hpd source isn't valid. An eDP for example, does not have a valid
hpd source (but does have a valid hpdrx source; see construct_phy()).
Thus, eDPs should have their hpd interrupt disabled.

In the past, this wasn't really an issue. Although the driver gets
interrupted, then acks by writing to hw registers, there weren't any
subscribed handlers that did anything meaningful (see
register_hpd_handlers()).

But things changed with the introduction of IPS. s2idle requires that
the driver allows IPS for DMUB fw to put hw to sleep. Since register
access requires hw to be awake, the driver will block IPS entry to do
so. And no IPS means no hw sleep during s2idle.

This was the observation on DCN35 systems with an eDP. During suspend,
the eDP toggled its hpd pin as part of the panel power down sequence.
The driver was then interrupted, and acked by writing to registers,
blocking IPS entry.

[How]

Since DC marks eDP connections as having invalid hpd sources (see
construct_phy()), DM should disable them at the hw level. Do so in
amdgpu_dm_hpd_init() by disabling all hpd ints first, then selectively
enabling ones for connectors that have valid hpd sources.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

c279ebb6

drm/amd/display: Fix incorrect DPCD configs while Replay/PSR switch · 39687c08

Leon Huang authored 1 month ago and

Tom Chung committed 3 weeks ago


[Why]
When switching between PSR/Replay,
the DPCD config of previous mode is not cleared,
resulting in unexpected behavior in TCON.

[How]
Initialize the DPCD in setup function

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>

39687c08

Mar 07, 2025

drm/amdkfd: Add pm_config_dequeue_wait_counts API · 98a5af81

Harish Kasiviswanathan authored 1 month ago


Update pm_update_grace_period() to more cleaner
pm_config_dequeue_wait_counts(). Previously, grace_period variable was
overloaded as a variable and a macro, making it inflexible to configure
additional dequeue wait times.

pm_config_dequeue_wait_counts() now takes in a cmd / variable. This
allows flexibility to update different dequeue wait times.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: : Jonathan Kim <jonathan.kim@amd.com>

98a5af81

drm/amdgpu/vcn: fix idle work handler for VCN 2.5 · ca602d1c

Alex Deucher authored 3 weeks ago


VCN 2.5 uses the PG callback to enable VCN DPM which is
a global state.  As such, we need to make sure all instances
are in the same state.

v2: switch to a ref count (Lijo)
v3: switch to its own idle work handler
v4: fix logic in DPG handling

Fixes: 4ce4fe27 ("drm/amdgpu/vcn: use per instance callbacks for idle work handler")
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ca602d1c

drm/amd/display: allow 256B DCC max compressed block sizes on gfx12 · 4b52c3e6

Marek Olšák authored 3 weeks ago


The hw supports it.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

4b52c3e6

drm/amd: Keep display off while going into S4 · f54ff4ad

Mario Limonciello authored 3 weeks ago


When userspace invokes S4 the flow is:

1) amdgpu_pmops_prepare()
2) amdgpu_pmops_freeze()
3) Create hibernation image
4) amdgpu_pmops_thaw()
5) Write out image to disk
6) Turn off system

Then on resume amdgpu_pmops_restore() is called.

This flow has a problem that because amdgpu_pmops_thaw() is called
it will call amdgpu_device_resume() which will resume all of the GPU.

This includes turning the display hardware back on and discovering
connectors again.

This is an unexpected experience for the display to turn back on.
Adjust the flow so that during the S4 sequence display hardware is
not turned back on.

Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Closes: drm/amd#2038


Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com


Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>

f54ff4ad

drm/amd/amdgpu: Add missing GC 11.5.0 register · 1421651d

Tom Denis authored 3 weeks ago


Adds register needed for debugging purposes.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

1421651d

drm/amd/amdgpu: Add missing GC 11.5.0 register · 9c800a39

Tom Denis authored 3 weeks ago


Adds register needed for debugging purposes.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

9c800a39

drm/amdkfd: Add support for more per-process flag · ebd70148

Harish Kasiviswanathan authored 2 months ago


Add support for more per-process flags starting with option to configure
MFMA precision for gfx 9.5

v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
    Remove unused else condition
v3: Bump the KFD API version
v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>

ebd70148

drm/amdkfd: Set per-process flags only once for gfx9/10/11/12 · 1a9dbc31

Harish Kasiviswanathan authored 2 months ago


Define set_cache_memory_policy() for these asics and move all static
changes from update_qpd() which is called each time a queue is created
to set_cache_memory_policy() which is called once during process
initialization

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>

1a9dbc31

drm/amdkfd: clear F8_MODE for gfx950 · e02e8f39

Alex Sierra authored 10 months ago


Default F8_MODE should be OCP format on gfx950.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>

e02e8f39

Mar 06, 2025

drm/amdkfd: Set per-process flags only once cik/vi · 2c07548f

Harish Kasiviswanathan authored 2 months ago


Set per-process static sh_mem config only once during process
initialization. Move all static changes from update_qpd() which is
called each time a queue is created to set_cache_memory_policy() which
is called once during process initialization.

set_cache_memory_policy() is currently defined only for cik and vi
family. So this commit only focuses on these two. A separate commit will
address other asics.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>

2c07548f

drm/amdgpu: add defines for pin_offsets in DCE8 · 3b1e3d67

Alexandre Demers authored 3 weeks ago


Define pin_offsets values in the same way it is done in DCE8

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3b1e3d67

drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust function · f5947985

SRINIVASAN SHANMUGAM authored 3 weeks ago

Updated description for the 'other_mode' parameter. This parameter is
used to determine the display mode of another display controller that
may be sharing the line buffer.

Cc: Ken Wang <Qingqing.Wang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

f5947985

drm/amdgpu: handle amdgpu_cgs_create_device() errors in amd_powerplay_create() · e72a794e

Wentao Liang authored 3 weeks ago


Add error handling to propagate amdgpu_cgs_create_device() failures
to the caller. When amdgpu_cgs_create_device() fails, release hwmgr
and return -ENOMEM to prevent null pointer dereference.

[v1]->[v2]: Change error code from -EINVAL to -ENOMEM. Free hwmgr.

Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e72a794e

drm/amd/display: fix missing .is_two_pixels_per_container · 55706724

Aliaksei Urbanski authored 3 weeks ago


Starting from 6.11, AMDGPU driver, while being loaded with amdgpu.dc=1,
due to lack of .is_two_pixels_per_container function in dce60_tg_funcs,
causes a NULL pointer dereference on PCs with old GPUs, such as R9 280X.

So this fix adds missing .is_two_pixels_per_container to dce60_tg_funcs.

Reported-by: Rosen Penev <rosenp@gmail.com>
Closes: drm/amd#3942


Fixes: e6a901a0 ("drm/amd/display: use even ODM slice width for two pixels per container")
Cc: <stable@vger.kernel.org> # 6.11+
Signed-off-by: Aliaksei Urbanski <aliaksei.urbanski@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

55706724

drm/amdgpu/display: Allow DCC for video formats on GFX12 · 9df41489

David Rosca authored 1 month ago


We advertise DCC as supported for NV12/P010 formats on GFX12,
but it would fail on this check on atomic commit.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>

9df41489

drm/amdgpu: Use unique CPER record id across devices · 63b329ea

Xiang Liu authored 3 weeks ago


Encode socket id to CPER record id to be unique across devices.

v2: add pointer check for adev->smuio.funcs->get_socket_id
v2: set 0 if adev->smuio.funcs->get_socket_id is NULL

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>

63b329ea

drm/amdgpu: fix the gb_addr_config_fields init value mismatch · 64cda756

Shiwu Zhang authored 3 weeks ago


For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten
in gfx hw_init it is read out to popluate the gb_addr_config_fields
in the sw_init stage, which causes mismatch.

Fix it by using the golden value in sw_init as well.

v2: This is a driver-set golden reg and keep as it is (Lijo)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

64cda756

drm/amdgpu: retire ip init code specific for A0 rev · 582ff770

Shiwu Zhang authored 3 weeks ago


For aqua_vanjaram, A0 HW is retired so remove the code
specific for it in gfx ip init.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

582ff770

drm/amdgpu: increase RAS bad page threshold · 86a0ad69

Tao Zhou authored 3 weeks ago


For default policy, driver will issue an RMA event when the number of
bad pages is greater than 8 physical rows, rather than reaches 8
physical rows, don't rely on threshold configurable parameters in
default mode.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>

86a0ad69

drm/amdgpu: Fix missing drain retry fault the last entry · 4de8d6dc

Emily Deng authored 3 weeks ago


While the entry get in svm_range_unmap_from_cpu is the last entry, and
the entry is page fault, it also need to be dropped. So for equal case,
it also need to be dropped.

v2:
Only modify the svm_range_restore_pages.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Xiaogang <Chen&lt;xiaogang.chen@amd.com>

4de8d6dc

Mar 05, 2025

drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV · e326a12b

Victor Lu authored 1 month ago


Aldebaran SRIOV VF cannot access the power brake feature regs.
The accesses can be skipped to avoid a dmesg warning.

v2: Remove redundant asic type check

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

e326a12b

drm/amdkfd: remove unused debug gws support status variable · effe811c

jokim-amd authored 1 month ago


Remove unused declaration of gws_debug_workaround.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Amber Lin <amber.lin@amd.com>

effe811c

drm/amdgpu: fix inconsistent indenting warning · 5da39174

Charles Han authored 3 weeks ago


Fix below inconsistent indenting smatch warning.
smatch warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: inconsistent indenting

Signed-off-by: Charles Han <hanchunchao@inspur.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5da39174

drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOV · 59d8fcbe

Victor Lu authored 1 month ago


Aldebaran SRIOV VF does not have write permissions to GRBM_CTNL.
This access can be skipped to avoid a dmesg warning.

v2: Use GC IP version check instead of asic check

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

59d8fcbe

drm/amdgpu: Fix core reset sequence for JPEG5_0_1 · c2ac1703

Sathishkumar S authored 1 month ago


For cores 1 through 9 repair the core reset sequence by
adjusting offsets to access the expected registers.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

c2ac1703

Mar 04, 2025

drm/amdgpu: validate return value of pm_runtime_get_sync · 4baa0dca

Sunil khatri authored 3 weeks ago


An invalid return value 'r' of the pm_runtime_get_sync
is r < 0, so fix the return value check and add proper
failure log and exit cleanly.

Successful refcount in userqueue creation to make sure
device remains in active state.

Fixes: 33d65834 ("drm/amdgpu/userq: handle runtime pm")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

4baa0dca

drm/amdkfd: flag per-sdma queue reset supported to user space · 9c77ffcb

jokim-amd authored 1 month ago


Similar to compute queue reset, flag SDMA queue reset capabilities to
user space for safe testing.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>

9c77ffcb

drm/amdkfd: implement per queue sdma reset for gfx 9.4+ · 8bef2cba

jokim-amd authored 2 months ago

To reset hung SDMA queues on GFX 9.4+ for the GFX9 family, a soft reset
must be issued through SMU. Since soft resets will reset an entire SDMA
engine, use a common KGD call to do the reset as the KGD will handle
avoiding a reset of in flight GFX and paging queues on that engine.

In addition, create a common call for all reset types to simplify
the handling of module parameter settings that block gpu resets.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>

8bef2cba

drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c · 2118ad17

Victor Lu authored 1 month ago


SRIOV VF does not have write access to AGP BAR regs.
Skip the writes to avoid a dmesg warning.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

2118ad17

drm/amdgpu: Fix core reset sequence for JPEG4_0_3 · ca115909

Sathishkumar S authored 1 month ago


For cores 1 through 7 repair the core reset sequence by
adjusting offsets to access the expected registers.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

ca115909

drm/amdgpu: Add support for CPERs on virtualization · 3e675ce9

Tony Yi authored 1 month ago


Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.

For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.

Signed-off-by: Tony Yi <Tony.Yi@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>

3e675ce9

Admin message

Admin message