1. 16 Sep, 2021 3 commits
    • drm/amdgpu: Fix crash on device remove/driver unload · cdccf1ff
      Andrey Grodzovsky authored
      Crash:
      BUG: unable to handle page fault for address: 00000000000010e1
      RIP: 0010:vega10_power_gate_vce+0x26/0x50 [amdgpu]
      Call Trace:
      pp_set_powergating_by_smu+0x16a/0x2b0 [amdgpu]
      amdgpu_dpm_set_powergating_by_smu+0x92/0xf0 [amdgpu]
      amdgpu_dpm_enable_vce+0x2e/0xc0 [amdgpu]
      vce_v4_0_hw_fini+0x95/0xa0 [amdgpu]
      amdgpu_device_fini_hw+0x232/0x30d [amdgpu]
      amdgpu_driver_unload_kms+0x5c/0x80 [amdgpu]
      amdgpu_pci_remove+0x27/0x40 [amdgpu]
      pci_device_remove+0x3e/0xb0
      device_release_driver_internal+0x103/0x1d0
      device_release_driver+0x12/0x20
      pci_stop_bus_device+0x79/0xa0
      pci_stop_and_remove_bus_device_locked+0x1b/0x30
      remove_store+0x7b/0x90
      dev_attr_store+0x17/0x30
      sysfs_kf_write+0x4b/0x60
      kernfs_fop_write_iter+0x151/0x1e0
      
      Why:
      VCE/UVD have a dependency on the SMC block for their suspend, but the
      SMC block is the first to do HW fini due to some constraints.
      
      How:
      Since the original patch was dealing with suspend issues, move the SMC
      block dependency back into the suspend hooks, as was done in V1 of the
      original patches.
      Keep flushing the idle work in both the suspend and HW fini sequences,
      since it's essential in both cases.
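      A minimal sketch of the resulting split, assuming simplified hook
      shapes (the real vce_v4_0 hooks take a void *handle and do more
      cleanup) with amdgpu_dpm_enable_vce() standing in for the SMC
      power-gating dependency:
      
      static int vce_v4_0_hw_fini_sketch(struct amdgpu_device *adev)
      {
              /* Flushing the idle work is needed in both the suspend
               * and the HW fini paths. */
              cancel_delayed_work_sync(&adev->vce.idle_work);
      
              /* No SMC/DPM call here: on device remove the SMC block has
               * already done its HW fini, so touching it would crash. */
              return 0;
      }
      
      static int vce_v4_0_suspend_sketch(struct amdgpu_device *adev)
      {
              int r = vce_v4_0_hw_fini_sketch(adev);
      
              if (r)
                      return r;
      
              /* The SMC block is still alive during suspend, so the
               * power-gating dependency is safe to keep here. */
              amdgpu_dpm_enable_vce(adev, false);
              return 0;
      }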
      
      Fixes: 2178d3c1 ("drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend")
      Fixes: ee6679aa ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
    • drm/amdgpu: Fix uvd ib test timeout when use pre-allocated BO · 4567162f
      xinhui pan authored
      
      
      Now we use the same BO for the create and destroy messages, so destroy
      waits for the fence returned from create to be signaled. The default
      timeout in the destroy path is 10ms, which is too short.
      
      Let's wait on both fences with the specified timeout.
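      A hedged sketch of the waiting scheme: dma_fence_wait_timeout() is the
      standard kernel API, while the amdgpu_uvd_get_create_msg()/
      amdgpu_uvd_get_destroy_msg() signatures are assumed and may not match
      this exact tree.
      
      static int uvd_ib_test_wait_sketch(struct amdgpu_ring *ring, long timeout)
      {
              struct dma_fence *f_create = NULL, *f_destroy = NULL;
              long r;
      
              /* Assumed helpers: both hand back the fence of their msg. */
              r = amdgpu_uvd_get_create_msg(ring, 1, &f_create);
              if (r)
                      goto out;
              r = amdgpu_uvd_get_destroy_msg(ring, 1, true, &f_destroy);
              if (r)
                      goto out;
      
              /* Wait for the create msg, then the destroy msg, both with
               * the generous ring-test timeout instead of the internal
               * 10ms wait of the destroy path. */
              r = dma_fence_wait_timeout(f_create, false, timeout);
              if (r > 0)
                      r = dma_fence_wait_timeout(f_destroy, false, timeout);
              if (r == 0)
                      r = -ETIMEDOUT;
              else if (r > 0)
                      r = 0;
      out:
              dma_fence_put(f_create);
              dma_fence_put(f_destroy);
              return (int)r;
      }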
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
    • drm/amdgpu: Put drm_dev_enter/exit outside hot codepath · 7ba5400f
      xinhui pan authored
      
      
      We hit a soft hang while running a memory pressure test on a NUMA system.
      After a quick look, this is because kfd invalidates/validates userptr
      memory frequently with the process_info lock held.
      It looks like updating the page table mappings uses too much CPU time.
      
      perf top shows the following:
      75.81%  [kernel]       [k] __srcu_read_unlock
       6.19%  [amdgpu]       [k] amdgpu_gmc_set_pte_pde
       3.56%  [kernel]       [k] __srcu_read_lock
       2.20%  [amdgpu]       [k] amdgpu_vm_cpu_update
       2.20%  [kernel]       [k] __sg_page_iter_dma_next
       2.15%  [drm]          [k] drm_dev_enter
       1.70%  [drm]          [k] drm_prime_sg_to_dma_addr_array
       1.18%  [kernel]       [k] __sg_alloc_table_from_pages
       1.09%  [drm]          [k] drm_dev_exit
      
      So move drm_dev_enter/exit out of the gmc code and let the callers do it
      instead. Those callers are gart_unbind, gart_map, vm_clear_bo,
      vm_update_pdes and gmc_init_pdb0; vm_bo_update_mapping already calls it.
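      A hedged illustration of the pattern in one hypothetical caller (the
      function name and parameters are invented for the sketch;
      drm_dev_enter()/drm_dev_exit() and adev_to_drm() are the real
      interfaces):
      
      #include <drm/drm_drv.h>
      
      static int gart_map_sketch(struct amdgpu_device *adev, uint64_t offset,
                                 int pages, dma_addr_t *dma_addr, uint64_t flags,
                                 void *dst)
      {
              int idx;
      
              /* One drm_dev_enter() for the whole update ... */
              if (!drm_dev_enter(adev_to_drm(adev), &idx))
                      return -ENODEV; /* device already unplugged */
      
              /* ... so the hot loop writing PTEs (e.g. via
               * amdgpu_gmc_set_pte_pde()) no longer takes the SRCU read
               * lock once per entry. */
      
              drm_dev_exit(idx);
              return 0;
      }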
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-and-tested-by: Andrey G...
  2. 15 Sep, 2021 8 commits
  3. 14 Sep, 2021 16 commits
  4. 13 Sep, 2021 8 commits
  5. 10 Sep, 2021 5 commits