Commits · e7a477735f1771b9a9346a5fbd09d7ff0641723a · Alex Deucher / linux

Feb 17, 2025

drm/amdkfd: Fix user queue validation on Gfx7/8 · e7a47773

PhilipY authored 1 month ago


To workaround queue full h/w issue on Gfx7/8, when application create
AQL queue, the ring buffer bo allocate size is queue_size/2 and
map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each
attachment map size is queue_size/2, with same ring_bo backing memory.

For Gfx7/8, user queue buffer validation should use queue_size/2 to
verify ring_bo allocation and mapping size.

Fixes: 68e599db ("drm/amdkfd: Validate user queue buffers")
Suggested-by: Tomáš Trnka <trnka@scm.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e7a47773

drm/amdgpu: Introduce funcs for generating cper record · ad97840f

Hawking Zhang authored 2 months ago


Introduce new functions that are used to generate
cper ue or ce records.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ad97840f

drm/amdgpu: Include ACA error type in aca bank · 56316ee9

Hawking Zhang authored 2 months ago


ACA error types managed by driver a direct 1:1
correspondence with those managed by firmware.

To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.

This addition is useful for creating CPER records.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

56316ee9

drm/amdgpu: Optimize the enablement of GECC · 76b1f8b3

Candice Li authored 1 month ago


Enable GECC only when the default memory ECC mode or
the module parameter amdgpu_ras_enable is activated.

v2: Add kernel message to remind users explicitly set
    amdgpu_ras_enable=1 before driver loading to enable GECC
    and set amdgpu_ras_enable=0 to disable GECC when GECC is
    currently enabled if needed.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

76b1f8b3

drm/amdgpu: Introduce funcs for populating CPER · 92d5d2a0

Hawking Zhang authored 2 months ago


Introduce utility functions designed to assist
in populating CPER records.

v2: call cper_init/fini in device_ip_init/fini.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

92d5d2a0

drm/amd/include: Add amd cper header · 523b69c6

Hawking Zhang authored 2 months ago


AMD is using Common Platform Error Record (CPER) format
to report all gpu hardware errors.

v2: add program attribute

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

523b69c6

drm/amdgpu: Rename VCN clock gating function for consistency · 2012aff9

SRINIVASAN SHANMUGAM authored 1 month ago

Change the function name from vcn_v2_5_enable_clock_gating_inst
to vcn_v2_5_enable_clock_gating to ensure consistency in naming.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:781: warning: expecting prototype for vcn_v2_5_enable_clock_gating_inst(). Prototype was for vcn_v2_5_enable_clock_gating() instead

Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2012aff9

drm/amdgpu/vcn4.0.3: drop dpm power helpers · eda80f1c

Alex Deucher authored 1 month ago


VCN 4.0.3 doesn't support powergating so there is
no need to call these.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

eda80f1c

drm/amdgpu/vcn5.0.1: drop dpm power helpers · 77802398

Alex Deucher authored 1 month ago


VCN 5.0.1 doesn't support powergating so there is
no need to call these.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

77802398

drm/amdgpu/vcn5.0.1: use correct dpm helper · 0487f503

Alex Deucher authored 1 month ago


The VCN and UVD helpers were split in
commit ff69bba0 ("drm/amd/pm: add inst to dpm_set_powergating_by_smu")
However, this happened in parallel to the vcn 5.0.1
development so it was missed there.

Fixes: 346492f3 ("drm/amdgpu: Add VCN_5_0_1 support")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Sonny Jiang <sonjiang@amd.com>
Cc: Boyuan Zhang <boyuan.zhang@amd.com>

0487f503

drm/amdgpu/umsch: tidy up the ucode name string handling · 56763be4

Alex Deucher authored 1 month ago


Make the constant parts of the name part of the string
we pass to amdgpu_ucode_request().  Only the version
number varies from IP to IP.

Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Lang Yu <Lang.Yu@amd.com>

56763be4

drm/amdgpu/umsch: fix ucode check · c917e39c

Alex Deucher authored 1 month ago


Return an error if the IP version doesn't match otherwise
we end up passing a NULL string to amdgpu_ucode_request.
We should never hit this in practice today since we only
enable the umsch code on the supported IP versions, but
add a check to be safe.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502130406.iWQ0eBug-lkp@intel.com/


Fixes: 02062042 ("drm/amd: Use a constant format string for amdgpu_ucode_request")
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Lang Yu <Lang.Yu@amd.com>

c917e39c

drm/amdgpu: Remove extra checks for CPX · 5183e690

Amber Lin authored 1 month ago


As far as the number of XCCs, the number of compute partitions, and the
number of memory partitions qualify, CPX is valid.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5183e690

drm/amdgpu/umsch: declare umsch firmware · fe652bec

Alex Deucher authored 1 month ago


Needed to be properly picked up for the initrd, etc.

Fixes: 3488c79b ("drm/amdgpu: add initial support for UMSCH")
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Lang Yu <Lang.Yu@amd.com>

fe652bec

drm/amdgpu/gfx: only call mes for enforce isolation if supported · 80513e38

Alex Deucher authored 1 month ago


This should not be called on chips without MES so check if
MES is enabled and if the cleaner shader is supported.

Fixes: 8521e3c5 ("drm/amd/amdgpu: limit single process inside MES")
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>

80513e38

drm/amdgpu: Add ring reset callback for JPEG2_0_0 · 500c04d2

Sathishkumar S authored 1 month ago


Add ring reset function callback for JPEG2_0_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

500c04d2

drm/amdgpu: Add ring reset callback for JPEG2_5_0 · 09e24a0b

Sathishkumar S authored 1 month ago


Add ring reset function callback for JPEG2_5_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

09e24a0b

drm/amdgpu: Per-instance init func for JPEG2_5_0 · cb493aee

Sathishkumar S authored 1 month ago


Add helper functions to handle per-instance initialization
and deinitialization in JPEG2_5_0.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cb493aee

drm/amdgpu: Add ring reset callback for JPEG3_0_0 · 03399d0b

Sathishkumar S authored 1 month ago


Add ring reset function callback for JPEG3_0_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

03399d0b

drm/amdgpu: Add ring reset callback for JPEG4_0_0 · 74894ffc

Sathishkumar S authored 1 month ago


Add ring reset function callback for JPEG4_0_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

74894ffc

drm/amdgpu: Per-instance init func for JPEG4_0_3 · abce7b4f

Sathishkumar S authored 2 months ago


Add helper functions to handle per-instance and per-core
initialization and deinitialization in JPEG4_0_3.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

abce7b4f

drm/amdgpu: refine smu send msg debug log format · 8c663123

Yang Wang authored 1 month ago


remove unnecessary line breaks.

[   51.280860] amdgpu 0000:24:00.0: amdgpu: smu send message: GetEnabledSmuFeaturesHigh(13) param: 0x00000000, resp: 0x00000001,                        readval: 0x00003763

Fixes: 0cd2bc06 ("drm/amd/pm: enable amdgpu smu send message log")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8c663123

drm/amdgpu/umsch: remove vpe test from umsch · b0bebbe4

Saleemkhan Jamadar authored 3 months ago


current test is more intrusive for user queue test

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Suggested-by: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b0bebbe4

drm/amdgpu: Enable ACA by default for psp v13_0_12 · 59af05d6

Candice Li authored 1 month ago


Enable ACA by default for psp v13_0_12.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

59af05d6

Feb 13, 2025

drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX12 · 78454387

Alex Deucher authored 1 month ago


This commit introduces enhancements to the handling of the cleaner
shader fence in the AMDGPU MES driver:

- The MES (Microcode Execution Scheduler) now sends a PM4 packet to the
  KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring
  that requests are handled in a controlled manner and avoiding the
  race conditions.
- The CP (Compute Processor) firmware has been updated to use a private
  bus for accessing specific registers, avoiding unnecessary operations
  that could lead to issues in VF (Virtual Function) mode.
- The cleaner shader fence memory address is now set correctly in the
  `mes_set_hw_res_pkt` structure, allowing for proper synchronization of
  the cleaner shader execution.

Cc: Christian König <christian.koenig@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

78454387

drm/amdkfd: Fix pasid value leak · 10e08943

Xiaogang Chen authored 1 month ago

Curret kfd does not allocate pasid values, instead uses pasid value for each
vm from graphic driver. So should not prevent graphic driver from releasing
pasid values since the values are allocated by graphic driver, not kfd driver
anymore. This patch does not stop graphic driver release pasid values.

Fixes: 8544374c ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

10e08943

drm/amd/include : Update MES v12 API for fence update · 15d8c92f

Shaoyun Liu authored 1 month ago


MES fence_value will be updated in fence_addr if API success,
otherwise upper 32 bit will be used to indicate error code.
In any case, MES will trigger an EOP interrupt with 0xb1 as
context id in the interrupt cookie

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

15d8c92f

drm/amdgpu/vcn: enable TMZ support for vcn 4_0_5 · 4b9a3117

Saleemkhan Jamadar authored 1 month ago


TMZ support is enabled for vcn on GC IP 11_5_0

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4b9a3117

drm/amd/pm: Rename pmfw message SetPstatePolicy · 87d8232f

Asad Kamal authored 1 month ago


Rename pmfw message SelectPstatePolicy to SetThrottlingPolicy as per
pmfw interface header for smu_v_13_0_6

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

87d8232f

drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX11 · be2560e4

SRINIVASAN SHANMUGAM authored 1 month ago


This commit introduces enhancements to the handling of the cleaner
shader fence in the AMDGPU MES driver:

- The MES (Microcode Execution Scheduler) now sends a PM4 packet to the
  KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring
  that requests are handled in a controlled manner and avoiding the
  race conditions.
- The CP (Compute Processor) firmware has been updated to use a private
  bus for accessing specific registers, avoiding unnecessary operations
  that could lead to issues in VF (Virtual Function) mode.
- The cleaner shader fence memory address is now set correctly in the
  `mes_set_hw_res_pkt` structure, allowing for proper synchronization of
  the cleaner shader execution.

Cc: lin cao <lin.cao@amd.com>
Cc: Jingwen Chen <Jingwen.Chen2@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed by: Shaoyun.liu  <Shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

be2560e4

drm/amd/amdgpu: add support for IP version 11.5.2 · 16a5a8fe

Ying Li authored 1 month ago


This initializes drm/amd/amdgpu version 11.5.2

Signed-off-by: YING LI <yingli12@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

16a5a8fe

drm/amd/pm: add support for IP version 11.5.2 · ee9e6454

Ying Li authored 1 month ago


This initializes drm/amd/pm version 11.5.2

Signed-off-by: YING LI <yingli12@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ee9e6454

drm/amdgpu: Unlocked unmap only clear page table leaves · 23b64523

PhilipY authored 2 months ago


SVM migration unmap pages from GPU and then update mapping to GPU to
recover page fault. Currently unmap clears the PDE entry for range
length >= huge page and free PTB bo, update mapping to alloc new PT bo.
There is race bug that the freed entry bo maybe still on the pt_free
list, reused when updating mapping and then freed, leave invalid PDE
entry and cause GPU page fault.

By setting the update to clear only one PDE entry or clear PTB, to
avoid unmap to free PTE bo. This fixes the race bug and improve the
unmap and map to GPU performance. Update mapping to huge page will
still free the PTB bo.

With this change, the vm->pt_freed list and work is not needed. Add
WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the
PTB.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

23b64523

drm/amdgpu/mes11: fix set_hw_resources_1 calculation · 1350dd36

Alex Deucher authored 1 month ago

It's GPU page size not CPU page size.  In most cases they
are the same, but not always.  This can lead to overallocation
on systems with larger pages.

Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1350dd36

drm/amdkfd: fix missing L2 cache info in topology · 5ffd5682

Eric Huang authored 1 month ago


In some ASICs L2 cache info may miss in kfd topology,
because the first bitmap may be empty, that means
the first cu may be inactive, so to find the first
active cu will solve the issue.

v2: Only find the first active cu in the first xcc

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5ffd5682

drm/amdgpu/vcn2.5: split code along instances · ebc25499

Alex Deucher authored 4 months ago


Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ebc25499

drm/amd/display: 3.2.320 · 53472eeb

Taimur Hassan authored 1 month ago


Summary:

* Start enabling support for 4-plane MPO
* DML21 Updates
* SPL Updates
* Other minor fixes

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

53472eeb

drm/amdgpu: Set snoop bit for SDMA for MI series · 3394b1f7

Harish Kasiviswanathan authored 1 month ago


SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for
this to happen.

v2: Missed a few mmhub_v9_4. Added now.
v3: Calculate hub offset once since it doesn't change inside the loop
    Modified function names based on review comments.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3394b1f7

drm/amd/display: sspl: cleanup filter code · 53b2e0c2

Samson Tam authored 2 months ago


[Why & How]
Remove unused filters and functions
Add static to limit scope

Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

53b2e0c2

drm/amd/display: Make dcn401_program_pipe non static · 8f87447a

Aurabindo Pillai authored 1 month ago


Allow reuse of code by making dcn401_program_pipe()
non static.

Fixes: 2739bd12 ("drm/amd/display: Allow reuse of of DCN4x code")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8f87447a

Admin message

Admin message