- Mar 18, 2025
-
-
Tao Zhou authored
Clear old data and save it in V3 format. v2: only format eeprom data for new ASICs. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 14, 2025
-
-
ganglxie authored
for old asics that do not support mca translating, we just save PA for them Signed-off-by:
ganglxie <ganglxie@amd.com> Reviewed-by:
Tao Zhou <tao.zhou1@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 07, 2025
-
-
Tao Zhou authored
For default policy, driver will issue an RMA event when the number of bad pages is greater than 8 physical rows, rather than reaches 8 physical rows, don't rely on threshold configurable parameters in default mode. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 25, 2025
-
-
ganglxie authored
save only one record to save eeprom space,and bad_page_num = pa_rec_num + mca_rec_num*16 Signed-off-by:
ganglxie <ganglxie@amd.com> Reviewed-by:
Tao Zhou <tao.zhou1@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
ganglxie authored
nps info saved together with bad page makes bad page parsing more efficient Signed-off-by:
ganglxie <ganglxie@amd.com> Reviewed-by:
Tao Zhou <tao.zhou1@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 13, 2025
-
-
lijo lazar authored
Certain SOCs may not need much data from VBIOS. Some data like VBIOS version used will be missed but it doesn't affect functionality. Add a flag to make VBIOS image optional. Signed-off-by:
Lijo Lazar <lijo.lazar@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
The driver's behavior varies based on the configuration of amdgpu_bad_page_threshold setting Signed-off-by:
Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by:
Tao Zhou <tao.zhou1@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Dec 18, 2024
-
-
Dheeraj Reddy Jonnalagadda authored
Remove the logically dead code in the last return statement of amdgpu_ras_eeprom_init. The condition res < 0 is redundant since res is already checked for a negative value earlier. Replace return res < 0 ? res : 0; with return 0 to improve clarity. Fixes: 63d4c081 ("drm/amdgpu: Optimize EEPROM RAS table I/O") Closes: https://scan7.scan.coverity.com/#/project-view/52337/11354?selectedIssue=1602413 Signed-off-by:
Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Dec 10, 2024
-
-
Tao Zhou authored
After the introduction of NPS RAS, one bad page record on eeprom may be related to 1 or 16 bad pages, so the bad page record and bad page are two different concepts, define a new variable to store bad page number. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Init function is for ras table header read and check function is responsible for the validation of the header. Call them in different stages. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Jinzhou Su authored
Return eeprom table checksum error result, otherwise it might be overwritten by next call. V2: replace DRM_ERROR with dev_err Signed-off-by:
Jinzhou Su <jinzhou.su@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
v1 (legacy way): store channel index within a UMC instance in eeprom v2: store global channel index in eeprom V2: only save the flag on eeprom, clear it after saving. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Sep 18, 2024
-
-
Andrew Kreimer authored
Fix a typo in comments. Reported-by:
Matthew Wilcox <willy@infradead.org> Signed-off-by:
Andrew Kreimer <algonell@gmail.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jul 24, 2024
-
-
stanley yang authored
The eeprom table is empty before initializing, set eeprom table version first before initializing. Changed from V1: Reuse amdgpu_ras_set_eeprom_table_version function Signed-off-by:
Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 015b8a2f)
-
- Jul 23, 2024
-
-
stanley yang authored
The eeprom table is empty before initializing, set eeprom table version first before initializing. Changed from V1: Reuse amdgpu_ras_set_eeprom_table_version function Signed-off-by:
Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jun 05, 2024
-
-
Tao Zhou authored
Set the flag to true if bad page number reaches threshold. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- May 02, 2024
-
-
Hawking Zhang authored
Add smu v13_0_14 ip block support Signed-off-by:
Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by:
Le Ma <Le.Ma@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Mar 20, 2024
-
-
Candice Li authored
Use helper function instead of umc callback to set EEPROM table version. Signed-off-by:
Candice Li <candice.li@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 12, 2024
-
-
Yang Wang authored
send smu rma reason event to smu in ras eeprom driver. Signed-off-by:
Yang Wang <kevinyang.wang@amd.com> Reviewed-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Nov 29, 2023
-
-
Candice Li authored
Check smu v13_0_0 SKU type to select EEPROM I2C address. Signed-off-by:
Candice Li <candice.li@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
-
Candice Li authored
Check smu v13_0_0 SKU type to select EEPROM I2C address. Signed-off-by:
Candice Li <candice.li@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Oct 05, 2023
-
-
SRINIVASAN SHANMUGAM authored
Fixes the below: ERROR: Macros with complex values should be enclosed in parentheses WARNING: macros should not use a trailing semicolon +#define amdgpu_inc_vram_lost(adev) atomic_inc(&((adev)->vram_lost_counter)); Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by:
Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Sep 26, 2023
-
-
Tao Zhou authored
The amdgpu_ras_eeprom_control.bad_channel_bitmap is u32 type, but the channel index could be larger than 32. For the ASICs whose channel number is more than 32, the amdgpu_dpm_send_hbm_bad_channel_flag interface is not supported, so we simply bypass channel bitmap update under this condition. v2: replace sizeof with BITS_PER_TYPE, we should check bit number instead of byte number. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Sep 20, 2023
-
-
lijo lazar authored
Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by:
Lijo Lazar <lijo.lazar@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Aug 31, 2023
-
-
Candice Li authored
RAS EEPROM device is only supported on dGPU platform for smu v13_0_6. Signed-off-by:
Candice Li <candice.li@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Aug 30, 2023
-
-
Candice Li authored
RAS EEPROM device is only supported on dGPU platform for smu v13_0_6. Signed-off-by:
Candice Li <candice.li@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Aug 15, 2023
-
-
Candice Li authored
Support I2C EEPROM on smu v13_0_6. v2: Move IP_VERSION(13, 0, 6) ahead of IP_VERSION(13, 0, 10). Signed-off-by:
Candice Li <candice.li@amd.com> Reviewed-by:
Yang Wang <kevinyang.wang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jul 21, 2023
-
-
Mario Limonciello authored
The VBIOS part number is read both in amdgpu_atom_parse() as well as in atom_get_vbios_pn() and stored twice in the `struct atom_context` structure. Remove the first unnecessary read and move the `pr_info` line from that read into the second. v2: squash in unused variable removal Signed-off-by:
Mario Limonciello <mario.limonciello@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jun 15, 2023
-
-
SRINIVASAN SHANMUGAM authored
Fixes the following gcc with W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * EEPROM Table structure v1 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:98: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * EEPROM Table structrue v2.1 Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Jun 09, 2023
-
-
stanley yang authored
Set EEPROM ras info: rma status, health percent and bad page threshold. Signed-off-by:
Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
stanley yang authored
It's more reasonable to check EEPROM table ras info bytes. Signed-off-by:
Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
stanley yang authored
Add ras info to EEPROM table, app can analyse device ECC status without GPU driver through EEPROM table ras info. Signed-off-by:
Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
stanley yang authored
Add RAS EEPROM table version 2.1 macro definition. Signed-off-by:
Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
stanley yang authored
Rename RAS_TABLE_VER to RAS_TABLE_VER_V1, move RAS_TABLE_VER_V1 from amdgpu_ras_eeprom.c to amdgpu_ras_eeprom.h. Signed-off-by:
Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Apr 13, 2023
-
-
Alex Deucher authored
All chips that support RAS also support IP discovery, so use the IP versions rather than a mix of IP versions and asic types. Checking the validity of the atom_ctx pointer is not required as the vbios is already fetched at this point. v2: add comments to id asic types based on feedback from Luben Reviewed-by:
Luben Tuikov <luben.tuikov@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com>
-
- Mar 31, 2023
-
-
Luben Tuikov authored
As soon as control->i2c_address is set, return; remove the "break;" from the switch--it is unnecessary. This mimics what happens when for some cases in the switch, we call helper functions with "return <helper function>". Remove final function "return true;" to indicate that the switch is final and terminal, and that there should be no code after the switch. Cc: Candice Li <candice.li@amd.com> Cc: Kent Russell <kent.russell@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Reviewed-by:
Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Remove second switch since it already has its own function and case in the first switch. This also avoids requalifying the EEPROM I2C address for VEGA20, SIENNA CICHLID, and ALDEBARAN, as those have been set by the first switch and shouldn't match SMU v13.0.x. Cc: Candice Li <candice.li@amd.com> Cc: Kent Russell <kent.russell@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Fixes: 15822529 ("drm/amdgpu: Add EEPROM I2C address for smu v13_0_0") Fixes: c9bdc6c3 ("drm/amdgpu: Add EEPROM I2C address support for ip discovery") Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Reviewed-by:
Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Feb 23, 2023
-
-
Tao Zhou authored
bad_page_threshold controls page retirement behavior and it should be also checked. v2: simplify the condition of bad page handling path. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Ignore ras umc bad page threshold by default, GPU initialization won't be stopped in this mode. v2: refine the description of bad_page_threshold. Signed-off-by:
Tao Zhou <tao.zhou1@amd.com> Reviewed-by:
Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- Nov 17, 2022
-
-
Luben Tuikov authored
Add support for RAS table at I2C EEPROM address of 0x40000, since on some ASICs it is not at 0, but at 0x40000. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Tested-by:
Kent Russell <kent.russell@amd.com> Reviewed-by:
Kent Russell <kent.russell@amd.com> Reviewed-by:
Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-