Commit 2211660c authored 6 months ago by lijo lazar Committed by Linux Infrastructure 5 months ago

drm/amdgpu: Prefer RAS recovery for scheduler hang


Before scheduling a recovery due to a scheduler/job hang, check whether a
RAS error has been detected. If so, choose RAS recovery to handle the
situation. A scheduler/job hang could be a side effect of a RAS error. In
such cases, it is required to go through the RAS error recovery process.
In certain cases, a RAS error recovery process could also avoid a full
device reset.
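
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `struct ras_ctx`, `enum recovery_path`, and `choose_recovery()` are hypothetical names invented here, and the real RAS context in amdgpu carries far more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recovery paths; these are not the driver's real enums. */
enum recovery_path {
    RECOVERY_RAS,         /* go through the RAS error recovery process */
    RECOVERY_SCHED_RESET  /* ordinary scheduler/job hang recovery */
};

/* Hypothetical, heavily simplified stand-in for the RAS context. */
struct ras_ctx {
    bool error_detected;  /* set once a RAS error has been recorded */
};

/*
 * Called when a scheduler/job hang is about to trigger recovery.
 * If a RAS error is on record, the hang may be a side effect of it,
 * so RAS recovery is preferred over the ordinary hang recovery.
 */
static enum recovery_path choose_recovery(const struct ras_ctx *ras)
{
    if (ras != NULL && ras->error_detected)
        return RECOVERY_RAS;
    return RECOVERY_SCHED_RESET;
}
```

With no RAS context, or no recorded error, the sketch falls back to the ordinary hang recovery, so a hang without a RAS error is handled as before.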

An error state is maintained in the RAS context to detect the affected
block. The fatal error state uses an unused block id. Set the block id
when an error is detected. If the interrupt handler detected a poison
error, it's not required to look for a fatal error. Skip fatal error
checking in such cases.
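
One way to picture the error state is the sketch below. It assumes the state is a bitmask indexed by block id, which the commit message does not confirm. All names (`enum ras_block`, `RAS_BLOCK_FATAL`, `struct ras_err_state`, `handle_ras_interrupt()`) are hypothetical, and the `fatal_checks` counter exists only to make the skipped check observable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block ids; the real driver defines its own list. */
enum ras_block {
    RAS_BLOCK_UMC = 0,
    RAS_BLOCK_GFX,
    RAS_BLOCK_SDMA,
    RAS_BLOCK_COUNT       /* number of real blocks */
};

/* The fatal error state reuses an id that no real block occupies. */
#define RAS_BLOCK_FATAL RAS_BLOCK_COUNT

/* Assumed representation: one bit per block id. */
struct ras_err_state {
    uint32_t mask;
    unsigned int fatal_checks;  /* illustration only: fatal queries made */
};

static void ras_set_err(struct ras_err_state *s, unsigned int block)
{
    s->mask |= (uint32_t)1 << block;
}

static bool ras_has_err(const struct ras_err_state *s, unsigned int block)
{
    return (s->mask & ((uint32_t)1 << block)) != 0;
}

/*
 * Interrupt handler sketch: record the affected block, then look for a
 * fatal error only when the interrupt was not caused by a poison error.
 * hw_fatal stands in for whatever the hardware query would report.
 */
static void handle_ras_interrupt(struct ras_err_state *s, unsigned int block,
                                 bool poison, bool hw_fatal)
{
    ras_set_err(s, block);
    if (poison)
        return;                 /* poison detected: skip the fatal check */
    s->fatal_checks++;
    if (hw_fatal)
        ras_set_err(s, RAS_BLOCK_FATAL);
}
```

A hang handler could then test `s->mask != 0` to decide whether any RAS error is pending, and `ras_has_err()` to find which block was affected.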

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>

parent 11fb5ae3


Showing with 78 additions and 7 deletions

lijo lazar @lijo mentioned in commit 772319e195ff69c8f30a721d55f05c43f88270d1 · 4 months ago
lijo lazar @lijo mentioned in commit 90185f3a561b59a3f15d62d7c662db5e3a0e64db · 4 months ago
lijo lazar @lijo mentioned in commit 44361425d7c5503c9fc3f07a1f8a5f0d4591a2d7 · 4 months ago
