    drm/amdgpu: Prefer RAS recovery for scheduler hang · 186d1f4d
    lijo lazar authored
    
    Before scheduling a recovery due to scheduler/job hang, check if a RAS
    error is detected. If so, choose RAS recovery to handle the situation. A
    scheduler/job hang could be the side effect of a RAS error. In such
    cases, it is required to go through the RAS error recovery process. A
    RAS error recovery process in certain cases could also avoid a full
    device reset.
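
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `recovery_path`, `ras_ctx`, `error_detected`, and `choose_recovery` are hypothetical names invented for the example, and the real RAS context carries far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recovery paths; not the driver's real identifiers. */
enum recovery_path {
    RECOVERY_SCHED_RESET, /* default scheduler/job hang recovery */
    RECOVERY_RAS,         /* RAS error recovery */
};

/* Hypothetical stand-in for the RAS context. */
struct ras_ctx {
    bool error_detected; /* set once a RAS error has been recorded */
};

/*
 * Called on a scheduler/job hang, before any recovery is scheduled.
 * A recorded RAS error takes priority, since the hang may only be a
 * side effect of that error.
 */
static enum recovery_path choose_recovery(const struct ras_ctx *ras)
{
    assert(ras != NULL);
    if (ras->error_detected)
        return RECOVERY_RAS;
    return RECOVERY_SCHED_RESET;
}
```

The check sits ahead of the hang recovery so that the RAS path, which may get by without a full device reset, is tried first.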
    
    An error state is maintained in the RAS context to detect the affected
    block. The fatal error state uses an otherwise unused block id. Set the
    block id when an error is detected. If the interrupt handler has
    already detected a poison error, there is no need to look for a fatal
    error, so skip fatal error checking in such cases.
    
    Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>