[Regression] amdgpu: update from 5.10.145 to 5.10.146...149 breaks boot on Ryzen-based computers
The following is an extended repost of https://bugzilla.kernel.org/show_bug.cgi?id=216608
Brief summary of the problem:
The system hangs at a random point during boot: sometimes with a black screen, sometimes while showing the kernel boot log in text or graphical mode. The hang is 100% reproducible, on every boot.
Hardware description:
Affected platforms:
- AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
- AMD Ryzen 5 4600H with Radeon Graphics
- AMD Ryzen 5 2400G with Radeon Vega Graphics
- AMD Ryzen 3 5300U with Radeon Graphics
- AMD Ryzen 5 5600U with Radeon Graphics
- AMD Ryzen 3 2200G with Radeon Vega Graphics
Not affected:
- AMD A8-7410 APU with AMD Radeon R5 Graphics
- AMD A12-9720P RADEON R7, 12 COMPUTE CORES 4C+8G
System used for logs capturing:
# lscpu |grep Model
Model: 17
Model name: AMD Ryzen 5 2400G with Radeon Vega Graphics
# lspci -nn |grep VGA
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] [1002:15dd] (rev c6)
System information:
First observed on OS ALT Workstation K 10 (a Linux distribution) with kernel 5.10.146-alt1, then confirmed on Debian bullseye with kernel 5.10.0-19-amd64 (5.10.149-1).
How to reproduce the issue:
On Debian the issue reproduces only once the non-free firmware-amd-graphics package is installed; without the AMD firmware the system boots normally.
For OS ALT the issue is reproducible on 5.10.146, .147, .148, .149
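To check quickly whether a given kernel version string falls in the range named above, something like the following could be used. This is only a sketch: is_affected_version is a hypothetical helper name, and it does nothing more than compare the patch level against 146..149 from this report.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if a version string is in the range this
# report names as affected, 5.10.146 through 5.10.149.
# is_affected_version is a hypothetical helper, not part of any distro tool.
is_affected_version() {
    ver=${1%%-*}                   # drop a suffix such as "-alt1" or "-1"
    case "$ver" in
        5.10.*) patch=${ver#5.10.} ;;
        *) return 1 ;;             # not a 5.10.y kernel
    esac
    case "$patch" in
        ''|*[!0-9]*) return 1 ;;   # patch level is not a plain number
    esac
    [ "$patch" -ge 146 ] && [ "$patch" -le 149 ]
}
```

For example, `is_affected_version 5.10.146-alt1 && echo affected`. Note that on Debian `uname -r` reports the ABI name (5.10.0-19-amd64), not the upstream version, so the package version (5.10.149-1) would have to be passed in instead.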
Fix proposal:
Reverting the following patch series fixes the issue:
fda04a0bab7f drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV
7b0db849ea03 drm/amdgpu: make sure to init common IP before gmc
9d18013dac86 drm/amdgpu: Separate vf2pf work item init from virt data exchange
87a4e51fb8d6 drm/amdgpu: indirect register access for nv12 sriov
9f55f36f749a drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
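For anyone who wants to try the revert locally, a minimal sketch follows. It assumes a kernel git tree in which the commit IDs above resolve (the report does not say which tree they were taken from) and passes them to git in the order listed here; that order may need adjusting if a revert does not apply cleanly. revert_series and DRY_RUN are hypothetical names used only for this example.

```shell
#!/bin/sh
# Commit IDs copied from the list above, in the order given there.
COMMITS="fda04a0bab7f 7b0db849ea03 9d18013dac86 87a4e51fb8d6 9f55f36f749a"

# revert_series is a hypothetical helper. $1 is the path to a kernel git
# tree in which the IDs above resolve (an assumption, see the text).
# Set DRY_RUN=1 to print the git command instead of running it.
revert_series() {
    tree=$1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo git -C "$tree" revert --no-edit $COMMITS
    else
        # $COMMITS is left unquoted on purpose so it splits into one
        # argument per commit ID.
        git -C "$tree" revert --no-edit $COMMITS
    fi
}
```

Running it with DRY_RUN=1 first shows the exact command before anything in the tree is changed.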