[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux-zen root=UUID=889bfbb5-f502-4691-bb1c-88fc53c2745a rw loglevel=3 quiet amdgpu.runpm=0
[ 0.126540] Kernel command line: BOOT_IMAGE=/vmlinuz-linux-zen root=UUID=889bfbb5-f502-4691-bb1c-88fc53c2745a rw loglevel=3 quiet amdgpu.runpm=0
[ 3.999514] [drm] amdgpu kernel modesetting enabled.
[ 3.999609] amdgpu: Ignoring ACPI CRAT on non-APU system
[ 3.999615] amdgpu: Virtual CRAT table created for CPU
[ 3.999636] amdgpu: Topology: Add CPU node
[ 3.999753] fb0: switching to amdgpu from EFI VGA
[ 3.999850] amdgpu 0000:0a:00.0: vgaarb: deactivate vga console
[ 3.999972] amdgpu 0000:0a:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 4.001162] amdgpu 0000:0a:00.0: No more image in the PCI ROM
[ 4.001190] amdgpu 0000:0a:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 4.001192] amdgpu: ATOM BIOS: xxx-xxx-xxx
[ 4.001260] amdgpu 0000:0a:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 4.001264] amdgpu 0000:0a:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 4.001266] amdgpu 0000:0a:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 4.001325] [drm] amdgpu: 8176M of VRAM memory ready
[ 4.001326] [drm] amdgpu: 8176M of GTT memory ready.
[ 4.019391] amdgpu 0000:0a:00.0: amdgpu: PSP runtime database doesn't exist
[ 4.075896] amdgpu 0000:0a:00.0: amdgpu: Will use PSP to load VCN firmware
[ 4.290783] amdgpu 0000:0a:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 4.295682] amdgpu 0000:0a:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 4.295693] amdgpu 0000:0a:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 4.295815] amdgpu 0000:0a:00.0: amdgpu: use vbios provided pptable
[ 4.295819] amdgpu 0000:0a:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
[ 4.332060] amdgpu 0000:0a:00.0: amdgpu: SMU is initialized successfully!
[ 4.367934] snd_hda_intel 0000:0a:00.1: bound 0000:0a:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 4.399384] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 4.446542] amdgpu: HMM registered 8176MB device memory
[ 4.446603] amdgpu: SRAT table not found
[ 4.446604] amdgpu: Virtual CRAT table created for GPU
[ 4.446866] amdgpu: Topology: Add dGPU node [0x731f:0x1002]
[ 4.446870] kfd kfd: amdgpu: added device 1002:731f
[ 4.446888] amdgpu 0000:0a:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number 40
[ 4.448612] fbcon: amdgpudrmfb (fb0) is primary device
[ 4.616708] amdgpu 0000:0a:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 4.623550] amdgpu 0000:0a:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 4.623555] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 4.623558] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 4.623561] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 4.623563] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 4.623565] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 4.623567] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 4.623569] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 4.623571] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 4.623572] amdgpu 0000:0a:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 4.623575] amdgpu 0000:0a:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 4.623577] amdgpu 0000:0a:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 4.623579] amdgpu 0000:0a:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[ 4.623580] amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[ 4.623582] amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[ 4.623584] amdgpu 0000:0a:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ 4.625154] [drm] Initialized amdgpu 3.44.0 20150101 for 0000:0a:00.0 on minor 0
I believe I'm seeing the same issue since updating to 5.16.2 (5.16.3 behaves the same), but the problem might have started earlier, since I skipped a few versions.
In my case the issue resembles #1237 (closed): disconnecting one of the displays lets the boot succeed, and I can safely reconnect the screen once I'm in SDDM.
Hi @fardragon and @reactormonk,
I tried to reproduce this issue by using the latest code from amd-staging-drm-next, but I could not reproduce this issue. Could you help me with these:
Which display resolution are you using? Does this happen independently of the display resolution?
Is it possible for you to try the latest code from amd-staging-drm-next? If so, could you make this tiny change in your kernel:
I believe it's the same patch as in #1886 (comment 1248613). If so, I've already tested it and it fixed the issue for me back then. Never mind, I see that you're asking me to test amd-staging-drm-next with the patch reverted. I'll try to do it.
I believe the issue must be somehow tied to the display resolution or even refresh rate (I run my 2 monitors at 1440p@144Hz) because booting with these kernel parameters video=DP-1:1920x1080@60 video=DP-2:1920x1080@60 also makes the system boot successfully. Disconnecting one of the screens before booting and reconnecting it after is also a workaround for that matter.
Could you also test with different display resolutions? I am asking because I want to find a condition that I can reproduce with my setup. Right now, I have two 4k@60Hz displays and one 4k widescreen that supports 120Hz.
Changing desktop resolution in KDE system settings doesn't seem to matter, which I guess is to be expected since the driver fails before even getting to the DM stage. They all fail in the same way as far as I can tell:
2x1440p_60.log, 2x1080p_60.log, 2x1080p_144.log, 2x1440p_144.log
Disconnecting one of the screens still causes the system to boot successfully.
Setting the resolution in kernel parameters also fixes it. I've tried these three sets and surprisingly they all worked just fine (maybe it doesn't matter what is set as long as anything is set here):
video=DP-1:1920x1080@60 video=DP-2:1920x1080@60
video=DP-1:1920x1080@144 video=DP-2:1920x1080@144
video=DP-1:2560x1440@144 video=DP-2:2560x1440@144
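To confirm which of these sets actually reached the kernel on a given boot, the video= options can be read back from the kernel command line. The following is only a minimal sketch: list_video_params is a hypothetical helper name, not something from this thread, and it assumes a POSIX shell with /proc/cmdline available.

```shell
# Hypothetical helper (not part of any tool mentioned in this thread):
# print every video= option found in a kernel command line, one per line.
# With no argument it reads the running kernel's command line from /proc/cmdline.
list_video_params() {
    cmdline="${1-$(cat /proc/cmdline)}"
    # Rely on word splitting to walk the space-separated parameters.
    for word in $cmdline; do
        case "$word" in
            video=*) printf '%s\n' "${word#video=}" ;;
        esac
    done
}
```

Calling list_video_params with no argument inspects the current boot; passing a string instead makes it easy to check a boot entry before rebooting.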
The only difference (compared to the mainline kernel 3 weeks ago) that I've noticed is that with amd-staging-drm-next the screen doesn't die after ~10 seconds and instead the SMU error message keeps repeating, seemingly forever.
Changing desktop resolution in KDE system settings doesn't seem to matter, which I guess is to be expected since the driver fails before even getting to the DM stage. They all fail in the same way as far as I can tell: 2x1440p_60.log, 2x1080p_60.log, 2x1080p_144.log, 2x1440p_144.log
Correct me if I'm wrong, but you were able to reproduce the hang issue with all of the above configurations, right?
I don't know why I'm not able to reproduce this issue...
Could you share the output of the below command?
cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
Also, could you share the edid from your display? You should be able to get it by adapting the below command:
Yes, after applying the patch, with both displays connected and no kernel params set, the issue is 100% reproducible. These logs are the output of journalctl -k -b -1 after rebooting to a working configuration.
Updated my firmware to the latest version available at git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
Installed ArchLinux
Tried with 3 different displays (4k@60Hz, 2k@60Hz, etc.)
I emulated your EDID
Nevertheless, I can't reproduce this issue no matter what I try. I'm running out of ideas...
Is it possible for you to try with a different display? I mean, with .pipe_split_policy = MPC_SPLIT_DYNAMIC. Maybe we can collect more logs when the issue happens. Could you set the DRM debug log level to 0x4? You can use:
echo 0x4 > /sys/module/drm/parameters/debug
Or you can set this value in the grub menu by adding this parameter:
drm.debug=0x4
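For context, drm.debug is a bitmask, and 0x4 selects the KMS (modesetting) category. As a rough sketch, a mask can be decoded as shown below. decode_drm_debug is a hypothetical helper name, and the table covers only the lower bits as I understand the kernel's drm_debug_category enum, so treat it as an assumption to verify against your kernel version.

```shell
# Hypothetical helper: print the debug categories enabled by a drm.debug mask.
# Bit values are assumed from the kernel's drm_debug_category enum (lower bits only).
decode_drm_debug() {
    mask=$(( $1 ))
    names=""
    [ $(( mask & 0x01 )) -ne 0 ] && names="$names CORE"
    [ $(( mask & 0x02 )) -ne 0 ] && names="$names DRIVER"
    [ $(( mask & 0x04 )) -ne 0 ] && names="$names KMS"
    [ $(( mask & 0x08 )) -ne 0 ] && names="$names PRIME"
    [ $(( mask & 0x10 )) -ne 0 ] && names="$names ATOMIC"
    [ $(( mask & 0x20 )) -ne 0 ] && names="$names VBL"
    # Strip the leading space left by the first appended name.
    echo "${names# }"
}
```

For example, decode_drm_debug 0x4 reports only KMS, which is why that value keeps the log volume manageable compared with enabling every category.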
Also, when you see the hang, can you still ssh into the machine? Finally, are you using X or Wayland?
I haven't patched anything yet, just stock kernel for now: Linux exia 5.16.14-zen1-1-zen #1 ZEN SMP PREEMPT Fri, 11 Mar 2022 17:40:33 +0000 x86_64 GNU/Linux
Still occurs (using X). Resolution:
HDMI-A-1 connected primary 3440x1440+0+0 (normal left inverted right x axis y axis) 797mm x 333mm
   3440x1440 49.99 + 99.98* 59.97
No second screen. The bug can still be worked around by unplugging the screen and replugging it after ~30 seconds.
Current kernel parameters:
linux /vmlinuz-linux-zen root=UUID=889bfbb5-f502-4691-bb1c-88fc53c2745a rw amd_iommu=on iommu=pt loglevel=3 quiet amdgpu.runpm=0
Added the drm.debug parameter, will add more info once I have it.