Forget it, I just noticed this is already fixed
I see it's stuck under review...
Audio playback in all software plays a looped buffer if one of the PipeWire applications (a native one, not PulseAudio) is paused (e.g. by a SIGSTOP/SIGTSTP signal, or while paused in a debugger). PipeWire 1.0.1.
Steps to reproduce:
1. Start audio playback in a native PipeWire application (e.g. pw-play).
2. Pause the pw-play application: killall -SIGSTOP pw-play
Example: PW.opus
@zmike Thanks, updated!
It also works with the known Blender AMD HIP crash.
The only problem is that after a GPU reset in MODE2, the virtual console (tty2, etc.) doesn't display anything.
Thanks, it's working (tried once so far). It can recover. I missed the reset_method argument when I checked the parameters last time.
Out of curiosity I used Vulkan for decoding (AV_HWDEVICE_TYPE_VULKAN with RADV_PERFTEST=video_decode); result:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=164145, emitted seq=164147
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
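For reference, a minimal sketch of that switch (assumptions: RADV_PERFTEST has to be in the environment before the Vulkan device is created, and the variable name just mirrors the test program; this is not the test program's actual code):

// Sketch: the only intended difference vs. the VAAPI path is the
// hw device type. Assumption: RADV_PERFTEST must be set before
// RADV creates the Vulkan device.
#include <stdlib.h>
extern "C" {
#include <libavutil/hwcontext.h>
}

int main()
{
    setenv("RADV_PERFTEST", "video_decode", 1);
    AVBufferRef *hwdevBuffRef = nullptr;
    if (av_hwdevice_ctx_create(&hwdevBuffRef, AV_HWDEVICE_TYPE_VULKAN,
                               nullptr, nullptr, 0) != 0)
        return 1;
    // ... decode as before, then:
    av_buffer_unref(&hwdevBuffRef);
    return 0;
}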
According to https://intel.github.io/libva/, VA-API is thread-safe:
All VAAPI functions implemented in libva are thread-safe.
That means that when we use it on different objects, it must not do the wrong thing and crash.
If the backend implementation of a VAAPI function is not thread-safe then this should be considered as a bug against the backend implementation.
Looks like a Mesa bug. Both functions appear in the report every time it crashes.
Please apply the patch from the collapsed text in my previous message - it'll stop the crashing (it's a workaround).
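For illustration, a minimal sketch of that "different objects" pattern (assumptions: libva-drm is installed and /dev/dri/renderD128 is the GPU's render node; this is not the test program's code):

// Each thread owns a fully independent VADisplay; per the quoted
// libva docs this must be safe without any external locking.
// Assumption: /dev/dri/renderD128 is the AMD GPU's render node.
#include <fcntl.h>
#include <unistd.h>
#include <va/va.h>
#include <va/va_drm.h>
#include <thread>
#include <vector>

static void loop()
{
    for (int i = 0; i < 1000; ++i)
    {
        const int fd = open("/dev/dri/renderD128", O_RDWR);
        if (fd < 0)
            return;
        VADisplay dpy = vaGetDisplayDRM(fd);
        int major = 0, minor = 0;
        if (vaInitialize(dpy, &major, &minor) == VA_STATUS_SUCCESS)
            vaTerminate(dpy); // no global mutex: valid per the docs
        close(fd);
    }
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 2; ++i)
        threads.emplace_back(loop);
    for (auto &t : threads)
        t.join();
    return 0;
}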
I found amdgpu_vcn_idle_work_handler() in amdgpu_vcn.c. When I remove the power gating call amdgpu_device_ip_set_powergating_state() from this function:

I suspect something is wrong with AMD_PG_STATE_GATE and/or with AMD_PG_SUPPORT_VCN_DPG.
Out of curiosity I also forced VCN_DPG_STATE__PAUSE or VCN_DPG_STATE__UNPAUSE - I can't see any difference in power consumption, so does DPG really work correctly?
I also get an assertion with the new vaapi-ffmpeg-test-loop with 2 threads. The easiest way to reproduce it is to remove this_thread::sleep_for() and set 2 threads. It's unrelated to VCN power gating and GPU reset, and it's exposing some internal race condition, maybe in vaapi-ffmpeg-test-loop itself.
The software creates all contexts per thread; everything is independent. Do av_hwdevice_ctx_create() and avcodec_close() need a global mutex even on different contexts? Do vaInitialize() and vaTerminate() need a global mutex on different contexts?
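Below is the workaround patch I'm testing: it serializes av_hwdevice_ctx_create() and avcodec_close() with a single global mutex (g_mut2):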
diff --git a/main.cpp b/main.cpp
index b2cd77f..028305a 100644
--- a/main.cpp
+++ b/main.cpp
@@ -36,6 +36,7 @@ private:
 static counting_semaphore g_sem(0);
 static condition_variable g_cond;
 static mutex g_mut;
+static mutex g_mut2;
 static atomic_bool g_finished(false);
 
 static void decode(const char *path)
@@ -111,11 +112,14 @@ static void decode(const char *path)
     while (!g_finished)
     {
         AVBufferRef *hwdevBuffRef = nullptr;
+        g_mut2.lock();
         if (av_hwdevice_ctx_create(&hwdevBuffRef, AV_HWDEVICE_TYPE_VAAPI, nullptr, nullptr, 0) != 0)
         {
+            g_mut2.unlock();
             cerr << "Can't create hwdev context" << endl;
             return;
         }
+        g_mut2.unlock();
 
         vCodecCtx->hw_device_ctx = hwdevBuffRef;
 
@@ -126,7 +130,9 @@ static void decode(const char *path)
         }
 
         FreeOnDtor closeCodec([&] {
+            g_mut2.lock();
             avcodec_close(vCodecCtx);
+            g_mut2.unlock();
         });
 
         {
Please try the new test program: https://gitlab.freedesktop.org/drm/amd/uploads/b905758f94470aa85b54cf3ff095885d/vaapi-ffmpeg-test-loop.tar.xz
The previous one didn't reproduce the GPU crashes correctly.
Right, I had it a month ago, too (I forgot).
I think the libdrm assert is unrelated, because after more attempts I got the GPU reset again:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=1322132, emitted seq=1322134
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Now on Linux 6.7.3 (with the new firmware):
vaapi-ffmpeg-test-loop: ../libdrm-2.4.120/amdgpu/amdgpu_internal.h:164: update_references: Assertion `atomic_read(src) > 0' failed.
It finally crashed; the new firmware didn't fix it:
[drm] Found VCN firmware Version ENC: 1.30 DEC: 3 VEP: 0 Revision: 1
amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
[drm] failed to load ucode VCN1_RAM(0x3B)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=116694, emitted seq=116696
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vaapi-ffmpeg-te pid 591252 thread vaapi-ffmp:cs0 pid 604124
amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000070 != 0x00000000
[drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
That loading isn't done on every video open
Yes, that's why I added a 1.1 sec delay.
Firmware loading is done because the IP block gets initialized before use, and that's done because it gets uninitialized
Right, so I guess it's not possible to persist the firmware when it's powered down.
reducing power consumption
I found a weird behavior earlier and can still reproduce it. nvtop shows idle power consumption at ~6W. When I start video playback (bbb_sunflower_native_60fps_normal.mp4) using VA-API, it shows ~20W (host-copy, Vulkan, OpenGL - it doesn't matter). When I stop and play the video again (~0.5 sec delay between stop and play, not longer, so the firmware doesn't get unloaded), the power consumption is lower, ~16W. That's strange - it's wasting power; it should already be ~16W during the first playback. I can reproduce this behavior in QMPlay2 and VLC.
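A minimal sketch for cross-checking those readings outside nvtop (assumptions: the amdgpu hwmon path, whose index varies between systems, and that power1_average reports microwatts):

// Polls the GPU power draw from the amdgpu hwmon interface.
// Assumption: the hwmon index (hwmon1 here) differs per system;
// power1_average is in microwatts.
#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>

int main()
{
    for (;;)
    {
        std::ifstream f("/sys/class/drm/card0/device/hwmon/hwmon1/power1_average");
        long uw = 0;
        if (!(f >> uw))
            return 1; // adjust the hwmon path for this system
        std::cout << (uw / 1000000.0) << " W" << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}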
Installed the same firmware - testing... It hasn't crashed so far.