Forget it, I just noticed this is already fixed
I see it's stuck under review...
Audio playback in all software plays a looped buffer if one of the PipeWire applications (a native one, not PulseAudio) is paused (e.g. by a SIGSTOP/SIGTSTP signal, or while paused in a debugger). PipeWire 1.0.1.
Steps to reproduce:
1. Start audio playback in a native PipeWire application (e.g. pw-play).
2. Pause the pw-play application: killall -SIGSTOP pw-play
Example: PW.opus
@zmike Thanks, updated!
It also works with the known Blender AMD HIP crash.
The only problem is that after a GPU reset in MODE2, the virtual console (tty2, etc.) doesn't display anything.
Thanks, it's working (tried once so far). It can recover. I missed the reset_method argument when I checked the parameters last time.
Out of curiosity I used Vulkan for decoding (AV_HWDEVICE_TYPE_VULKAN with RADV_PERFTEST=video_decode); result:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=164145, emitted seq=164147
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
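For reference, a minimal sketch of that switch (assumptions: RADV_PERFTEST has to be in the environment before the Vulkan device is created, and the variable name just mirrors the test program; this is not the test program's actual code):

// Sketch: the only intended difference vs. the VAAPI path is the
// hw device type. Assumption: RADV_PERFTEST must be set before
// RADV creates the Vulkan device.
#include <stdlib.h>
extern "C" {
#include <libavutil/hwcontext.h>
}

int main()
{
    setenv("RADV_PERFTEST", "video_decode", 1);
    AVBufferRef *hwdevBuffRef = nullptr;
    if (av_hwdevice_ctx_create(&hwdevBuffRef, AV_HWDEVICE_TYPE_VULKAN,
                               nullptr, nullptr, 0) != 0)
        return 1;
    // ... decode as before, then:
    av_buffer_unref(&hwdevBuffRef);
    return 0;
}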
According to https://intel.github.io/libva/, VA-API is thread-safe:
All VAAPI functions implemented in libva are thread-safe.
That means that when we use it on different objects, it must not do the wrong thing and crash.
If the backend implementation of a VAAPI function is not thread-safe then this should be considered as a bug against the backend implementation.
Looks like a Mesa bug. Both functions appear in the report every time it crashes.
Please apply the patch from the collapsed text in my previous message - it'll stop the crashing (it's a workaround).
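For illustration, a minimal sketch of that "different objects" pattern (assumptions: libva-drm is installed and /dev/dri/renderD128 is the GPU's render node; this is not the test program's code):

// Each thread owns a fully independent VADisplay; per the quoted
// libva docs this must be safe without any external locking.
// Assumption: /dev/dri/renderD128 is the AMD GPU's render node.
#include <fcntl.h>
#include <unistd.h>
#include <va/va.h>
#include <va/va_drm.h>
#include <thread>
#include <vector>

static void loop()
{
    for (int i = 0; i < 1000; ++i)
    {
        const int fd = open("/dev/dri/renderD128", O_RDWR);
        if (fd < 0)
            return;
        VADisplay dpy = vaGetDisplayDRM(fd);
        int major = 0, minor = 0;
        if (vaInitialize(dpy, &major, &minor) == VA_STATUS_SUCCESS)
            vaTerminate(dpy); // no global mutex: valid per the docs
        close(fd);
    }
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 2; ++i)
        threads.emplace_back(loop);
    for (auto &t : threads)
        t.join();
    return 0;
}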
I found amdgpu_vcn_idle_work_handler() in amdgpu_vcn.c. When I remove the power gating call amdgpu_device_ip_set_powergating_state() from this function:

I suspect something is wrong with AMD_PG_STATE_GATE and/or with AMD_PG_SUPPORT_VCN_DPG.
Out of curiosity I also forced VCN_DPG_STATE__PAUSE or VCN_DPG_STATE__UNPAUSE - I can't see any difference in power consumption, so does DPG really work correctly?
I also get an assertion with the new vaapi-ffmpeg-test-loop with 2 threads. The easiest way to reproduce it is to remove this_thread::sleep_for() and set 2 threads. It's unrelated to VCN power gating and GPU reset, and it's exposing some internal race condition, maybe in vaapi-ffmpeg-test-loop itself.
The software creates all contexts per thread; everything is independent. Do av_hwdevice_ctx_create() and avcodec_close() need a global mutex even on different contexts? Do vaInitialize() and vaTerminate() need a global mutex on different contexts?
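Below is the workaround patch I'm testing: it serializes av_hwdevice_ctx_create() and avcodec_close() with a single global mutex (g_mut2):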
diff --git a/main.cpp b/main.cpp
index b2cd77f..028305a 100644
--- a/main.cpp
+++ b/main.cpp
@@ -36,6 +36,7 @@ private:
 static counting_semaphore g_sem(0);
 static condition_variable g_cond;
 static mutex g_mut;
+static mutex g_mut2;
 static atomic_bool g_finished(false);
 
 static void decode(const char *path)
@@ -111,11 +112,14 @@ static void decode(const char *path)
     while (!g_finished)
     {
         AVBufferRef *hwdevBuffRef = nullptr;
+        g_mut2.lock();
         if (av_hwdevice_ctx_create(&hwdevBuffRef, AV_HWDEVICE_TYPE_VAAPI, nullptr, nullptr, 0) != 0)
         {
+            g_mut2.unlock();
             cerr << "Can't create hwdev context" << endl;
             return;
         }
+        g_mut2.unlock();
 
         vCodecCtx->hw_device_ctx = hwdevBuffRef;
 
@@ -126,7 +130,9 @@ static void decode(const char *path)
         }
 
         FreeOnDtor closeCodec([&] {
+            g_mut2.lock();
             avcodec_close(vCodecCtx);
+            g_mut2.unlock();
         });
 
         {
Please try the new test program: https://gitlab.freedesktop.org/drm/amd/uploads/b905758f94470aa85b54cf3ff095885d/vaapi-ffmpeg-test-loop.tar.xz
The previous one didn't reproduce the GPU crashes correctly.
Right, I had it a month ago, too (I forgot).
I think the libdrm assert is unrelated, because after more attempts I got the GPU reset again:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=1322132, emitted seq=1322134
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Now on Linux 6.7.3 (with the new firmware):
vaapi-ffmpeg-test-loop: ../libdrm-2.4.120/amdgpu/amdgpu_internal.h:164: update_references: Assertion `atomic_read(src) > 0' failed.
It finally crashed; the new firmware didn't fix it:
[drm] Found VCN firmware Version ENC: 1.30 DEC: 3 VEP: 0 Revision: 1
amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
[drm] failed to load ucode VCN1_RAM(0x3B)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=116694, emitted seq=116696
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vaapi-ffmpeg-te pid 591252 thread vaapi-ffmp:cs0 pid 604124
amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000070 != 0x00000000
[drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
That loading isn't done on every video open
Yes, that's why I added a 1.1 sec delay.
Firmware loading is done because the IP block gets initialized before use, and that's done because it gets uninitialized
Right, so I guess it's not possible to persist the firmware when it's powered down.
reducing power consumption
I found a weird behavior earlier and can still reproduce it. nvtop shows idle power consumption at ~6W. When I start video playback (bbb_sunflower_native_60fps_normal.mp4) using VA-API, it shows ~20W (host-copy, Vulkan, OpenGL - it doesn't matter). When I stop and play the video again (~0.5 sec delay between stop and play, not longer, so the firmware doesn't get unloaded), the power consumption is lower, ~16W. That's strange - it's wasting power; it should already be ~16W during the first playback. I can reproduce this behavior in QMPlay2 and VLC.
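A minimal sketch for cross-checking those readings outside nvtop (assumptions: the amdgpu hwmon path, whose index varies between systems, and that power1_average reports microwatts):

// Polls the GPU power draw from the amdgpu hwmon interface.
// Assumption: the hwmon index (hwmon1 here) differs per system;
// power1_average is in microwatts.
#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>

int main()
{
    for (;;)
    {
        std::ifstream f("/sys/class/drm/card0/device/hwmon/hwmon1/power1_average");
        long uw = 0;
        if (!(f >> uw))
            return 1; // adjust the hwmon path for this system
        std::cout << (uw / 1000000.0) << " W" << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}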
Installed the same firmware - testing... It hasn't crashed so far.