[R9 390X] Broken hardware acceleration in 6.10 kernel
After I updated my Arch Linux installation to kernel 6.10, hardware acceleration stopped working. The current LTS kernel works fine.
Hardware description:
CPU: AMD Ryzen 5 5600X 6-Core Processor
GPU: AMD Radeon R9 390 Series
System Memory: 16GiB
Display(s): VA2231 Series 1920x1080 60.0 HZ
Type of Display Connection: DVI-D
System information:
Distro name and Version: Arch Linux
Kernel version: 6.10.7-arch1-1
Custom kernel: N/A
AMD official driver version: N/A
How to reproduce the issue:
Run mpv --hwdec=auto and you will get an error like:
amdgpu: The CS has been rejected, see dmesg for more information (-22).
zsh: IOT instruction (core dumped) mpv
You will also see this specific error in dmesg every time you do the above:
[drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00f7c000-ff00f7e000 out of 256MB segment!
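For readers unfamiliar with this message, here is a rough sketch of the kind of check that could produce it. It assumes the driver compares the 256 MB segment of the msg/fb buffer (the address bits above 2^28) with the segment of the UVD base address. The base address below is a made-up value, not one read from this system, and the function names are illustrative only.

```python
SEGMENT_SHIFT = 28  # 2**28 bytes = 256 MiB


def segment(addr: int) -> int:
    """Return the index of the 256 MiB segment that contains addr."""
    return addr >> SEGMENT_SHIFT


def in_same_segment(start: int, base: int) -> bool:
    """True if the buffer start lies in the same 256 MiB segment as base."""
    return segment(start) == segment(base)


# Buffer range taken from the dmesg line above.
start, end = 0xFF00F7C000, 0xFF00F7E000

# Hypothetical UVD base address, chosen only for illustration.
assumed_uvd_base = 0xF400000000

print(f"buffer segment: {segment(start):#x}")
print(f"assumed base segment: {segment(assumed_uvd_base):#x}")
print(f"same segment: {in_same_segment(start, assumed_uvd_base)}")
```

Under that assumption, the buffer itself is small (8 KiB) and does not cross a segment boundary; the complaint is about which segment it was placed in.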
Log files (for system lockups / game freezes / crashes)
As additional information, I have tested and found that attempting to play H.264 videos causes the aforementioned error, but H.265 and VP8 seem to be okay.
I have bisected the kernel to commit f3572db3c049b4d32bb5ba77ad5305616c44c7c1 (merged since 6.10.4). Reverting it fixed hardware decoding on my Radeon 520.
I'm running out of ideas what that could be. Could you add this code chunk here and see if you have any "Test..." prints in dmesg right before the problem happens?
Getting a failure here (I should have mentioned, I'm currently on 6.10.7, sorry!):
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c: In function ‘amdgpu_cs_find_mapping’:
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1793:35: error: ‘bo’ is a pointer to pointer; did you mean to dereference it before applying ‘->’ to it?
 1793 |         printk("Test 0x%08x\n", bo->resource->placement);
      |                                   ^~
./include/linux/printk.h:433:33: note: in definition of macro ‘printk_index_wrap’
  433 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
      |                                 ^~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1793:9: note: in expansion of macro ‘printk’
 1793 |         printk("Test 0x%08x\n", bo->resource->placement);
      |         ^~~~~~
@ckoenig Well at least we are on some sort of firm ground now! Let me know if you need any further help with testing - I'm happy to help.
And sorry if this comes across as presumptuous, but what is AMD's current support policy for radeon-powered cards, such as TeraScale 2 and older generations? I'm currently running into a bug at #3604 and would love to know whether anyone remains who can look into issues like this.
Same for me with 6.10.9 and every version before. In fact, I have been having trouble and have reported several radeon-related bugs since Fedora 38, my first installation. I'm now on Fedora 40 and still on the original 6.8.7 kernel, as it's the only one behaving properly as far as radeon/amdgpu is concerned. Nothing between 6.8.8 and 6.10.9 worked for me.
Symptom: Kodi crashing instantly as soon as I click on a video.
amdgpu: The CS has been rejected, see dmesg for more information (-22).
[drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00d0e000-ff00d10000 out of 256MB segment!
I ran git diff v6.8..v6.11 -- drivers/gpu/drm/amd/amdgpu/ to compare, and there are several significant changes, so I'm wondering whether real people are actually testing those changes or whether it's purely theoretical.
@arunpravin24 Sorry to nag but do we have any updates on fixing hardware acceleration for GCN 1.0/2.0 cards? It has been difficult to follow the latest kernel updates on my hardware.
Just FYI, it seems that this isn't an issue on the 6.9.x kernel series.
But 6.10 is all busted. I confirmed this with 6.9.12 on Fedora 40 KDE in my own testing, and it was also discussed here: https://bbs.archlinux.org/viewtopic.php?pid=2191561#p2191561
I attempted to replicate the issue on new cards by limiting the memory, but I was unsuccessful.
Finally today I will receive the R9 390 card. I will check and update.
I'm getting the same "out of 256MB segment" dmesg error as above when using any application that tries to use H.264 acceleration. mpv with hwdec and others simply crash, while Firefox falls back to software decoding. Honestly it's not THAT big a deal, but better web performance and battery life from saving CPU processing power is pretty helpful on my devices.
Here is my devices' hardware info for testing. I'm also eager for this bug to be fixed.
Device 1 - HP-T620:
OS: Debian Testing/Trixie on Kernel 6.11.4, CPU: AMD GX-415GA, GPU: HD 8330E IGPU
Device 2 - Optiplex 780:
OS: Debian Testing/Trixie on Kernel 6.11.4, CPU: Core 2 Duo e8500, GPU: R5-240 (GDDR3)
Device 3 - Old Custom:
OS: Debian Testing/Trixie on Kernel 6.11.4, CPU: FX-6350, GPU: HD-7770
All of them are running AMDGPU for vulkan support and they're all affected it seems.
When running mpv --hwdec=auto I get the following dmesg errors:
[nov 1 18:29] UVD Test 0x00000001
[ +0,002102] UVD Test 0x00000001
[ +0,000028] UVD Test 0x00000001
[ +0,000005] [drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00bd0000-ff00bd2000 out of 256MB segment!
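To compare reports like this one across machines, the buffer ranges can be pulled out of dmesg text with a small script. The following is only a sketch: the regular expression is written against the message format quoted in this thread, the sample text reuses the three ranges reported so far, and the helper name uvd_ranges is made up for illustration.

```python
import re

# Matches the UVD error line format quoted in this thread.
PATTERN = re.compile(
    r"msg/fb buffer ([0-9a-f]+)-([0-9a-f]+) out of 256MB segment"
)


def uvd_ranges(log_text: str) -> list[tuple[int, int]]:
    """Return (start, end) address pairs for every matching error line."""
    return [(int(a, 16), int(b, 16)) for a, b in PATTERN.findall(log_text)]


# The three ranges reported in this thread so far.
sample = """\
[drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00f7c000-ff00f7e000 out of 256MB segment!
[drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00d0e000-ff00d10000 out of 256MB segment!
[drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00bd0000-ff00bd2000 out of 256MB segment!
"""

for start, end in uvd_ranges(sample):
    # start >> 28 is the index of the 256 MiB segment holding the buffer.
    print(f"{start:#x}-{end:#x} size={end - start} segment={start >> 28:#x}")
```

On a live system the same function could be fed the output of dmesg instead of the sample string.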
Please let us know when it is merged into the latest kernel so we can update and test without rebuilding the kernel on our side.
Many well-deserved thanks for your fix.
Thank you for the patch. There has been definite progress, but unfortunately the problems have not been fully solved for my card.
I applied the patch to kernel version 6.11.5 in Debian. With an HD 7750 card I am now able to start video playback, but the following error appears. Before the patch, I was not able to start video playback at all on 6.11.5. Video playback worked almost perfectly (crashing roughly once a month at most) on earlier 5.x series kernels.
Playing: 02.redacted.s01.E02.(2023).HDTV.(1080р).by.redacted.ts
 (+) Video --vid=1 (h264 1920x1080 25.000fps)
 (+) Audio --aid=1 --alang=Und (mp2 2ch 48000Hz)
Using hardware decoding (vaapi).
VO: [gpu] 1920x1080 vaapi[nv12]
[ffmpeg/video] h264: Failed to allocate a vaapi/nv12 frame from a fixed pool of hardware frames.
[ffmpeg/video] h264: Consider setting extra_hw_frames to a larger value (currently set to -1, giving a pool size of 22).
[ffmpeg/video] h264: get_buffer() failed
[ffmpeg/video] h264: decode_slice_header error
[ffmpeg/video] h264: no frame!
Error while decoding frame (hardware decoding)!
[ffmpeg/video] h264: get_buffer() failed
[ffmpeg/video] h264: decode_slice_header error
[ffmpeg/video] h264: no frame!
Error while decoding frame (hardware decoding)!
[ffmpeg/video] h264: get_buffer() failed
[ffmpeg/video] h264: decode_slice_header error
[ffmpeg/video] h264: no frame!
Error while decoding frame (hardware decoding)!
Attempting next decoding method after failure of h264-vaapi.
[ffmpeg/video] h264: co located POCs unavailable
[ffmpeg/video] h264: co located POCs unavailable
AO: [pipewire] 48000Hz stereo 2ch doublep
AV: 00:00:00 / 00:48:22 (0%) A-V: -0.007
There's nothing appearing in the dmesg log when this error is triggered in mpv.
My vainfo output with the patch is:
Trying display: wayland
Trying display: x11
libva info: VA-API version 1.22.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_22
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.22 (libva 2.22.0)
vainfo: Driver version: Mesa Gallium driver 24.2.4-1 for AMD Radeon HD 7700 Series (radeonsi, verde, LLVM 19.1.1, DRM 3.59, 6.11.5-fix-amdgpu-01)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
Also, I started having an error in the dmesg when playing gzdoom with mods that load a lot of resources (like Project Brutality, but also Hexen Serpent Resurrection does that, albeit being a bit more difficult to trigger).
The error is a repetition of
redacted [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate() failed.
redacted [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
With the patch, the error is easier to trigger.
I could play around with module options and see how things unfold. What are the good settings to try playing around with?
UPDATE: With the following set of amdgpu module options, I was able to launch video playback without errors:
I noticed a small problem when compiling the kernel earlier. The compiler emitted a warning about an unused variable i. Probably, it would be elided during compilation, and we shouldn't see any observable effects.
I was able to run mpv with hwdec for over 10 hours non-stop today with the patched kernel. I slightly adjusted the config I presented in my earlier post, since it's more relevant to my use case (Chromium is configured to use Vulkan and VAAPI).
I have been monitoring the card's state with nvtop, and have observed consistent 300-500 MiB/s in TX and RX directions.
However, I'm still experiencing crashes when using 3D graphics intensively. In contrast to previous crashes, these are completely silent. I have an ssh session open with dmesg output being captured remotely, and not a single line is output. I'm pretty sure I would have observed the same even with a serial console.
So the only change in my experience with 3D graphics is that there are now no errors in the log about insufficient memory to submit a command, an inability to allocate a contiguous 256 MB block, or some sdma0 or sdma1 error. The log is silent.
I noticed that the crash usually occurs when TX in nvtop goes to a high value, like 19.80 GiB/s or 16.38 GiB/s. Well, this is not really beyond the capabilities of a 16-lane PCIe 3.0 slot, but it is certainly something that happens consistently at the time of the crash with gzdoom.
Now, I also tried to play Hearts of Iron IV with this card, and a crash occurred without the TX spiking.
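For reference, the theoretical per-direction peak of a 16-lane PCIe 3.0 link can be worked out from the published signalling rate (8 GT/s per lane with 128b/130b encoding). The sketch below does only that arithmetic; it ignores protocol overhead, and it assumes the nvtop figures quoted above really are GiB/s over a x16 link.

```python
RAW_RATE = 8e9        # PCIe 3.0: 8 GT/s per lane (one bit per transfer)
ENCODING = 128 / 130  # 128b/130b line encoding
LANES = 16            # assumed link width


def link_peak_gib_s(lanes: int = LANES) -> float:
    """Theoretical one-direction peak in GiB/s, before protocol overhead."""
    bytes_per_second = RAW_RATE * ENCODING * lanes / 8
    return bytes_per_second / 2**30


peak = link_peak_gib_s()
for observed in (16.38, 19.80):
    print(f"{observed:.2f} GiB/s observed, peak {peak:.2f} GiB/s, "
          f"above peak: {observed > peak}")
```

By this estimate both readings sit above what the link can carry in one direction, so the spikes may be worth double-checking against another tool before drawing conclusions from them.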
@mahmoudshmaitelly What behavior do you observe with gpu-api=vulkan and hwdec=vaapi-copy? mpv seems to be capricious when it comes to the choice of VO options, but those settings make a lot of sense overall, and are probably the only sound ones when Vulkan is desired as the rendering API.
I wonder whether you could test the same mpv command on X11. If that makes a difference, we would have a curious situation indeed.
On Debian, there's a small caveat with kernel module building and module signing, especially if you use dkms. Check out this post and also the procedure in Debian's manual.
If you have 20-24 GB of extra RAM + swap to spare, I would suggest performing the actual build in a tmpfs (usually mounted on /tmp, but beware of the maximum quota set there), and doing an entirely clean build.
I found time to test the patches on x86 and LoongArch, with a Radeon HD7850 (GCN 1.0) yesterday. The issue seems to have gone away on x86 with the video samples we use here at AOSC:
However, on LoongArch, the GPU driver seems to time out and reset whilst playing the AVC, 4K@60fps sample. The traceback changes between the two test trials, but the system locked up in both cases.
1st Run:
[ 435.347306] amdgpu 0000:07:00.0: amdgpu: Dumping IP State
[ 435.352670] amdgpu 0000:07:00.0: amdgpu: Dumping IP State Completed
[ 435.358900] amdgpu 0000:07:00.0: amdgpu: ring gfx timeout, signaled seq=30402, emitted seq=30404
2nd Run:
[ 145.842934] amdgpu 0000:07:00.0: amdgpu: Dumping IP State
[ 145.848309] amdgpu 0000:07:00.0: amdgpu: Dumping IP State Completed
[ 145.854603] amdgpu 0000:07:00.0: amdgpu: ring sdma0 timeout, signaled seq=2979, emitted seq=2982
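When comparing runs like these, it can help to extract which ring timed out and how far the emitted sequence number was ahead of the signaled one. This is only a sketch: the regular expression follows the log format quoted above, the helper name ring_timeouts is made up, and reading the gap as "submissions still outstanding" is an assumption about what the two counters mean.

```python
import re

# Matches the timeout line format quoted above.
TIMEOUT = re.compile(
    r"ring (\w+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def ring_timeouts(log_text: str) -> list[tuple[str, int]]:
    """Return (ring name, emitted - signaled) for each timeout line."""
    return [
        (ring, int(emitted) - int(signaled))
        for ring, signaled, emitted in TIMEOUT.findall(log_text)
    ]


sample = """\
[ 435.358900] amdgpu 0000:07:00.0: amdgpu: ring gfx timeout, signaled seq=30402, emitted seq=30404
[ 145.854603] amdgpu 0000:07:00.0: amdgpu: ring sdma0 timeout, signaled seq=2979, emitted seq=2982
"""

for ring, gap in ring_timeouts(sample):
    print(f"ring {ring}: {gap} sequence numbers emitted but not signaled")
```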
> However, on LoongArch, the GPU driver seems to time out and reset whilst playing the AVC, 4K@60fps sample. The traceback changes between the two test trials, but the system locked up in both cases.
In my case, such content plays just fine. Caveat: mpv uses software decoding, since the HD 7750 does not have hardware decoding capability for videos of such dimensions. I actually do some downscaling to play at 60fps, but at 30fps playback is fine at full resolution (with software decoding).
The patch helped to fix the problem for me on Debian, system Mesa Gallium driver 24.2.0-devel for AMD Radeon HD 7700 Series (radeonsi, verde, LLVM 18.1.7, DRM 3.59, 6.11.5-fix-amdgpu-01). The only caveat I noticed is that the compiler complained about an unused variable `i`. I didn't fix it manually, since I wanted to keep the patch pristine.
This patch fixed the issue for me on x86, but causes a GPU reset on LoongArch.
However, I do suspect that it is an issue outside of this driver (of course, I would recommend arranging more debugging down the line). See the thread here.
Which is really weird, since for two days there have been no problems whatsoever with video. And the same set of video files was actually playing continuously today as well, earlier in the day.
IMO, this bug with amdgpu is just the tip of the iceberg. I have been on Fedora since Fedora 38, installed in October 2023 on this computer with an AMD A10-7850K Radeon R7 4C+8G (4) @ 3.700GHz CPU, and each and every Linux kernel since 6.5.6 has had some kind of crash or issue related to radeon/amdgpu happening almost daily. I duly reported them, but the reports were happily ignored and expired, in the hope that the next Fedora release would be better (it wasn't). Curiously, kernel 6.8.7 is free of any of those bugs (nothing visible in dmesg or system logs). It was the kernel shipped with the initial release of Fedora 40, so I stuck with it, because every attempt to upgrade to the currently available kernel triggered one amdgpu issue or another. I have no idea how things were prior to 6.5.6: that computer ran Windows until a year ago, and 6.5.6 was the very first Linux kernel ever to run on it.
Not wanting to direct any developer, just hoping this information will be useful in their efforts to get rid of amdgpu issues.
Upgrading the motherboard is not an option. If I have to run an obsolete OS, so be it. Not so big a deal.
I have another computer with an AMD Phenom II X6 1055T (6) @ 2.800GHz CPU and Radeon 3000 integrated GPU and never had any issue running Debian 9, 10, 11, and 12. For reference, Debian 11 was running kernel 5.10, which is still stable on Debian 12. Debian 12 shipped with kernel 6.1, but I tested the 6.9/6.10 backports as well. I only experienced the computer freezing twice with the 6.10 branch, so I now avoid it.
@i300220 It is worth noting that some of the older GPUs, like the HD 7750, are still relevant because they can drive six displays simultaneously. This is an important feature in video distribution setups and multi-display workstations. Six to eight displays is not unheard of.
I compiled LTS kernels 5.4.286 and 4.19.324. The problem disappears in those kernels. In 5.4, there's still a lockup if dpm=1 is enabled. In 4.19, there's been more stability with dpm=1.
I'm comparing the kwizart 6.6.63-200.fc40.x86_64 kernel with the 6.8.7-300.fc40.x86_64 kernel I've been running here on Fedora 40, which is very stable. The sole problem at boot is shown below, but it eventually fixes itself and the module loads properly.
systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
systemd[1]: Failed to start systemd-modules-load.service - Load Kernel Modules.
systemd-modules-load[255]: Failed to insert module 'snd_pcm': Invalid argument
$ journalctl -b -u systemd-modules-load
nov 26 04:56:43 systemd-modules-load[268]: modprobe: FATAL: Module snd-seq not found in directory /lib/modules/6.6.63-200.fc40.x86_64
Please double check that you correctly applied the patch. When two people report back that the patch works it is really unlikely that it still breaks for you.
I applied the patch to 6.11.10-300.fc41.x86_64. Using AMD A10-7870K Radeon R7. Checked with multiple H264 videos in mpv using HW acceleration. Playback was successful, and no kernel errors were reported.