I've tried my AMD Radeon RX 5700 XT on both Ubuntu (LLVM 9 / Mesa 19.3 from the Oibaf PPA) and Manjaro (LLVM 10 git / mesa-git).
On both I was using GNOME Shell, and in both cases I had frequent lockups and freezes. Once my GPU lost its connection to the monitor and stayed that way until I rebooted; other times the desktop would just freeze and take the whole system down.
Error log:
avg 24 22:53:58 Marko-PC kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] ERROR Waiting for fences timed out or interrupted!
avg 24 22:53:58 Marko-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx_0.0.0 timeout, signaled seq=94235, emitted seq=94237
avg 24 22:53:58 Marko-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process citra-qt pid 27356 thread citra-qt:cs0 pid 27366
This happened on all setups, and the bug was pretty much the same each time. Lockups weren't extremely frequent, but frequent enough to be very noticeable (5-6 freezes per day on average).
Faulty hardware can probably be ruled out, since I never had a hiccup or anything even close to a crash or freeze on my Windows desktop.
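To compare occurrences across setups, the ring timeout lines can be pulled out of the kernel log automatically. The following is a minimal sketch, assuming the log lines look like the ones quoted above; the names `RingTimeout` and `find_ring_timeouts` are made up for illustration, and reading the difference between the two sequence numbers as "jobs emitted but not yet signaled" is an interpretation, not something the driver prints.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple


class RingTimeout(NamedTuple):
    """One 'ring ... timeout' line reported by amdgpu_job_timedout."""
    ring: str
    signaled: int
    emitted: int

    @property
    def pending(self) -> int:
        # Assumption: emitted - signaled = work submitted but never completed.
        return self.emitted - self.signaled


# Matches e.g. "... amdgpu_job_timedout [amdgpu]] ERROR ring gfx_0.0.0 timeout,
# signaled seq=94235, emitted seq=94237" (with or without asterisks around ERROR).
TIMEOUT_RE = re.compile(
    r"amdgpu_job_timedout.*?ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(lines: Iterable[str]) -> Iterator[RingTimeout]:
    """Yield one RingTimeout per matching log line, skipping everything else."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            yield RingTimeout(
                match["ring"], int(match["signaled"]), int(match["emitted"])
            )


if __name__ == "__main__":
    # Reads kernel log text from standard input.
    for timeout in find_ring_timeouts(sys.stdin):
        print(f"{timeout.ring}: {timeout.pending} job(s) emitted but not signaled")
```

Saved under a hypothetical name such as `ring_timeouts.py`, it could be fed with `journalctl -k | python3 ring_timeouts.py`.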
Pretty much the same type of error happens in different situations, very often at random while using the desktop. Of these two logs, one is from an OpenGL launch in the Citra emulator, which is reproducible every time; the second one, from Manjaro, is from browsing in GNOME Shell, where it would crash without any clear trigger.
I confirm that I have this bug or a very similar one.
For some reason, it happens most when I'm using my IDE (IntelliJ-based).
It happens most often while I type code, and the crash occurs when the IDE is about to propose a code completion.
I get one to two crashes a day.
Video card: RX 5700
CPU: Ryzen R7-2700X
Software tested: LLVM 9 git
libdrm, Mesa and the DDX are updated from git very frequently.
The bug has been there since I got the card, about 3 weeks ago.
I don't know if I'm encountering the same bug, but it is at least similar.
I don't get hard freezes or lockups, but I get a strange stuttering, as if the whole OS halted for a few seconds and then continued for a few seconds. The halts grow longer while the usable seconds get shorter, quickly reaching the point of unusability.
It doesn't happen regularly (anywhere between 30 and 120 minutes apart, it seems) and I haven't yet made out a direct cause, but in journalctl the same messages seem to appear every time it begins:
Some observations that might help to narrow it down:
The first time it occurred, I had to do a few reboots that showed this behaviour right after startup until it finally worked again, for about 45 minutes.
When it still didn't work after around 10 more reboots, I tried uninstalling corectrl (which I used for a custom fan curve), and it finally booted normally again!
I then installed radeon-profile to have fan control (I don't want the fans to stand still on the desktop, as the card gets over 80 °C before the fans kick in).
The issue still occurs with radeon-profile, but at least every reboot runs fine.
Another thing I noticed is that after the first "freeze" with radeon-profile, lm_sensors stopped reporting the fan speed for the card; it always stays at zero.
So maybe it is related to fan control or the sysfs interface in general?
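One way to check the sysfs side directly, without going through lm_sensors or a fan-control tool, is to read the hwmon files themselves before and after a freeze. This is only a sketch under assumptions: `fan1_input`, `pwm1`, `pwm1_enable` and `temp1_input` are standard hwmon attribute names, but whether all of them are present for this card and kernel, and the `card0` path used below, are assumptions that may not hold on every machine. The function name `read_fan_state` is made up for illustration.

```python
import glob
from pathlib import Path
from typing import Dict, Optional

# Assumption: the GPU is card0 and exposes its sensors in a hwmon directory here.
HWMON_GLOB = "/sys/class/drm/card0/device/hwmon/hwmon*"

# Standard hwmon attribute names; any of them may be missing on a given setup.
ATTRIBUTES = ("fan1_input", "pwm1", "pwm1_enable", "temp1_input")


def read_fan_state(hwmon_dir: Path) -> Dict[str, Optional[int]]:
    """Read each attribute as an integer; None if missing or unreadable."""
    state: Dict[str, Optional[int]] = {}
    for name in ATTRIBUTES:
        try:
            state[name] = int((hwmon_dir / name).read_text().strip())
        except (OSError, ValueError):
            state[name] = None
    return state


if __name__ == "__main__":
    for directory in glob.glob(HWMON_GLOB):
        print(directory, read_fan_state(Path(directory)))
```

If `fan1_input` reads zero or becomes unreadable here as well after a freeze, that would point at the driver's sysfs interface rather than at lm_sensors.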
It probably really depends on what we do on our desktops. I just remembered that I stopped using FileZilla when I got this GPU, as it was crashing almost every time I used it (it never failed to crash while FileZilla was open and running). I still use it for work, but I keep it to a minimum (open, upload, close) instead of keeping it running.
OK, I looked at the recent kernel patches and commits, and they seem to have fixed a couple of bugs. I do not know if they include this one, but I have not crashed once since I merged that code into kernel 5.3-rc6 (that code is staged for the 5.4 merge window).
I attached the patch so you can merge it if you wish to try. It adds all the latest bits for AMDGPU to 5.3-rc6, including Renoir support.
Created attachment 145225
Merge last adg5f code
How do I merge the patch myself? :) I'd like to try it
On my side, I can report that the issue does not occur if I don't use a tool to control the fans. Does any of you use something like that, or are these separate issues?
> On my side, I can report that the issue does not occur if I don't use a tool
> to control the fans. Does any of you use something like that, or are
> these separate issues?
I don't use any tools, all is stock.
(In reply to Mathieu Belanger from comment 7)
> Created attachment 145225
> Merge last adg5f code
>
> OK, I looked at the recent kernel patches and commits, and they seem to have
> fixed a couple of bugs. I do not know if they include this one, but I have
> not crashed once since I merged that code into kernel 5.3-rc6 (that code is
> staged for the 5.4 merge window).
>
> I attached the patch so you can merge it if you wish to try. It adds all
> the latest bits for AMDGPU to 5.3-rc6, including Renoir support.
After applying the patch, the same type of error occurs. Luckily it is very easy to reproduce with the Citra emulator; apparently it does something that AMD's driver really doesn't like, which makes the error more likely. The error also seems more likely on my end when the CPU is under heavy I/O load.
Last log after applying the latest patch from the merge posted in the attachment:
sep 01 02:29:10 Marko-PC kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
sep 01 02:29:10 Marko-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=16312, emitted seq=16314
sep 01 02:29:10 Marko-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process citra-qt pid 2928 thread citra-qt:cs0 pid 2938
sep 01 02:29:10 Marko-PC kernel: [drm] GPU recovery disabled.
It would be very nice to get any official response from AMD, just to be sure that we are being listened to.
I was able to reproduce that Citra crash.
I followed the instructions, and it crashed instantly after choosing continue (or a fraction of a second after: the music lagged a little, then the complete system crashed; I was able to sync/umount/reboot with the magic SysRq keys).
Is your crash at exactly the same place? If so, it's very reproducible, and it might be a good idea to run an OpenGL trace to see which commands were sent last to provoke the crash.
I am not familiar with the Ubuntu stuff: were these compiled on your system? If not, do you know the build dates of your Mesa, libdrm and xf86-video-amdgpu (X11 DDX)?
Also, can you tell which dates your microcode files have?
The microcode files were not available in my distribution when I installed them. I downloaded and installed them on August 6, but they were from around July 15, I think. I remember that the latest microcode at that time was crashing with a black screen on module load, and that's why I installed an older version.
Yes, it always happens at the same place with the Citra emulator. What bothers me more about the bug, however, is that it sometimes happens completely at random, without any obvious trigger, while I'm just browsing and using my desktop. So it's not exclusive to Citra, but luckily I've found the Citra method to provoke the bug, so we can do more detailed logging.
Further observations:
- The bug is of the same type as the other crashes and is not exclusive to the Citra emulator; it also happens when launching Rocket League and sometimes at random while using the desktop
- The same type of crash IS NOT reproducible on Windows on the same GPU
- The same type of bug IS NOT reproducible on my Intel HD laptop with the same versions of Mesa/LLVM, which probably means either a faulty AMD kernel driver or faulty firmware binaries.
My versions are:
MESA: Mesa 19.3.0-devel (git-6775a52 2019-09-02 eoan-oibaf-ppa)
Kernel: Ubuntu mainline 5.3 daily build (I ALSO tried amd-drm-next-5.4; the same bug is reproducible)
Firmware binaries: 2019-08-26 from /~agd5f/radeon_ucode/navi10