GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7 platforms while playing video
Submitted by Luis Mendes
Assigned to Default DRI bug account
Created attachment 136527 dmesg and iomem data from lockup obtained with glretrace
I am getting GPU lockups while playing video in Kodi. The same happens with other applications that play video, while pure OpenGL workloads seem to be stable. The system seems to be more sensitive to VP9-encoded videos. The freeze happens on both amd64 and armv7l platforms. I am also able to reproduce GPU hangs on amd64 by replaying, with glretrace, an apitrace captured from Kodi on the arm platform.
The arm dmesg and traces show a clear GPU lockup, while the amd64 dmesg isn't so clear, but the user experience is the same: a complete freeze of the graphical system, while the machine keeps working over ssh or other remote connections.
Please find amd64 logs in attachments, including iomem, dmesg and gdb traces.
On both platforms I am using Ubuntu 17.10 with the Mate desktop and the lightdm session manager, with libdrm-2.4.89, mesa-17.4 at commit "radv: Implement binning on GFX9." - 6a36bfc6 and kernel https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.16 at commit "drm/amdgpu: Correct the IB size of bo update mapping." - 104bd2ca1124dfd9aa904d5f5a96253ef2b580f6.
Please note that the system was more stable a few weeks ago with drm-next-4.16 based on kernel 4.15-rc2 and an earlier mesa version (I don't remember the exact commits). Although it was more stable, both the arm and the amd64 systems still crashed in a similar way; the problem just became more evident with these newer versions.
There are two distinct crash behaviours on amd64: the ones that I obtained while playing a video with kodi on amd64 and those that I obtained on amd64 by replaying an apitrace from the arm platform while playing a VP9 video with kodi.
The first kind of crash is detailed in the logs kodi-processes_and_backtraces.txt and kodi-amdgpu_lockup_dmesg_and_iomem.txt. The second kind of crash is detailed in the logs glretrace-processes_and_backtraces.txt and glretrace-amdgpu_lockup_dmesg_and_iomem.txt.
For some strange reason the amd64 platform is complaining about polaris11 firmware files, but they are in /lib/firmware and they were taken by cloning https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git. I am using the same firmware files and the same graphics card on armv7l, and there it doesn't complain about the firmware.
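To narrow down the firmware complaint, it may help to list exactly which files the kernel failed to load and whether each one is really present on disk. The following is only a minimal sketch: it assumes the complaints appear in dmesg in the generic firmware loader form "Direct firmware load for NAME failed with error N", and the function name failed_firmware_requests is made up for illustration.

```python
import re
from pathlib import Path

# Assumption: the kernel's generic firmware loader reports failures in this form.
FW_FAIL = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def failed_firmware_requests(dmesg_text, fw_root="/lib/firmware"):
    """Return (name, error_code, present_on_disk) for each failed request.

    dmesg_text is the text of a dmesg log; fw_root is the directory the
    requested names are resolved against (hypothetical helper, not a kernel
    or distribution tool).
    """
    results = []
    for name, err in FW_FAIL.findall(dmesg_text):
        present = (Path(fw_root) / name).is_file()
        results.append((name, int(err), present))
    return results
```

Feeding it the saved dmesg text, for example the contents of one of the attached logs, would show whether the files the kernel asked for are the ones that exist under /lib/firmware, or whether it requested differently named files that the cloned tree does not provide.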
I can also provide the apitrace trace file, but it is around 1 GB in size.
Attachment 136527, "dmesg and iomem data from lockup obtained with glretrace":