Hardware-assisted (VDPAU) decoding of MPEG-2 causes GPU lockup on Radeon HD 6320
I am reaching out for help resolving a bug/crash that we see with the GPU when using VDPAU hardware acceleration on Ubuntu 16.04. It is specific to the r600 driver interacting with VDPAU when playing back certain MPEG-2 content.
GPU in question, per lspci:
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wrestler [Radeon HD 6320]
We are heavily invested in this GPU and would love to get this addressed as soon as possible. We are also willing to sponsor this work if needed.
Steps to Reproduce:
- Launch VLC with VDPAU hardware acceleration and deinterlacing enabled
- Play the attached known-bad sample, jurrasic_park_high_br_mpeg2.ts
- Wait for the GPU lockup
- Per dmesg, the GPU locks up in the kernel (ring 0 stalls), which breaks all GUI-related activity until the PC is rebooted.
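With 10,000 units in the field it may help to flag this lockup automatically rather than wait for a user report. The following is a minimal Python sketch that scans kernel log lines for the two radeon messages that appear in the log below ("GPU lockup (current fence id ... last fence id ..." and "ring N stalled for more than N msec"). Assumptions: the message wording matches what this kernel printed, and "current fence id" is the last fence the GPU completed while "last fence id" is the last one submitted, so their difference approximates how much work was stuck. The helper name `find_radeon_lockups` is hypothetical.

```python
import re

# Patterns copied from the radeon messages in our captured kernel log.
# Other kernel versions may word these messages differently (assumption).
LOCKUP_RE = re.compile(
    r"radeon (\S+): GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+)")
STALL_RE = re.compile(r"radeon (\S+): ring (\d+) stalled for more than (\d+)msec")


def find_radeon_lockups(lines):
    """Return a list of dicts describing radeon stall/lockup messages in lines."""
    events = []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            events.append({"kind": "stall", "device": m.group(1),
                           "ring": int(m.group(2)), "msec": int(m.group(3))})
            continue
        m = LOCKUP_RE.search(line)
        if m:
            current = int(m.group(2), 16)
            last = int(m.group(3), 16)
            # Assumed meaning: fences submitted but never signaled by the GPU.
            events.append({"kind": "lockup", "device": m.group(1),
                           "current_fence": current, "last_fence": last,
                           "pending_fences": last - current})
    return events
```

On a live box the input could come from `subprocess.check_output(["dmesg"], universal_newlines=True).splitlines()`, which works on the Python 3.5 shipped with 16.04; reading dmesg may require root depending on `kernel.dmesg_restrict`. For the lockup line in our log it reports 19 pending fences (0xfa37cc - 0xfa37b9).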
Mesa Versions Tested:
- 18.0.5-0ubuntu0~16.04.1
- 18.2.8-2~bpo9+1
We have 10,000 of these units in production, and they played hardware-accelerated MPEG-2 fine until we upgraded to Ubuntu 16.04 and the newer Mesa package. The previous Linux install on these systems used an older software stack and video acceleration pipeline, and it worked perfectly, so we know the hardware is capable.
Old Software Stack:
- vlc 2.1.5
- mesa 11.0.6
- VA-API hardware acceleration
- Driver version: Splitted-Desktop Systems VDPAU backend for VA-API - 0.7.4
- Kernel: 3.14.79
New Software Stack:
- vlc 2.2.2
- mesa 18.2.8-2~bpo9+1
- VDPAU hardware acceleration
- Kernel: Linux 4.15.0-55-generic
We had to switch to VDPAU on Ubuntu 16.04 because we saw a major regression in MPEG-2 playback with VA-API: dropped frames and choppy playback, especially on 1080i content that requires deinterlacing. MPEG-4 played back without any issues, however. Now that we have switched to VDPAU, we are hitting this GPU lockup, which crashes X and other GUI programs and requires a reboot to recover.
Switching to different hardware is not an option at our scale, and we know from past experience that this hardware is capable. We need help from someone who knows this stack far better than we do, either to dig into the root cause of the lockup or to get VA-API working for MPEG-2 on 16.04.
With software decoding, playback is not watchable. One interesting note: during testing we installed Ubuntu 19.10 on one of these boxes and found that software decoding has improved to the point that neither VA-API nor VDPAU was required for VLC to render the MPEG-2 and MPEG-4 streams correctly. Could this potentially be backported to Ubuntu 16.04? I know this is a much bigger task than the one-sentence ask suggests, but figured I'd ask anyway.
Kernel log (dmesg):
```
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_008680_CP_STAT = 0x00000000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00000000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00000000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x00000000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: SRBM_STATUS2 = 0x00000000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: SRBM_STATUS = 0x20000040
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0x00000007
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GRBM_STATUS = 0x00003828
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: SRBM_SOFT_RESET=0x00008100
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GRBM_SOFT_RESET=0x00007F6B
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_008680_CP_STAT = 0x80878647
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00068406
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00011000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x01000000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: SRBM_STATUS2 = 0x00000000
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: SRBM_STATUS = 0x20004840
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0x1C000007
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GRBM_STATUS = 0xA2703CA0
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GPU softreset: 0x00000099
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: Saved 647 dwords of commands on ring 0.
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: GPU lockup (current fence id 0x0000000000fa37b9 last fence id 0x0000000000fa37cc
2019-12-02T21:52:20-0500 PLC-1a1684be-57f84738 kernel: radeon 0000:00:01.0: ring 0 stalled for more than 10240msec
```
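The log contains two register dumps, and comparing them by hand makes it easy to miss a changed value. The following is a minimal Python sketch that splits the pasted log into dumps and lists the registers whose values differ between them. Assumptions: each dump begins with R_00D034_DMA_STATUS_REG, as it does in the log as pasted above, and the helper names `split_dumps` and `diff_dumps` are hypothetical. It does not interpret any register bits.

```python
import re

# Matches "NAME = 0xVALUE" and "NAME=0xVALUE" as printed in the log above
# (format assumed from this one capture).
REG_RE = re.compile(r"radeon \S+: ([A-Z0-9_]+)\s*=\s*(0x[0-9A-Fa-f]+)")


def split_dumps(lines, first_reg="R_00D034_DMA_STATUS_REG"):
    """Group register lines into snapshots, starting a new one at first_reg."""
    dumps = []
    for line in lines:
        m = REG_RE.search(line)
        if not m:
            continue  # skips non-register lines such as "GPU softreset: ..."
        name, value = m.group(1), int(m.group(2), 16)
        if name == first_reg or not dumps:
            dumps.append({})
        dumps[-1][name] = value
    return dumps


def diff_dumps(a, b):
    """Return {register: (value_in_a, value_in_b)} for registers present in both that differ."""
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}
```

Applied to the two dumps above, it would report seven changed registers (the four CP status registers, SRBM_STATUS, GRBM_STATUS_SE0 and GRBM_STATUS), while R_00D034_DMA_STATUS_REG, SRBM_STATUS2 and GRBM_STATUS_SE1 are identical in both.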
Edit:
Added kernel versions for the old and new setups.