r600_dri crash
System information
System: Host: XXXX Kernel: 5.15.12-200.fc35.x86_64 x86_64 bits: 64 compiler: gcc v: 2.37-10.fc35 Desktop: MWM wm: FVWM dm: GDM Distro: Fedora release 35 (Thirty Five)
CPU: Info: Quad Core model: AMD A10-4655M APU with Radeon HD Graphics bits: 64 type: MCP arch: Piledriver rev: 1 cache: L1: 320 KiB L2: 8 MiB flags: avx ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 15970 Speed: 1532 MHz min/max: 1400/2000 MHz boost: enabled Core speeds (MHz): 1: 1397 2: 1396 3: 1400 4: 1413
Graphics: Device-1: AMD Trinity [Radeon HD 7620G] vendor: Biostar Microtech Intl Corp driver: radeon v: kernel bus-ID: 00:01.0 chip-ID: 1002:9907 Display: x11 server: X.Org 1.20.14 driver: loaded: ati,radeon unloaded: fbdev,modesetting,vesa resolution: 1920x1080~60Hz s-dpi: 96
OpenGL: renderer: AMD ARUBA (DRM 2.50.0 / 5.15.12-200.fc35.x86_64 LLVM 13.0.0) v: 4.3 Mesa 21.3.3 compat-v: 3.1 direct render: Yes
Problem exists with Fedora 34 (after an update), Fedora 35 and git main.
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Trinity [Radeon HD 7620G] [1002:9907]
OpenGL version string: 3.1 Mesa 21.3.3
X.Org X Server 1.20.14 with fvwm
Describe the issue
For about half a year, I've been getting random crashes of Xorg, which seem to be coming from r600_dri.so. I also had the same issue with Xwayland. Unfortunately, the traceback information was useless and the crash was hard to reproduce reliably. Sometimes it happened several times a day, sometimes several days would go by without problems. Each time, I was dumped back to the login screen. However, I recently discovered a way to reproduce the problem consistently, by waggling the scrollbar on a TBrowser from CERN's ROOT package. So finally, I was able to run git bisect.
That put the blame on commit 3c5b7dca (util/vbuf: fix buffer overrun in attribute conversions).
Looking at this commit, I see that it changes the meaning of size so that it now includes the offset (which can be large). However, the next unchanged lines check whether offset + size is within the width of the buffer. So now offset is being counted twice. This spuriously triggers the "fixing up" of num_vertices, which can increase this value.
For example, with a bit more debugging and looking at the values, I see start_vector = 0, vb->stride = 6, num_vertices = 8, buffer_offset = 481452, so offset is also 481452 and size is 48. The added code then calculates last_offset as 481452 + 48 - 6 = 481494. Since last_offset + 32 = 481526 is larger than 48, size is changed to 481526. This means that offset + size is now 962978, which is larger than vb->buffer.resource->width0 (512K = 524288), so size gets changed again to 524288 - 481452 = 42836, num_vertices gets changed to 7140, and down goes Xorg.
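To make the arithmetic above easy to check, here is a small Python sketch of the size calculation as I understand it from the description in this report. It is not the Mesa source: the function name, the parameter names, and the `relative` switch are my own, and it assumes stride > 0. The `relative=True` branch only illustrates what the numbers would look like if last_offset were measured from the start of the mapping instead of from the start of the buffer. That is an assumption about the intent of the commit, not a tested fix.

```python
MAX_ATTRIB_SIZE = 32  # sizeof(double) * 4, the "+32" in the description above


def map_size(buffer_offset, stride, start, num_vertices, width0, relative=False):
    """Return (size, num_vertices) following the steps described in the report.

    Hypothetical helper for illustration only; assumes stride > 0.
    """
    offset = buffer_offset + stride * start
    size = num_vertices * stride

    if relative:
        # Assumed alternative: last_offset is a length within the mapping.
        last_offset = size - stride
    else:
        # Behaviour described in the report: last_offset already contains
        # offset, so it is an absolute position in the buffer, not a length.
        last_offset = offset + size - stride
    size = max(size, last_offset + MAX_ATTRIB_SIZE)

    # Unchanged check from before the commit: offset is added again here.
    if offset + size > width0:
        size = width0 - offset
        num_vertices = (size + stride - 1) // stride

    return size, num_vertices


if __name__ == "__main__":
    args = dict(buffer_offset=481452, stride=6, start=0,
                num_vertices=8, width0=512 * 1024)
    print(map_size(**args))                 # values from the crash above
    print(map_size(**args, relative=True))  # assumed relative variant
```

With the values from my debugging session, the first call gives size 42836 and num_vertices 7140, matching what I observed. The assumed relative variant gives size 74 and leaves num_vertices at 8, because offset + size stays well inside the 512K buffer.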
Reverting this commit fixes the problem for me, but I guess the commit was intended to solve another problem. I can't quite figure out what was intended, but I don't think this commit is correct.