1bpp server performance regression
Please consider the following change, which causes a serious performance regression on 1bpp (monochrome) servers when tiling small patterns.
commit e572bcc7
fb: Remove even/odd tile slow-pathing
Again, clearly meant to be a fast path, but this turns out not to be the case.
Details
NetBSD still supports several monochrome framebuffers, such as those on Sun3 and Omron LUNA. After the update to Xorg 1.20.5 in the NetBSD tree, I noticed that filling the root_weave bitmap was extremely slow when the screen saver was activated.
- Xorg 1.10 based Xsun server on Sun3/60 bwtwo:
  https://twitter.com/tsutsuii/status/1289451828036300800
  -> Drawing time is too short to notice by eye.
- Xorg 1.20 based Xsun server on Sun3/60 bwtwo:
  https://twitter.com/tsutsuii/status/1289437204654075907
  -> It takes >10 seconds to fill the root window.
- Xorg 1.18 based Xsun server on Sun3/60 bwtwo:
  https://twitter.com/tsutsuii/status/1291000288862560256
  -> Same as 1.20.
- Xorg 1.20 server + xf86-video-wsfb driver on LUNA using a single plane:
  https://twitter.com/tsutsuii/status/1291772031525179392
  -> Also >10 seconds, even with the xf86-video-wsfb driver.
After some investigation, it turns out that the above change to fb/fbtile.c causes this regression:
e572bcc7
I'm not sure how the commit log arrived at "not to be the case", but the removed "fast path" function fbEvenTile() was called only if FbEvenTile(tileWidth) was true:
https://gitlab.freedesktop.org/xorg/xserver/-/blob/836bb27726441e048bb300664343a136bc596a5b/fb/fbtile.c#L145
void
fbTile(FbBits * dst,
FbStride dstStride,
int dstX,
int width,
int height,
FbBits * tile,
FbStride tileStride,
int tileWidth,
int tileHeight, int alu, FbBits pm, int bpp, int xRot, int yRot)
{
if (FbEvenTile(tileWidth))
fbEvenTile(dst, dstStride, dstX, width, height,
tile, tileStride, tileHeight, alu, pm, xRot, yRot);
FbEvenTile() is defined in fb/fb.h:
https://gitlab.freedesktop.org/xorg/xserver/-/blob/e572bcc7f4236b7e0f23ab762f225b3bce37db59/fb/fb.h#L543
/*
* Accelerated tiles are power of 2 width <= FB_UNIT
*/
#define FbEvenTile(w) ((w) <= FB_UNIT && FbPowerOfTwo(w))
FB_UNIT is 32 here, so the "fast path" is activated only if the tileWidth argument is a power of two no larger than 32 (i.e. 1, 2, 4, 8, 16, or 32).
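The width check can be tried in isolation with a small sketch. FbEvenTile() is copied from the fb/fb.h excerpt above. The FB_UNIT value of 32 and the FbPowerOfTwo() definition are assumptions made for this example, and is_even_tile() and count_even_tile_widths() are hypothetical helpers that do not exist in the X server.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: FB_UNIT is 32, as in the configuration discussed above. */
#define FB_UNIT 32
/* Assumption: a simple power-of-two test for positive widths;
 * the real FbPowerOfTwo() is defined in fb/fb.h. */
#define FbPowerOfTwo(w) (((w) & ((w) - 1)) == 0)
/* Copied from the fb/fb.h excerpt quoted above. */
#define FbEvenTile(w) ((w) <= FB_UNIT && FbPowerOfTwo(w))

/* Hypothetical helper: returns 1 if a tile of this width (in bits)
 * passes the FbEvenTile() check, 0 otherwise. */
int is_even_tile(int width_bits)
{
    assert(width_bits > 0);
    return FbEvenTile(width_bits) ? 1 : 0;
}

/* Hypothetical helper: prints every width in 1..max_bits that passes
 * the check and returns how many there are. */
int count_even_tile_widths(int max_bits)
{
    int count = 0;

    for (int w = 1; w <= max_bits; w++) {
        if (is_even_tile(w)) {
            printf("%d ", w);
            count++;
        }
    }
    printf("\n");
    return count;
}
```

Under these assumptions, count_even_tile_widths(64) lists only the power-of-two widths up to FB_UNIT. Widths such as 24 or 64 fail the check.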
The main caller of fbTile() is fbFill() with the FillTiled op in fb/fbfill.c:
https://gitlab.freedesktop.org/xorg/xserver/-/blob/7430fdb689678b98ac63f5a8dad13719bac777e0/fb/fbfill.c#L164
fbTile(dst + (y + dstYoff) * dstStride,
dstStride,
(x + dstXoff) * dstBpp,
width * dstBpp, height,
tile,
tileStride,
tileWidth * tileBpp,
tileHeight,
pGC->alu,
pPriv->pm,
dstBpp,
(pGC->patOrg.x + pDrawable->x + dstXoff) * dstBpp,
pGC->patOrg.y + pDrawable->y - y);
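The tileWidth argument of fbTile() is the tile width multiplied by bpp (i.e. a width in bits), so the "fast path" fbEvenTile() is effectively never called on 32bpp servers (only a 1-pixel-wide tile would pass the check).
On the other hand, a 1bpp server uses it for 32x32 or smaller bitmaps whose width is a power of two.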
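To make the effect of bpp concrete, the following sketch applies the same check to the width in bits that fbFill() passes (tileWidth * tileBpp). FB_UNIT as 32 and the FbPowerOfTwo() definition are assumptions for this example, and tile_width_in_bits() and would_use_even_tile() are hypothetical helpers, not X server functions.

```c
#include <assert.h>

/* Assumption: FB_UNIT is 32, as in the configuration discussed above. */
#define FB_UNIT 32
/* Assumption: a simple power-of-two test for positive widths;
 * the real FbPowerOfTwo() is defined in fb/fb.h. */
#define FbPowerOfTwo(w) (((w) & ((w) - 1)) == 0)
/* Copied from the fb/fb.h excerpt quoted above. */
#define FbEvenTile(w) ((w) <= FB_UNIT && FbPowerOfTwo(w))

/* Hypothetical helper: mirrors the "tileWidth * tileBpp" argument
 * that fbFill() passes to fbTile(), i.e. the tile width in bits. */
int tile_width_in_bits(int tile_width_px, int bpp)
{
    assert(tile_width_px > 0 && bpp > 0);
    return tile_width_px * bpp;
}

/* Hypothetical helper: returns 1 if a tile this many pixels wide would
 * have taken the removed fbEvenTile() path at the given depth. */
int would_use_even_tile(int tile_width_px, int bpp)
{
    int bits = tile_width_in_bits(tile_width_px, bpp);

    return FbEvenTile(bits) ? 1 : 0;
}
```

With these definitions, a 4-pixel-wide tile passes the check at 1bpp (4 bits) but not at 32bpp (128 bits), which is why the removed path mattered mostly to low-depth servers.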
Reverting the above "fb: Remove even/odd tile slow-pathing" change makes filling root_weave and other patterns on Xorg 1bpp servers as fast as before:
- Patched Xsun server on Sun3/60 bwtwo:
  https://twitter.com/tsutsuii/status/1291061688762957826
- Patched Xorg + xf86-video-wsfb server on LUNA:
  https://twitter.com/tsutsuii/status/1291773410964463617