
Draft: [smooth] Dynamic memory allocation.

Alexei Podtelezhnikov requested to merge smooth_malloc into master

The FreeType smooth renderer keeps a linked list of the pixels (aka cells) visited by the outline path and integrates the coverage over it. The storage for this sparse matrix has so far been allocated on the stack. This prototype replaces the stack allocation with dynamic heap allocation. Below are the rendering times with the old stack and the new heap allocations. Traditionally, the stack allocation has a fixed size that is large enough for small, simple glyphs (e.g., Palatino A, V, O, X with 2 vertical stems). At larger sizes, the rendering is split into bands. For more complex glyphs (Kunstler or Cabin Sketch), when the stack is too small and the bands are too large, the bands have to be subdivided and the rendering restarted (longjmp), which hurts performance.

| Stack (µs/op) | 10 pp | 30 pp | 100 pp |
|---|---|---|---|
| Palatino | 4.5 | 7.4 | 17.9 |
| Kunstler | 4.6 | 8.2 | 24.0 |
| CabinSketch-Bold | 16.5 | 31.1 | 138.0 |
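
The subdivide-and-restart behaviour described above can be illustrated with a small sketch. This is not the code in the smooth renderer: `Worker`, `POOL_CELLS`, `record_cell`, `sweep_band`, `try_band` and `render_bands` are hypothetical names, the workload is a toy that visits a fixed number of cells per scanline, and only vertical bisection is shown.

```c
#include <assert.h>
#include <setjmp.h>

#define POOL_CELLS 64  /* hypothetical fixed pool capacity, in cells */

typedef struct {
    jmp_buf overflow;  /* jump target used when the pool runs out */
    int     used;      /* cells consumed by the current band attempt */
    int     restarts;  /* band attempts abandoned because of overflow */
} Worker;

/* Stand-in for the rasterizer recording one visited cell. */
static void record_cell(Worker *w)
{
    if (w->used >= POOL_CELLS)
        longjmp(w->overflow, 1);  /* abandon the whole band attempt */
    w->used++;
}

/* Toy workload: every scanline visits cells_per_line cells. */
static void sweep_band(Worker *w, int y_min, int y_max, int cells_per_line)
{
    for (int y = y_min; y < y_max; y++)
        for (int i = 0; i < cells_per_line; i++)
            record_cell(w);
}

/* Attempt one band; returns 1 on success, 0 if the pool overflowed. */
static int try_band(Worker *w, int y_min, int y_max, int cells_per_line)
{
    w->used = 0;
    if (setjmp(w->overflow) != 0)
        return 0;  /* reached via longjmp from record_cell */
    sweep_band(w, y_min, y_max, cells_per_line);
    return 1;
}

/* Render [y_min, y_max), halving any band that overflows the pool.
   Returns the number of bands rendered, or -1 if even a single
   scanline does not fit (this sketch does not split horizontally). */
static int render_bands(Worker *w, int y_min, int y_max, int cells_per_line)
{
    assert(y_min < y_max);

    if (try_band(w, y_min, y_max, cells_per_line))
        return 1;

    w->restarts++;  /* the work done in the failed attempt is lost */
    if (y_max - y_min <= 1)
        return -1;

    int y_mid = y_min + (y_max - y_min) / 2;
    int lower = render_bands(w, y_min, y_mid, cells_per_line);
    if (lower < 0)
        return -1;
    int upper = render_bands(w, y_mid, y_max, cells_per_line);
    if (upper < 0)
        return -1;
    return lower + upper;
}
```

With the 64-cell pool assumed here, `render_bands(&w, 0, 16, 10)` ends up with four bands of four scanlines each after three abandoned attempts. Everything computed in those abandoned attempts is discarded, which is the kind of overhead a sufficiently large heap allocation avoids.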

In comparison, the size of each dynamic heap allocation is estimated from the taxicab perimeter of the outline, which approximates the number of visited cells. Both the malloc call and the taxicab computation have a measurable cost. The benefit for complex shapes at larger sizes is huge, however.

| Heap (µs/op) | 10 pp | 30 pp | 100 pp |
|---|---|---|---|
| Palatino | 4.8 | 7.9 | 17.0 |
| Kunstler | 4.9 | 8.3 | 18.3 |
| CabinSketch-Bold | 18.0 | 32.0 | 80.0 |
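
As a rough illustration of the taxicab estimate mentioned above, the sketch below sums |dx| + |dy| over the segments of a closed polygon and sizes a heap pool from the result. It is a sketch under assumptions, not the formula used in this merge request: `Point`, `Cell`, `taxicab_perimeter`, `estimate_cells` and `alloc_cell_pool` are hypothetical names, coordinates are assumed to be 26.6 fixed point (64 units per pixel), curves are ignored, and the slack of two cells per segment is an arbitrary guess.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical outline point in 26.6 fixed point (64 units per pixel). */
typedef struct { long x, y; } Point;

/* Hypothetical cell record for a sparse per-scanline linked list. */
typedef struct Cell_ {
    int           x;      /* pixel column */
    int           cover;  /* accumulated coverage */
    int           area;   /* accumulated area */
    struct Cell_ *next;   /* next cell on the same scanline */
} Cell;

/* Taxicab (L1) length of the closed polygon pts[0..n-1], in 26.6 units. */
static long taxicab_perimeter(const Point *pts, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        const Point *a = &pts[i];
        const Point *b = &pts[(i + 1) % n];  /* wrap around to close */

        sum += labs(b->x - a->x) + labs(b->y - a->y);
    }
    return sum;
}

/* Estimated number of visited cells: the perimeter in whole pixels,
   rounded up, plus an assumed slack of two cells per segment. */
static size_t estimate_cells(const Point *pts, size_t n)
{
    long perimeter = taxicab_perimeter(pts, n);

    return (size_t)((perimeter + 63) / 64) + 2 * n;
}

/* Allocate a pool sized by the estimate; the caller frees it.
   Returns NULL if the allocation fails. */
static Cell *alloc_cell_pool(const Point *pts, size_t n, size_t *count)
{
    *count = estimate_cells(pts, n);
    return malloc(*count * sizeof(Cell));
}
```

For a 10 by 10 pixel square the taxicab perimeter is 40 pixels, so this sketch would reserve 48 cells. The estimate grows linearly with the glyph size, which matches the expectation that the heap use is proportional to it.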

The following table shows the heap memory use for the tenth-worst glyph on a 64-bit platform. It should be proportional to the glyph size. The traditional stack buffer was always fixed at 16 KB. Therefore, it is not surprising that the heap is faster where the stack falls short. On the other hand, at small sizes the necessary memory is much less than 16 KB.

| Heap, 10th-worst glyph (KB) | 10 pp | 30 pp | 100 pp |
|---|---|---|---|
| Palatino | 2.3 | 6.9 | 22.7 |
| Kunstler | 2.9 | 8.7 | 29.3 |
| CabinSketch-Bold | 5.6 | 16.6 | 55.6 |

By the way, alternative renderers (fontdue, font-rs) often use large complex glyphs to showcase their advantages.

Edited by Alexei Podtelezhnikov
