Optimize family name matching in sort and match functions

Submitted by Karl Tomlinson

Assigned to fon..@..op.org

Link to original bug (#19376)

Description

Currently the time for a FcFont(Set)Sort with trim = false (or for a FcFont(Set)Match) I assume) is typically dominated by comparing family names.

By the time aliases are added to the sort pattern there are typically 80-100 family names in the pattern (p). For 300 to 500 fonts (some of which will have family names for each language) (n), that is p * n ~ 40,000 case-insensitive string comparisons.

I think a sensible approach to optimizing this would be to sort the families in the sort pattern (removing duplicates, recording the score for each family, and remembering to consider the binding) so that each family lookup is O(log p) and the total complexity is O((n + p) log p).

This would be faster whenever n >> log p, which is almost always the case.

(Behdad was considering something similar. I think this is slightly different from what Behdad had in mind, but perhaps he thought of this too.)

(I guess the same argument could be applied to any (sortable) property, but it is typically only family properties that have so many values.)

(The sorted family list is essentially a cheap hash table. I'm guessing that looking up a family in a sorted list is going to be cheaper ~O(log p), assuming there are not too many similar family names), than generating a hash for the family O(length of string).)

Version: 2_1

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information