Skip to content

[cid] Avoid n^2 scanning for binary data

Ben Wagner requested to merge bungeman/freetype:cid_parser_new_timeout into master

When creating a CID parser the location of the StartData or /sfnts tokens needs to be known. However, the token parser requires that the entire document be in memory and flattening the entire stream into memory is to be avoided.

To avoid forcing the entire stream into memory, previously this code would scan through the stream looking for "StartData" or "/sfnts" as strings. However, these strings could have been in a comment or string token, so the stream would be read into memory up to that point and the parser run to check that these strings were actually tokens. However, this forced a parser restart from the beginning each time. As a result, data with many "StartData" non-tokens would take n^2 time to check.

This change instead has the initial scan look for the last possible "StartData" or "/sfnts" string in the stream. The stream is read forward instead of backward as a typical normal CID font will have one "StartData" toward the beginning of the data and it it much faster to read the data from beginning to end instead of end to beginning. For memory based fonts the limit is set to the end of the stream since the stream is already in memory. Then the parser is run once to look for "StartData" or "/sfnts" tokens. If they are found the parser is re-set to reflect this new information.

Bug: https://issues.chromium.org/issues/40201695

Merge request reports