
Unify buffering for Streams

Tau requested to merge tau-dev/poppler:master into master

This patch adds centralized buffering and lookChar()/getChar() implementations, so that children only need to implement int getSomeChars(int nChars, unsigned char *buffer).
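A minimal sketch of what such centralized buffering could look like. This is not the code from the patch: `BufferedStream`, `MemorySource`, the buffer size, and the "0 means end of data" return convention are all assumptions made for illustration. Only the `getSomeChars(int nChars, unsigned char *buffer)` signature is taken from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio> // EOF
#include <cstring>
#include <string>
#include <utility>

// Hypothetical base class: it owns the buffer, so subclasses only produce bytes.
class BufferedStream {
public:
    virtual ~BufferedStream() = default;

    // Non-virtual fast paths: a virtual call happens only when the buffer runs dry.
    int getChar() { return (pos < end || fill()) ? buf[pos++] : EOF; }
    int lookChar() { return (pos < end || fill()) ? buf[pos] : EOF; }

protected:
    // Write up to nChars bytes into buffer and return the count.
    // Assumption for this sketch: a return value of 0 means end of data.
    virtual int getSomeChars(int nChars, unsigned char *buffer) = 0;

private:
    static constexpr int bufSize = 4096; // arbitrary size chosen for the sketch

    bool fill() {
        pos = 0;
        int n = getSomeChars(bufSize, buf);
        end = n > 0 ? n : 0;
        return end > 0;
    }

    unsigned char buf[bufSize];
    int pos = 0;
    int end = 0;
};

// Hypothetical subclass: serves bytes from an in-memory string.
class MemorySource : public BufferedStream {
public:
    explicit MemorySource(std::string data) : data(std::move(data)) {}

protected:
    int getSomeChars(int nChars, unsigned char *buffer) override {
        int n = std::min<int>(nChars, static_cast<int>(data.size() - offset));
        std::memcpy(buffer, data.data() + offset, n);
        offset += n;
        return n;
    }

private:
    std::string data;
    std::size_t offset = 0;
};

// Small self-check showing that lookChar() peeks and getChar() consumes.
inline void demoBufferedStream() {
    MemorySource s("ab");
    assert(s.lookChar() == 'a');
    assert(s.getChar() == 'a');
    assert(s.getChar() == 'b');
    assert(s.getChar() == EOF);
}
```

With this shape, a subclass never touches the buffer, and per-character reads cost one comparison and one array access in the common case.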

This simplifies implementations and allows removal of a ton of duplicated buffering logic everywhere. getChar() and lookChar() not requiring a virtual call anymore gives a big performance improvement on some tasks and allows us to drop additional buffering in users of the Stream such as Lexer.

To facilitate this, I removed the getRawChar() interfaces by making the StreamPredictor a FilterStream over the FlateStream/LZWStream. This seems more natural anyway, as it means we no longer need a bunch of if (pred) return pred->getChar(); else return getRawChar(); dispatching, but maybe I'm missing a subtle reason for the previous interface.
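The layering idea can be sketched as follows. All names here (`ByteSource`, `RawSource`, `PredictorFilter`) are hypothetical stand-ins and not poppler classes, and the predictor shown is a deliberately simplified horizontal-differencing decoder for 8-bit, single-component rows, not the full StreamPredictor logic. The point is only that the predictor wraps the decompressor, so neither side needs an "if there is a predictor" branch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal interface standing in for the Stream base class.
struct ByteSource {
    virtual ~ByteSource() = default;
    // Writes up to nChars bytes into buffer; returns the count, 0 at end of data.
    virtual int getSomeChars(int nChars, unsigned char *buffer) = 0;
};

// Stand-in for a decompressor such as FlateStream: it just replays stored bytes.
class RawSource : public ByteSource {
public:
    explicit RawSource(std::vector<unsigned char> bytes) : bytes(std::move(bytes)) {}

    int getSomeChars(int nChars, unsigned char *buffer) override {
        int n = 0;
        while (n < nChars && offset < bytes.size()) {
            buffer[n++] = bytes[offset++];
        }
        return n;
    }

private:
    std::vector<unsigned char> bytes;
    std::size_t offset = 0;
};

// Predictor as a filter over any ByteSource. Simplified assumption: each byte
// in a row stores the difference to the byte on its left (8-bit, one component).
class PredictorFilter : public ByteSource {
public:
    PredictorFilter(ByteSource &src, int rowBytes) : src(src), rowBytes(rowBytes) {}

    int getSomeChars(int nChars, unsigned char *buffer) override {
        int n = src.getSomeChars(nChars, buffer);
        for (int i = 0; i < n; ++i) {
            if (column == 0) {
                left = 0; // each row starts without a left neighbour
            }
            left = static_cast<unsigned char>(buffer[i] + left);
            buffer[i] = left;
            column = (column + 1) % rowBytes;
        }
        return n;
    }

private:
    ByteSource &src;
    int rowBytes;
    int column = 0;         // position within the current row, kept across calls
    unsigned char left = 0; // decoded value of the previous byte in the row
};
```

Because the filter keeps its row position across calls, it works regardless of how the caller chunks its reads.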

Some streams lend themselves more naturally to the new getSomeChars() interface; others just get a shim (implemented with a macro, not sure how haram those are in this project) over their character-oriented interface until the migration is complete. Even though the latter do not make perfect use of the new interface, I'm already getting a 15% performance improvement when running pdftotext on a book. (Page::text() performance is also critical for applications like pdfgrep.)
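For illustration, a shim of this kind might look roughly like the sketch below. The macro name `CHAR_SHIM_GET_SOME_CHARS` and the `CountdownStream` class are invented for this example and are not the macro or classes used in the patch; the sketch also assumes that the per-character method signals end of data with EOF.

```cpp
#include <cassert>
#include <cstdio> // EOF

// Hypothetical shim macro: implements getSomeChars() by looping over a
// per-character method until the buffer is full or the source reports EOF.
#define CHAR_SHIM_GET_SOME_CHARS(readOne)                      \
    int getSomeChars(int nChars, unsigned char *buffer)        \
    {                                                          \
        int n = 0;                                             \
        while (n < nChars) {                                   \
            int c = readOne();                                 \
            if (c == EOF) {                                    \
                break;                                         \
            }                                                  \
            buffer[n++] = static_cast<unsigned char>(c);       \
        }                                                      \
        return n;                                              \
    }

// Hypothetical legacy decoder that can only produce one character at a time:
// it emits start, start - 1, ..., 1 and then EOF.
class CountdownStream {
public:
    explicit CountdownStream(int start) : next(start) {}

    // The shim adapts the character-oriented decodeChar() to the bulk interface.
    CHAR_SHIM_GET_SOME_CHARS(decodeChar)

private:
    int decodeChar() { return next > 0 ? next-- : EOF; }
    int next;
};
```

A shimmed stream still decodes one character per inner call, so it gains less than a stream that fills the buffer natively, but callers already see the bulk interface.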

This patch also fixes a wrong assumption about the offsets in ObjectStream::ObjectStream(), as well as bad limiting logic in EmbedStream::getChars(). Together with the better Stream interface, the latter fix finally allows efficient usage of zlib inflate() (of course drip-feeding a character at a time to the library is going to be slower than a hand-rolled implementation!), making for a total performance improvement of ~30%.
