pdftotext: combining accent moved to subsequent character

The attached PDF roundtrip.pdf displays two grapheme clusters: l̀a. This is the combined rendering of three Unicode characters: U+006C, U+0300, U+0061, in that order.

pdftotext (self-identified as "version 22.01.0") misinterprets the U+0300 as being applied to the 'a', not the 'l', and emits là (U+006C U+0061 U+0300).
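To make the difference between the two code point sequences concrete, here is a small Python sketch. It uses only the standard `unicodedata` module. The names `expected`, `actual`, `describe` and `nfc` are mine, for illustration only, and are not part of pdftotext. It shows that the two sequences are not canonically equivalent, so normalizing the output does not repair the reordering.

```python
import unicodedata

# Sequence displayed in the PDF: the grave accent belongs to the 'l'.
expected = "\u006c\u0300\u0061"
# Sequence reported above as pdftotext output: the accent follows the 'a'.
actual = "\u006c\u0061\u0300"


def describe(text):
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def nfc(text):
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(describe(expected))
    print(describe(actual))
    # NFC composes a + U+0300 into U+00E0, but there is no precomposed
    # "l with grave", so the two strings stay different after normalization.
    print(nfc(expected) == nfc(actual))
```

Because there is no precomposed "l with grave", the expected string keeps all three code points under NFC, while the emitted one collapses to "l" followed by U+00E0.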

Looking at the page stream, I see:

BT
/F27 9.96264 Tf
1 0 0 1 34.098 32.557 Tm [<004f>]TJ
1 0 0 1 38.96 34.789 Tm [<0b3e>]TJ
1 0 0 1 37.187 32.557 Tm [<0044>]TJ
ET

004f, 0b3e, 0044 are mapped to U+006C, U+0300, U+0061 respectively by the /ToUnicode object for the font. I'm guessing pdftotext looks only at the relative x-positions of the glyph origins to decide which character the accent belongs to (38.96 > 37.187), and not at where the visible glyph for the accent actually winds up. I have no idea why this particular font places the origin of this glyph so far to the right of its visible outline, but TeX managed to figure out how to place the accent correctly, so I would like to think it would be possible for pdftotext to do the same.
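The guess above can be illustrated with a short Python sketch. This is only a model of the suspected behaviour, not pdftotext's actual code: the `glyphs` list and both helper functions are hypothetical, and the only real data are the three `Tm` origins and the /ToUnicode mappings quoted from the page stream.

```python
# (x origin, y origin, Unicode text) for each glyph, in page-stream order.
# Coordinates are the Tm translation components from the stream above.
glyphs = [
    (34.098, 32.557, "\u006c"),  # <004f> -> 'l'
    (38.96, 34.789, "\u0300"),   # <0b3e> -> combining grave accent
    (37.187, 32.557, "\u0061"),  # <0044> -> 'a'
]


def text_in_stream_order(items):
    """Concatenate the glyphs in the order they appear in the stream."""
    return "".join(text for _x, _y, text in items)


def text_sorted_by_x(items):
    """Concatenate the glyphs after sorting by x origin only (the guess)."""
    return "".join(text for _x, _y, text in sorted(items, key=lambda g: g[0]))


if __name__ == "__main__":
    # Stream order keeps the accent on the 'l'.
    print([f"U+{ord(c):04X}" for c in text_in_stream_order(glyphs)])
    # Sorting by x origin moves the accent after the 'a'.
    print([f"U+{ord(c):04X}" for c in text_sorted_by_x(glyphs)])
```

Under this model, sorting by origin alone reproduces the reported output, whereas a placement rule based on the accent's visible bounding box (which needs the font's glyph metrics, not shown here) would presumably keep it with the 'l'.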

In case it is helpful, the PDF was generated by rendering the following document with LuaTeX from TeX Live 2021:

\documentclass{minimal}
\usepackage[paperwidth=5cm,paperheight=2cm,margin=5mm]{geometry}
\usepackage{fontspec}
\setmainfont{Noto Serif}
\pagestyle{empty}
\begin{document}

{\`l}a

\end{document}

Edited Jan 31, 2022 by Zack Weinberg
