UTF-8 section symbol (0xC2A7) invokes TIS-620 decoding
Submitted by pok..@..il.com
Assigned to Jehan Pagès @Jehan
Link to original bug (#101310)
Description
Created attachment 131730: File containing a single section sign in the midst of other text
A single occurrence of the section sign (§) encoded in UTF-8 (bytes 0xC2 0xA7) causes the file to be detected as TIS-620, even if the rest of the text is English. This can be seen with the attached file (which includes a single §). Curiously, adding more instances of § elsewhere usually causes the file to be correctly detected as UTF-8.
This may be a duplicate of bug 101218, but it's a more specific case. This was first reported at https://github.com/notepad-plus-plus/notepad-plus-plus/issues/940, but I've narrowed it down to a bug in uchardet.
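For anyone trying to reproduce this, the following is a minimal sketch of why the byte pair is ambiguous. It does not call uchardet; it only uses Python's standard codecs to show that 0xC2 0xA7 is valid both as UTF-8 (§) and as two TIS-620 Thai letters. The sample text and the helper name `decode_both` are made up for illustration and are not the contents of the attached file.

```python
# Hypothetical sample text; NOT the contents of attachment 131730.
SAMPLE = "See section \u00a7 4.2 of the manual for details.\n"


def decode_both(text: str):
    """Encode text as UTF-8, then decode the same bytes as UTF-8 and as TIS-620.

    Returns (raw_bytes, utf8_decoding, tis620_decoding).
    """
    raw = text.encode("utf-8")
    as_utf8 = raw.decode("utf-8")
    # "tis_620" is the name of Python's built-in TIS-620 codec.
    as_tis620 = raw.decode("tis_620")
    return raw, as_utf8, as_tis620


raw, as_utf8, as_tis620 = decode_both(SAMPLE)

# The section sign is the only non-ASCII character, encoded as 0xC2 0xA7.
print(raw)

# Both decodings succeed, so the bytes alone do not rule out either encoding.
print(repr(as_utf8))
print(repr(as_tis620))
```

Because both decodings succeed without error, a detector has to fall back on statistics to choose between them, and a single two-byte sequence gives it very little to go on. That is consistent with the observation that adding more § characters usually tips the result back to UTF-8, though the exact reason inside uchardet is not established here.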
Attachment 131730, "File containing a single section sign in the midst of other text":
UTF-8_with_section_sign.txt