UTF-8 Italian text recognized as ISO-8859-1 Portuguese
Submitted by Jehan Pagès
Assigned to Jehan Pagès
Created attachment 133604 UTF-8 text.
The attached text is UTF-8 Italian, but since commit e138839f (Portuguese support for ISO-8859-1), this text is recognized as ISO-8859-1.
I am not sure there is a proper solution, though, apart from removing Portuguese support in the short term and, in the longer term, adding actual language detection for UTF-8 (see bug 101218).
Also, the fact that the file holds just 2 words obviously makes it a difficult guess for a system based on statistics.
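To illustrate why such input is ambiguous, here is a minimal Python sketch, not the detector's actual code. The helper name `decoding_candidates` and the two-word sample are hypothetical (the attachment's content is not reproduced here). It only shows that a short UTF-8 Italian string is also a perfectly decodable ISO-8859-1 byte sequence, so validity alone cannot rule out the single-byte charset and the decision falls back on statistics.

```python
def decoding_candidates(data: bytes) -> dict:
    """Return every candidate charset under which `data` decodes without error."""
    candidates = {}
    for encoding in ("utf-8", "iso-8859-1"):
        try:
            candidates[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            # Invalid byte sequence for this charset: not a candidate.
            pass
    return candidates


# Hypothetical two-word Italian sample, stored as UTF-8 bytes.
sample = "città più".encode("utf-8")
result = decoding_candidates(sample)

for encoding, text in result.items():
    # Both charsets accept the bytes; only the UTF-8 reading is real Italian.
    print(encoding, repr(text))
```

ISO-8859-1 assigns a character to every byte value, so it never rejects input, whereas UTF-8 does reject malformed sequences. With only 2 words, the frequency evidence for either reading is very thin.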
Attachment 133604, "UTF-8 text.":