    src: drop less of UTF-8 confidence even with few non-multibyte chars. · bed459c6
    Jehan authored
    Some languages are not expected to contain multibyte characters. English,
    for instance, would typically have none. Yet English text can still be
    UTF-8 (with a few special characters, or foreign words…), so let's make a
    low multibyte count less of a deal breaker.
    
    To be even fairer, the whole logic is biased of course, and I believe
    that eventually we should get rid of these lines of code, which drop
    confidence based on a number of characters. This is a ridiculous rule (we
    base our whole logic on language statistics and suddenly we add some
    weird rule with a completely random number). But for now, I'll keep this
    as-is until we make the whole library even more robust.
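
For illustration only, here is a minimal sketch of the kind of rule discussed above: a confidence value that is lowered when the text contains few multibyte characters. The function name, the threshold of 6, the per-character factor of 0.5 and the 0.99 ceiling are all assumptions made for this example; they are not taken from the commit and may not match the library's actual code.

```python
def utf8_confidence(num_multibyte_chars: int,
                    threshold: int = 6,
                    per_char_doubt: float = 0.5,
                    max_confidence: float = 0.99) -> float:
    """Hypothetical rule: few multibyte characters means lower UTF-8 confidence.

    All parameter names and default values are illustrative assumptions.
    """
    # Enough multibyte characters were seen: report the ceiling confidence.
    if num_multibyte_chars >= threshold:
        return max_confidence

    # Otherwise start from a high "unlikeliness" and shrink it once per
    # multibyte character seen, so each character raises the confidence.
    unlike = max_confidence
    for _ in range(num_multibyte_chars):
        unlike *= per_char_doubt
    return 1.0 - unlike


if __name__ == "__main__":
    # Print the confidence for 0..6 multibyte characters.
    for count in range(7):
        print(count, utf8_confidence(count))
```

In this sketch, mostly-ASCII UTF-8 text (such as English with one accented word) gets a much lower score than text with many multibyte characters, and the cutoff is an arbitrary constant rather than something derived from language statistics, which is the objection raised above.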