
fix for issue #39 (gb18030 encoding test)

Pedro López-Cabanillas requested to merge plcl/uchardet:devel into master

The gb18030 test fails: uchardet detects the sample text as Macedonian encoded in windows-1251. This happens for two reasons: (1) the Macedonian language model is overly optimistic and reports high confidence for the given sample, and (2) the original sample text is extremely short and lacks linguistic variety.
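As a rough illustration of why a windows-1251 prober is not immediately ruled out, the sketch below encodes a short Chinese string as GB18030 and decodes the same bytes as windows-1251. Every byte in the GB2312 range (0xA1-0xFE) is a defined windows-1251 character, so the decode always succeeds, and many of the bytes fall in 0xC0-0xFF, which windows-1251 maps to Cyrillic letters. This is a hypothetical, stdlib-only sketch and does not call uchardet. The helper `cyrillic_letter_ratio` is an illustrative stand-in: uchardet's single-byte probers score letter sequences against per-language models rather than counting bytes.

```python
# Sketch only: shows that GB18030-encoded Chinese is also "valid" windows-1251.
# cyrillic_letter_ratio is a hypothetical helper, not part of uchardet.

def cyrillic_letter_ratio(data: bytes) -> float:
    """Fraction of bytes in 0xC0-0xFF, which windows-1251 maps to the letters А..я."""
    if not data:
        return 0.0
    return sum(1 for b in data if b >= 0xC0) / len(data)


# A very short sample, similar in spirit to the original test file.
sample = "中华人民共和国"
raw = sample.encode("gb18030")

# Each GB2312-range byte is defined in windows-1251, so this never raises;
# the result is a string of Cyrillic letters and symbols.
as_cp1251 = raw.decode("cp1251")

print(as_cp1251)
print(f"{cyrillic_letter_ratio(raw):.2f} of the bytes read as Cyrillic letters")
```

With only a handful of characters, there is little statistical evidence to separate "Chinese in GB18030" from "unusual Cyrillic text", which is consistent with why a longer, more varied sample helps.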

Simply adding a substantial amount of real Chinese text to the sample file is enough to make the test pass.

The added text was extracted from the Chinese Wikipedia article on the People's Republic of China: https://zh.wikipedia.org/wiki/%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD
