Performance issue with version 0.0.8 versus 0.0.7
Version 0.0.8 (built from source) is far slower than version 0.0.7: roughly 350 times slower on the test files below (about 2.5 minutes versus under half a second). I note that version 0.0.8 now does language detection in addition to encoding detection; is this slowdown a consequence of that feature?
I was considering updating our version, but the slowdown makes that impractical. Is there anything that can be done to bring the performance closer to that of version 0.0.7?
Here's some quantitative evidence.
Version packaged with the OS:
$ uchardet --version
uchardet Command Line Tool
Version 0.0.7
Authors: BYVoid, Jehan
Bug Report: https://gitlab.freedesktop.org/uchardet/uchardet/-/issues
Latest version built from source:
$ ./src/tools/uchardet --version
uchardet Command Line Tool
Version 0.0.8
Authors: BYVoid, Jehan
Bug Report: https://gitlab.freedesktop.org/uchardet/uchardet/-/issues
Reasonably large files to test:
$ ls -sh *.csv
76M data.csv
76M utf8.data.csv
Time to run for 0.0.7:
$ time uchardet *.csv
data.csv: ISO-8859-15
utf8.data.csv: UTF-8
real 0m0.427s
user 0m0.419s
sys 0m0.009s
Time to run for 0.0.8:
$ time ./src/tools/uchardet *.csv
data.csv: ISO-8859-15
utf8.data.csv: UTF-8
real 2m29.847s
user 2m29.466s
sys 0m0.093s
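
For reference, the slowdown factor can be worked out from the two "real" times above. This is just a quick sketch with awk; the variable names are mine, and the numbers are copied by hand from the timing output rather than measured again:

```shell
# Wall-clock ("real") times copied from the two runs above, in seconds.
old=0.427                                     # 0.0.7: 0m0.427s
new=$(awk 'BEGIN { print 2 * 60 + 29.847 }')  # 0.0.8: 2m29.847s

# Ratio of the two wall-clock times, rounded to the nearest integer.
ratio=$(awk -v o="$old" -v n="$new" 'BEGIN { printf "%.0f", n / o }')

echo "0.0.8 took about ${ratio}x as long as 0.0.7"
```

So on these two files the run went from well under a second to about two and a half minutes.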