Tags

Tags give the ability to mark specific points in history as being important

v0.0.7

59f68dbe · Release: version 0.0.7 · Apr 23, 2020

Version 0.0.7 released.

- New supports:
  * Croatian - ISO-8859-2, ISO-8859-13, ISO-8859-16, IBM852,
    Windows-1250 and MAC-CENTRALEUROPE
  * Czech - Windows-1250, ISO-8859-2, IBM852 and Mac-CentralEurope.
  * Estonian - ISO-8859-4, ISO-8859-13, ISO-8859-13, Windows-1252 and
    Windows-1257.
  * Finnish - ISO-8859-1, ISO-8859-4, ISO-8859-9, ISO-8859-13,
    ISO-8859-15 and WINDOWS-1252.
  * Irish Gaelic - ISO-8859-1, ISO-8859-9, ISO-8859-15 and WINDOWS-1252.
  * Italian - ISO-8859-1, ISO-8859-3, ISO-8859-9, ISO-8859-15 and
    WINDOWS-1252.
  * Latvian - ISO-8859-4, ISO-8859-10 and ISO-8859-13
  * Lithuanian - ISO-8859-4, IISO-8859-10 and SO-8859-13
  * Maltese - ISO-8859-3.
  * Polish - ISO-8859-2, ISO-8859-13, ISO-8859-16, Windows-1250, IBM852,
    MAC-CENTRALEUROPE.
  * Portuguese - ISO-8859-1, ISO-8859-9, ISO-8859-15 and Windows-1252.
  * Romanian - ISO-8859-2, ISO-8859-16, Windows-1250 and IBM852.
  * Slovak - Windows-1250, ISO-8859-2, IBM852 and Mac-CentralEurope.
  * Slovene - ISO-8859-2, ISO-8859-16, Windows-1250, IBM852 and
    MAC-CENTRALEUROPE.
  * Swedish - ISO-8859-1, ISO-8859-4, ISO-8859-9, ISO-8859-15 and
    WINDOWS-1252.
- EUC-KR now returned as UHC: despite differences, mostly upward
  compatible. Let's return UHC until separate detection machines are
  implemented.
- `uchardet` CLI tool now supports the end-of-options `--` option to be
  able to process any file starting with a dash.
- uchardet_handle_data() API considers an empty string input as
  successful processing to improve automatic detection from random
  input sources.
- Repository code now requires C++1 standard.
- Better Windows DLL support for non-GNU compilers.
- Gitlab continuous integration for GCC, CLang on Debian, and Windows 32
  and 64-bit with Mingw-w64.
- Several bugs fixed.

v0.0.6

8a8d6b65 · Release: version 0.0.6. · Jul 20, 2016

Version 0.0.6 released.

- Improve ASCII and ISO-8859-1 detection.
- Improve language models: Greek, Hungarian.
- New supports:
  * Arabic - ISO-8859-6 and Windows-1256.
  * Danish - Windows-1252, ISO-8859-1 and ISO-8859-15.
  * Spanish - ISO-8859-1, ISO-8859-15 and Windows-1252.
  * Vietnamese - VISCII and Windows-1258.
- Improve single-byte encoding detection algorithm by giving more weight
  to "probable" sequences (less frequent than "positive" sequence, yet
  not "negative").
- `uchardet` command line tool improved:
  * exits with non-zero return values on error.
- CMake build improved with more options:
  * Binary can be installed to non-default dir.
  * Allow building static-only builds.
  * Allow not building the command line tool.
  * Add static lib destination.

v0.0.5

886e03a5 · Release: version 0.0.5. · Dec 04, 2015

Version 0.0.5 released.

- Revert UTF-16 and UTF-32 label change:
  it was an error to specify endianness for texts with BOM.
  The Unicode standard explicitly warns against it, and it actually
  even (partially) break conversions.
- Added supports:
    - French: Windows-1252.
    - German: ISO-8859-1, Windows-1252
    - Esperanto: ISO-8859-3
    - Turkish: ISO-8859-3 and ISO-8859-9
    - Thai: ISO-8859-11 (and TIS-620 model rebuilt).
- Single Byte charset detection algorithm improved:
  detection of control characters lowers confidence.

v0.0.4

e4260f4a · Release: version 0.0.4. · Dec 03, 2015

Version 0.0.4 released.

- Add support of ISO-8859-1 and ISO-8859-15 for French.
- Re-enable Hungarian language models (ISO-8859-2 and Windows-1250)
  which used to conflict with other charsets (should be better now).
- Differentiate ASCII detection and detection failure.
- Improve single-byte charset detection confidence algorithm (fixes for
  instance Windows-1251 Russian text detection).
- "UTF-16" is now outputted with endianness information (UTF-16LE/BE).
- Add UTF-32 BOM detection.
- Discard single byte charsets upon illegal codepoint detection.
- Internal redesign of single-byte charmaps with more semantics, and
  variable sample size length (different languages have different sizes
  of grapheme lists).
- A lot more test files (33 successful unit tests should be successful
  with `make test`).
- Adding python scripts to generate language models from Wikipedia data
  in a single command.

v0.0.3

ff5fd5ef · Release: version 0.0.3. · Nov 19, 2015

Version 0.0.3 Released.

A quick release after 0.0.2 mostly to fix a bad crash on the command
line tool when charset detection failed (or detected ASCII).

Additionaly:

- The build now includes more test files for various language/encoding
  and a `make test` target for unit testing (20 encoding detection tests
  should be successful upon running it).
- The build has a new BUILD_STATIC option, by default set to ON,
  allowing to disable static library building if not needed.
- All encoding names are iconv-compatible, enabling developers to
  directly feed the result of uchardet_get_charset() into libiconv.
- Compilation warnings fixed.

v0.0.2

d0ccdd5d · Release: version 0.0.2. · Nov 16, 2015
```
Version 0.0.2 released.
```

Admin message

Tags