1. 27 Jan, 2021 1 commit
  2. 29 Apr, 2020 1 commit
    • Issue #17: update README. · c8a3572c
      Jehan authored
      Replace the old link to the science paper with one on the archive-mozilla
      website. Remove the original source link, as I can't find any archived
      version of it (even on archive.org, only the folder structure is saved,
      not the actual files themselves, so it's useless).
      
      Also add some history, which is probably a nice touch.
      
      Add a link to crossroad to help people who'd want to cross-compile
      uchardet.
      
      Finally, add the R binding by Artem Klevtsov and QtAV, as reported.
      c8a3572c
  3. 28 Apr, 2020 1 commit
  4. 26 Apr, 2020 2 commits
    • myd7349's avatar
      8681fc06
    • build: Fix build errors on Windows · 5bcbd23a
      myd7349 authored
      - Fix the "string no output variable specified" error on UWP
      
        On UWP, CMAKE_SYSTEM_PROCESSOR may be empty. As a result:
        string(TOLOWER ${CMAKE_SYSTEM_PROCESSOR} TARGET_ARCHITECTURE)
        will be treated as:
        string(TOLOWER TARGET_ARCHITECTURE)
        which causes a CMake error:
      
        CMake Error at CMakeLists.txt:42 (string):
          string no output variable specified
      
      - Remove unnecessary header inclusions in uchardet.cpp
      
        These extra inclusions cause build errors on Windows.
      5bcbd23a
  5. 23 Apr, 2020 2 commits
  6. 22 Apr, 2020 11 commits
  7. 21 Apr, 2020 2 commits
  8. 26 Sep, 2018 1 commit
  9. 21 Jan, 2018 1 commit
  10. 26 Dec, 2017 1 commit
  11. 24 Dec, 2017 1 commit
    • CMake: get rid of some commented code. · df67ae4f
      Jehan authored
      The commented code says it is for the Win32 platform and uses the install
      prefix as the library prefix. But those are not at all the same kind of
      prefix! The expected value of CMAKE_INSTALL_PREFIX is the path where the
      lib is installed (what is called the "installation prefix"), whereas
      CMAKE_*_LIBRARY_PREFIX are prefixes on the file name (usually "lib" on
      UNIX-like systems).
      Anyway, I don't see a need to change this value. The library will be
      called "libuchardet.dll" on Win32. I don't see the problem.
      Also, this code was already commented out, and compilation and usage for
      Win32 work just fine without it. :-)
      df67ae4f
  12. 06 Nov, 2017 4 commits
    • CMake: do not check/set SSE and float-store options on non-x86 targets. · cd617d18
      Jehan authored
      Not sure if that's right. I guess we might also find non-x86 machines
      where floating point computation doesn't follow the IEEE standard either.
      But let's do this for now to avoid a useless performance hit.
      cd617d18
    • CMake: slightly improve the configuration option messages. · 939482ab
      Jehan authored
      Also add full stops, similarly to CMake default options.
      939482ab
    • CMake: rename s/ENABLE_SSE2/CHECK_SSE2/. · 77bf71ea
      Jehan authored
      "ENABLE_SSE2" may be misleading since having it ON does not necessarily
      mean that SSE2 flags will actually be set. It only means that the
      support will be checked (and the flags set only when supported).
      Also add a warning about a possible performance decrease.
      77bf71ea
    • Bug 101033 - Testsuite fails on i386. · 5996bbd9
      Jehan authored
      Floating point accuracy may differ depending on the architecture.
      In particular, some architectures may store floating point values with
      different precision, giving unreliable results across machines. This
      seems to be true in particular on older x86 machines without SSE
      support, which were the reported cases.
      The proposed solution is to test for SSE support and explicitly add the
      proper flags (even though they are set by default anyway on modern x86).
      When this is not available (on older machines or simply when not on x86
      processors), I replace the sse2 flags with -ffloat-store, which forces
      the IEEE floating point definition.
      The reason not to always force -ffloat-store is that it seems to
      decrease performance on some machines. SSE is preferred if available.
      
      I also add an ENABLE_SSE2 option to the CMake file to allow builders to
      use -ffloat-store even though SSE2 may be available on the build
      machine. This allows building portable binaries which can also be
      installed on older machines.
      5996bbd9
  13. 20 Sep, 2017 1 commit
  14. 27 Aug, 2017 1 commit
  15. 19 Aug, 2017 1 commit
  16. 18 Aug, 2017 1 commit
  17. 28 May, 2017 5 commits
    • README: Gentoo also has a uchardet package. · d9d01474
      Jehan authored
      And it is up-to-date with upstream URL at Freedesktop! Good!
      d9d01474
    • Bug 101032 - assignments to nsSMState in nsCodingStateMachine result... · 53f7ad0e
      Jehan authored
      ... in unspecified behavior.
      When compiling with UBSan (-fsanitize=undefined), execution complains:
      > runtime error: load of value 5, which is not a valid value for type 'nsSMState'
      Since the machine states depend on each charset's own state machine,
      it is not possible to simply extend the enum with more generic
      values. Instead, let's just make the state an unsigned int value and
      define the 3 generic states as constants.
      53f7ad0e
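The change described above can be sketched as follows. This is a minimal illustration only: the names `SMState`, `kStart`, `kError`, `kItsMe`, `kToyTable`, and `next_state` are hypothetical and are not copied from uchardet. It assumes the three generic states take the values 0, 1, and 2, while charset-specific tables use further values such as 5.

```cpp
#include <cassert>

// Hypothetical names modelled on the commit message, not uchardet's own code.
// The state is a plain unsigned integer instead of a three-value enum,
// so charset-specific states above 2 are ordinary, valid values.
typedef unsigned int SMState;

// The three generic states shared by every state machine (assumed values).
const SMState kStart = 0;
const SMState kError = 1;
const SMState kItsMe = 2;

// A toy transition table mixing generic states with charset-specific
// states (3, 4, 5). Storing 5 in a three-value enum and loading it back
// is the kind of access the sanitizer reported.
static const SMState kToyTable[] = { kStart, 3, 4, 5, kItsMe, kError };

// Look up the next state; with an unsigned int, every stored value is valid.
SMState next_state(unsigned int index) {
    return kToyTable[index];
}
```

With this layout, a call such as `next_state(3)` simply yields the charset-specific state 5, and comparisons against the generic constants keep working unchanged.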
    • Request C++11 standard project-wide and make it a strong requirement. · 50bc02c0
      Jehan authored
      Setting it per target is unneeded; use the global CMAKE_CXX_STANDARD
      variable instead. Also, with CMAKE_CXX_STANDARD_REQUIRED, I
      make this a strong requirement. The documentation indeed states that
      CXX_STANDARD "is treated as optional and may “decay” to a previous
      standard if the requested is not available".
      This means that uchardet will likely not be buildable with a compiler
      that has no C++11 support. But I assume this is not a common situation,
      and we probably should not care about outdated compilers. I remain open
      to suggestions and disagreement on the topic, obviously.
      50bc02c0
    • Make C++11 the standard used for uchardet. · 1bf198cb
      Jehan authored
      As discussed in bug 101032, it seems like the most common usage
      nowadays. Let's make a specific choice to avoid different behavior on
      different builds later on.
      1bf198cb
    • Bug 101204 - different results with different chunk sizes. · 98bf4d73
      Jehan authored
      ASCII and ISO-8859-1 should not be detected in
      nsUniversalDetector::HandleData() but in nsUniversalDetector::DataEnd()
      instead. Otherwise it creates an unwanted shortcut from the first call
      to uchardet_handle_data() if the input is broken into several pieces and
      if the first chunk happens to be ASCII (or ASCII + NBSP).
      98bf4d73
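The idea behind this fix can be illustrated with a toy detector. This is a sketch under stated assumptions, not uchardet's implementation: `ToyDetector`, its methods, and `detect_in_chunks` are hypothetical names, and the only "detection" done is checking for bytes with the high bit set. The point is that the per-chunk method only accumulates evidence, and the verdict is produced once at the end, so the result cannot depend on how the input was split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy detector; class and method names are illustrative only.
class ToyDetector {
public:
    // Feed one chunk. This only records whether a non-ASCII byte was seen
    // and never announces a result on its own.
    void handle_data(const char* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            if (static_cast<unsigned char>(data[i]) & 0x80) {
                seen_high_byte_ = true;
            }
        }
    }

    // The decision is taken only once all input has been fed.
    std::string data_end() const {
        return seen_high_byte_ ? "NON-ASCII" : "ASCII";
    }

private:
    bool seen_high_byte_ = false;
};

// Run the toy detector over `text`, split into chunks of `chunk` bytes
// (`chunk` must be greater than 0).
std::string detect_in_chunks(const std::string& text, std::size_t chunk) {
    ToyDetector detector;
    for (std::size_t pos = 0; pos < text.size(); pos += chunk) {
        std::size_t n = std::min(chunk, text.size() - pos);
        detector.handle_data(text.data() + pos, n);
    }
    return detector.data_end();
}
```

If the verdict were instead taken inside `handle_data()`, an input whose first chunk is pure ASCII would be labelled ASCII before the later, non-ASCII bytes were ever seen.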
  18. 14 May, 2017 3 commits
    • src: minor indentation fix. · 50743e16
      Jehan authored
      50743e16
    • test: output the test file path which we failed to open. · 6cf13f10
      Jehan authored
      Also properly free the string in that case.
      6cf13f10
    • Bug 101030 - Buffer overflow related to ISO2022JP detection in... · 94b10b9b
      Jehan authored
      ... en:ascii and ja:iso-2022-jp tests.
      I don't know much about this part of the code at this point. Yet I can
      clearly deduce that the length of the charLenTable is supposed to be the
      classFactor of the SMModel. Therefore 2 classes were missing in
      ISO2022JPCharLenTable, hence a buffer overflow happens when trying to
      reach these. I am not sure which values I should add there. For now,
      let's set both to 0, and also add a comment so that I can review this
      code later on, once I get to read and understand this piece of code
      in more depth.
      94b10b9b
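The invariant deduced in this commit message (one table entry per character class) can be checked at compile time. The sketch below is hypothetical: `kToyClassFactor`, `kToyCharLenTable`, and `char_len_for_class` are made-up names, and the values are placeholders rather than the real ISO-2022-JP data. It only shows how a `static_assert` would catch a table that is shorter than the class count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for a state machine model's data.
// Number of character classes the toy model distinguishes.
constexpr unsigned int kToyClassFactor = 5;

// One character length per class. The last two entries are 0 placeholders,
// mirroring the "set both to 0 for now" approach from the commit message.
static const unsigned int kToyCharLenTable[] = { 2, 1, 1, 0, 0 };

// Compile-time check: the table must have exactly one entry per class,
// otherwise indexing by class could read past the end of the array.
static_assert(sizeof(kToyCharLenTable) / sizeof(kToyCharLenTable[0])
                  == kToyClassFactor,
              "char length table must have one entry per class");

// Look up the character length for a class, guarding the index.
unsigned int char_len_for_class(unsigned int cls) {
    assert(cls < kToyClassFactor);
    return kToyCharLenTable[cls];
}
```

Dropping the last two entries from `kToyCharLenTable` would make the build fail at the `static_assert` instead of overflowing the buffer at run time.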