uchardet issues
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues (updated 2023-07-17T14:45:22Z)

Issue #1: Impossible to link to uchardet without -lstdc++ in cross environment
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/1
Author: Bugzilla Migration User, updated 2023-07-17T14:45:22Z
## Submitted by Coacher
Assigned to **Jehan Pagès `@Jehan`**
**[Link to original bug (#102119)](https://bugs.freedesktop.org/show_bug.cgi?id=102119)**
## Description
Hello.
I have successfully built uchardet in my cross environment (build is amd64, host is arm). Now I'm trying to link to it.
```
# cat test.c
int main() {
    return 0;
}
# arm-unknown-linux-gnueabi-gcc test.c -luchardet
/usr/libexec/gcc/arm-unknown-linux-gnueabi/ld: warning: libstdc++.so.6, needed by /usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so, not found (try using -rpath or -rpath-link)
/usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so: undefined reference to `__gxx_personality_v0@CXXABI_1.3'
/usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so: undefined reference to `vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3'
/usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so: undefined reference to `__cxa_pure_virtual@CXXABI_1.3'
/usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so: undefined reference to `vtable for __cxxabiv1::__class_type_info@CXXABI_1.3'
/usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so: undefined reference to `operator new(unsigned int)@GLIBCXX_3.4'
/usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so: undefined reference to `__cxa_end_cleanup@CXXABI_1.3'
/usr/arm-unknown-linux-gnueabi/usr/lib/libuchardet.so: undefined reference to `operator delete(void*)@GLIBCXX_3.4'
collect2: error: ld returned 1 exit status
# arm-unknown-linux-gnueabi-gcc test.c -luchardet -lstdc++
#
```
As you can see, I cannot link to uchardet without adding -lstdc++.
This problem affects real applications (e.g. mpv), which rely on linker flags provided by uchardet.pc. But uchardet.pc provides only -luchardet, while -lstdc++ is listed under Libs.private. Is there a sane way to deal with this problem?
Thank you.
P.S. `gcc test.c -luchardet` works fine on a native system.

Issue #8: no newline at end of file
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/8
Author: zeng, updated 2020-04-22T21:04:04Z
Hello. When I compile uchardet as a dynamic library on Linux, some compilers report the warning "no newline at end of file".
I found that some .cpp files in uchardet/src/LangModels, such as LangEsperantoModel.cpp, do indeed lack a newline at the end. I fixed the warning by adding one to each of those files.
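Here is a minimal sketch of the condition behind this warning. It assumes the rule in question is C99 §5.1.1.2, which says a non-empty source file shall end in a new-line character (C++ before C++11 has a similar rule). `needs_trailing_newline` is a hypothetical helper and is not part of uchardet. It inspects a file's contents that have already been read into memory.

```c
#include <stddef.h>

/* Hypothetical helper (not part of uchardet): returns 1 when a non-empty
 * source buffer lacks the trailing newline, 0 otherwise. */
int needs_trailing_newline(const char *buf, size_t len)
{
    if (len == 0)
        return 0;               /* an empty file needs no newline */
    return buf[len - 1] != '\n';
}
```

A script that appends `'\n'` to every file for which this returns 1 would apply the same fix the reporter made by hand.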
This is why the compiler reports the warning. From the C99 Rationale:
> A backslash immediately before a newline has long been used to continue string literals, as well as preprocessing command lines. In the interest of easing machine generation of C, and of transporting code to machines with restrictive physical line lengths, the C89 Committee generalized this mechanism to permit any token to be continued by interposing a backslash/newline sequence.
So, is adding a newline at the end of those files a reasonable way to avoid the compiler warning?

Issue #9: Wrong detected encoding utf-8 instead of cp855
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/9
Author: Rustam Sayfutdinov, updated 2022-12-18T23:14:13Z
Hello!
I found an [example](/uploads/ee0f778b278fd777457f25bb99078d3f/sample.txt) where _uchardet_ wrongly detects the encoding as utf-8 instead of cp855.
I debugged my usage (in this [for-loop](https://gitlab.freedesktop.org/uchardet/uchardet/blob/master/src/nsUniversalDetector.cpp#L317)):
- multi-byte prober: utf-8 with _confidence = 0.752499998_
- single-byte prober: cp855 with _confidence = 0.685687244_
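As a worked example, assume the final answer is simply the candidate with the highest confidence. This is a simplification of the linked for-loop, not a copy of it. With the numbers above, utf-8 (0.7525) beats cp855 (0.6857), which matches the wrong result reported here. `struct candidate` and `pick_best` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical: one charset guess and the confidence its prober reported. */
struct candidate {
    const char *charset;
    float confidence;
};

/* Return the candidate with the highest confidence, or NULL if n == 0.
 * On a tie, the earlier candidate wins. */
const struct candidate *pick_best(const struct candidate *c, size_t n)
{
    const struct candidate *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || c[i].confidence > best->confidence)
            best = &c[i];
    }
    return best;
}
```

Fed UTF Unknown's figures instead (cp855 at 0.8776797 against 0.01 for GB18030), the same rule picks cp855.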
Using a different implementation, [UTF Unknown](https://github.com/CharsetDetector/UTF-unknown/tree/v0.1), I got the expected result:
- multi-byte prober: only the _GB18030Prober_ object is checked, with _confidence = 0.01_
- single-byte prober: cp855 with _confidence = 0.8776797_

Milestone: 0.1.0

Issue #10: Crashing sequence with nsSJISProber
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/10
Author: JP Cimalando, updated 2020-04-22T20:18:03Z
Hi. By using charset detection on a set of MIDI file metadata, I have discovered and isolated a crashing sequence.
It happens when uchardet is fed the input as multiple strings, and a string of the set is of length 0.
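Here is a minimal sketch of this failure mode and a guard against it. It assumes the length parameter is unsigned. In that case `len - 1` wraps around to a huge value instead of -1 when `len` is 0, so reading the "last byte" of an empty chunk goes far out of bounds. `struct feed_state` and `feed_chunk` are hypothetical stand-ins, not uchardet's prober code.

```c
#include <stddef.h>

/* Hypothetical prober state: keeps the last byte of the previous chunk so a
 * multi-byte character split across two chunks can still be decoded. */
struct feed_state {
    unsigned char last_byte;
    int have_last_byte;
};

void feed_chunk(struct feed_state *st, const char *buf, size_t len)
{
    /* With an unsigned length, len - 1 wraps to SIZE_MAX when len == 0, so
     * buf[len - 1] would read far outside the buffer. Empty chunks carry no
     * data, so they are simply ignored. */
    if (len == 0)
        return;

    /* ... analysis of buf[0..len) would go here ... */

    st->last_byte = (unsigned char)buf[len - 1];
    st->have_last_byte = 1;
}
```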
The attached file reproduces the crash at revision bdfd6116a965fd210ef563613763e724424728b7.
[test-case1.cc](/uploads/fb12034b7990551d7488437fffd79dfa/test-case1.cc)
The above file also contains a backtrace.
As observed, the code attempts a buffer access at index `aLen-1` with `aLen=0`.

Issue #11: Any plans to make new release?
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/11
Author: Tomasz Kłoczko, updated 2020-04-23T15:18:15Z
I think it would be good to make a new release with a fresh code base from the patches already accumulated in git :)

Issue #12: Add option to play safe detection
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/12
Author: Hans, updated 2020-04-22T19:52:07Z
I see many reports of wrong encoding detection.
I was about to report another one, and I don't think this will ever be solved.
But could you add a command-line option such as --safe-utf8 (or whatever name you prefer)?
If the option is set and the file decodes as UTF-8, it would return UTF-8. Otherwise it would return whichever other encoding was detected.
For example, the attached file contains a single "á" character and is detected as TIS-620,
when ISO-8859-1 or UTF-8 would be a better match.
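The proposed safe-UTF-8 option would need a strict UTF-8 well-formedness check that runs before statistical detection. Below is a minimal sketch based on the byte ranges in RFC 3629. `is_valid_utf8` is a hypothetical helper, not an existing uchardet function. Note that "á" encoded as UTF-8 is the two bytes 0xC3 0xA1 and passes the check. A lone 0xE1 (its ISO-8859-1 encoding) is rejected, so only UTF-8 input would short-circuit to UTF-8.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8 per RFC 3629, 0 otherwise.
 * Rejects overlong encodings, UTF-16 surrogates and code points above U+10FFFF.
 * Hypothetical helper for the proposed "safe UTF-8" option. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned int cp;     /* decoded code point */

        if (b < 0x80) {                      /* ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) { /* 2-byte lead (0xC0/0xC1 are always overlong) */
            extra = 1;
            cp = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) { /* 3-byte lead */
            extra = 2;
            cp = b & 0x0F;
        } else if (b >= 0xF0 && b <= 0xF4) { /* 4-byte lead */
            extra = 3;
            cp = b & 0x07;
        } else {
            return 0;  /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return 0;  /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return 0;  /* expected a continuation byte 10xxxxxx */
            cp = (cp << 6) | (c & 0x3F);
        }

        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;  /* overlong, surrogate, or out of Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

Pure ASCII input also passes, so under this sketch a plain-ASCII file would be reported as UTF-8 when the option is set.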
[test.txt](/uploads/0b67668028dde12b5bf9ba3ca68011c4/test.txt)

Issue #14: Make a portable executable
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/14
Author: FlorianPerez, updated 2020-04-22T19:24:38Z
Hi,
I like your project and would like to use your tool on a workstation without manually installing the library.
(I'm not root on the workstation, so I can't install the library in /usr/lib/.)
Can you explain how to create a portable executable of your project?
Thanks in advance!

Issue #15: Error with file called "-h" (request to support "--" option)
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/15
Author: Jamie Landeg-Jones, updated 2020-07-28T11:45:37Z
Before calling uchardet in a script, I first have to check whether the file name is literally '-h' or '-v' and, if so, prefix it with "./". (In practice I check for any file name beginning with '-', but you get the point.)
Rather than this kludge, could you add the traditional "--" as an "end of options" marker?
And yes, this problem did crop up in "real life"!
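Here is a minimal sketch of the requested convention: everything after a literal `--` is treated as a file name, even if it starts with `-`. `first_operand` is a hypothetical function. If the tool's parser is built on POSIX `getopt()`, note that `getopt()` already stops at `--`, so the fix may be as small as relying on that behavior. A lone `-` is left as an operand, following the common convention that it names standard input.

```c
#include <string.h>

/* Hypothetical argv scan: returns the index of the first file operand,
 * or argc if there is none. */
int first_operand(int argc, char **argv)
{
    int i = 1;
    while (i < argc) {
        if (strcmp(argv[i], "--") == 0)
            return i + 1;   /* everything after "--" is a file name */
        if (argv[i][0] != '-' || argv[i][1] == '\0')
            return i;       /* first non-option; a lone "-" counts as an operand */
        i++;                /* an option such as -h or -v would be handled here */
    }
    return argc;
}
```

With this, `uchardet -- -h` would treat `-h` as a file name rather than the help flag.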
Cheers, Jamie

Issue #16: Tests failing on x86 Alpine Linux
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/16
Author: Rasmus Thomsen (oss@cogitri.dev), updated 2020-04-28T18:49:09Z
Hello,
with uchardet 0.0.7 (and for that matter 0.0.6) some tests are failing on x86, namely:
```
The following tests FAILED:
29 - fi:iso-8859-1 (Failed)
37 - ga:iso-8859-1 (Failed)
106 - th:tis-620 (Failed)
```
The tests only return 1 and print no backtrace or other details, so I'm not sure what further information I can supply.
OS: Alpine Linux Edge (so musl libc).
Arch: x86.

Issue #17: Broken links in README
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/17
Author: Artem Klevtsov, updated 2020-04-29T14:21:35Z
List:
- http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/
- http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
Also, there is my binding for the R language on CRAN: https://CRAN.R-project.org/package=uchardet
Also, QtAV uses uchardet.

Issue #20: Can libuchardet-ios.a support iOS Simulator?
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/20
Author: YeaLink89, updated 2021-11-09T13:11:45Z
Could libuchardet-ios.a support the iOS Simulator? At the moment it crashes in the simulator.

Issue #21: Please add Greek CP737 support
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/21
Author: unxed, updated 2022-12-18T23:03:18Z
Sample attached.
It contains the Greek phrase "Νέο έγγραφο κειμένου" ("New text document"), encoded in CP737.
uchardet detects it as CP1252.
[cp737.txt](/uploads/3480967bba2c9d0a331769a57bde035d/cp737.txt)
Thanks!

Milestone: 0.1.0

Issue #22: Please add Hebrew CP862 support
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/22
Author: unxed, updated 2022-12-16T22:37:27Z
This sample file contains the Hebrew string "מערכת להסעת המונים במטרופולין תל אביב" ("Mass transit system in the Tel Aviv metropolitan area") encoded in CP862. uchardet detects it as "unknown".
[cp862.txt](/uploads/7c85b7381c07156dd4298c7fc8d7016a/cp862.txt)

Milestone: 0.1.0

Issue #24: Support cmake exported targets
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/24
Author: Pedro López-Cabanillas, updated 2021-11-09T09:52:16Z
If uchardet implemented CMake exported targets, a downstream CMake project could find and link the libuchardet library directly with CMake, without needing pkg-config at all, like this:
~~~
project(sample LANGUAGES C)
find_package ( uchardet )
if (uchardet_FOUND)
add_executable( sample sample.c )
target_link_libraries ( sample PRIVATE uchardet::libuchardet )
endif ()
~~~
The build system should create one exported target for each artifact it builds, for instance:
- The executable **uchardet::uchardet**
- The shared library **uchardet::libuchardet**
- The static library **uchardet::libuchardet_static**
After installing the project in a prefix like "$HOME/uchardet/", the downstream project can be configured with a command like:
~~~
cmake -DCMAKE_PREFIX_PATH="$HOME/uchardet/;..."
~~~
Instead of installing, the build directory can be used directly, for instance:
~~~
cmake -Duchardet_DIR="$HOME/build-uchardet-0.1.0/" ...
~~~

Issue #25: Different results when using on Ubuntu 16.04 and Ubuntu 20.04
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/25
Author: Sjors Ottjes, updated 2022-12-17T22:30:38Z
I have an application using uchardet running on an Ubuntu 16.04 server. I'm trying to upgrade the server to Ubuntu 20.04, but uchardet sometimes returns different results there. In some cases the result on 16.04 is correct and the result on 20.04 is incorrect. I'm running uchardet version 0.0.6 on both machines.
Results on 16.04 and 18.04 seem to be the same. Results on 20.04 are different.
Is there anything I can do to make uchardet return the same results on Ubuntu 20.04 as it does on Ubuntu 16.04?

Issue #27: Misuse of CMAKE_BINARY_DIR in CMake
https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/27
Author: Andreas Stefl, updated 2021-12-01T16:49:45Z
I believe that `CMAKE_BINARY_DIR` should be `CMAKE_CURRENT_BINARY_DIR` here: https://gitlab.freedesktop.org/uchardet/uchardet/-/blob/master/CMakeLists.txt#L65
I created a PR on GitHub a while ago. https://github.com/freedesktop/uchardet/...I believe that `CMAKE_BINARY_DIR` should be `CMAKE_CURRENT_BINARY_DIR` here https://gitlab.freedesktop.org/uchardet/uchardet/-/blob/master/CMakeLists.txt#L65
I created a PR on GitHub a while ago. https://github.com/freedesktop/uchardet/pull/1
I wasn't able to create a PR here, which is why I'm opening an issue.