[Feature request] - More info returned by library
Submitted by bk1..@..il.com
Assigned to Jehan Pagès @Jehan
Link to original bug (#104402)
Description
Hi! I would like to suggest that the library return more information about the analyzed file, for example the confidence rate, so the caller can decide whether a detection is good enough. I see that you quote the confidence rate in your responses, but I don't see it exposed by any of the library's functions.
Maybe it could return a record type (that is the Pascal name; I don't know what it is called in C), like Charset Detector does (http://chsdet.sourceforge.net/api.php):
rCharsetInfo = record
  Name: pChar;       // charset name
  CodePage: integer; // MS Windows CodePage id
  Language: pChar;   //
end;
...maybe with a new field in a structure like that:
Confidence: float
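In C the equivalent of a Pascal record is a struct. A minimal sketch of what such a result could look like follows. The type name `charset_info`, its fields, and the helper `charset_info_is_reliable` are all made up for illustration and are not part of the library's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result record; NOT the library's real API. */
typedef struct {
    const char *name;       /* charset name, e.g. "UTF-8"; NULL if none */
    int         code_page;  /* MS Windows code page id, 0 if unknown */
    const char *language;   /* detected language, or NULL */
    float       confidence; /* 0.0 (no confidence) .. 1.0 (certain) */
} charset_info;

/* Return 1 if a charset was found and its confidence meets the
 * caller-chosen threshold, otherwise 0. */
static int charset_info_is_reliable(const charset_info *info, float threshold)
{
    assert(info != NULL);
    return info->name != NULL && info->confidence >= threshold;
}
```

With something like this, a caller could reject weak guesses, for instance by accepting a result only when `charset_info_is_reliable(&info, 0.5f)` returns 1. The threshold value here is arbitrary.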
Another good addition would be whether the file has a BOM or not, and what kind of BOM:
eBOMKind = (
  BOM_Not_Found,
  BOM_UCS4_BE,   // 00 00 FE FF  UCS-4, big-endian machine (1234 order)
  BOM_UCS4_LE,   // FF FE 00 00  UCS-4, little-endian machine (4321 order)
  BOM_UCS4_2143, // 00 00 FF FE  UCS-4, unusual octet order (2143)
  BOM_UCS4_3412, // FE FF 00 00  UCS-4, unusual octet order (3412)
  BOM_UTF16_BE,  // FE FF ## ##  UTF-16, big-endian
  BOM_UTF16_LE,  // FF FE ## ##  UTF-16, little-endian
  BOM_UTF8       // EF BB BF     UTF-8
);
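To illustrate, BOM detection along these lines could be sketched in C as below. The enum mirrors the Pascal one above. The names `bom_kind` and `detect_bom` are hypothetical and not taken from the library. The 4-byte signatures have to be tested before the 2-byte ones, because FF FE 00 00 also begins with the UTF-16 little-endian BOM; a UTF-16LE file whose first character is U+0000 would therefore be reported as UCS-4, which this sketch accepts as a rare edge case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum mirroring eBOMKind; not part of the library. */
typedef enum {
    BOM_NOT_FOUND,
    BOM_UCS4_BE,
    BOM_UCS4_LE,
    BOM_UCS4_2143,
    BOM_UCS4_3412,
    BOM_UTF16_BE,
    BOM_UTF16_LE,
    BOM_UTF8
} bom_kind;

/* Signatures ordered longest first, so that 4-byte BOMs win over
 * the 2-byte UTF-16 BOMs that share their prefix. */
static const struct {
    bom_kind      kind;
    size_t        len;
    unsigned char sig[4];
} bom_table[] = {
    { BOM_UCS4_BE,   4, { 0x00, 0x00, 0xFE, 0xFF } },
    { BOM_UCS4_LE,   4, { 0xFF, 0xFE, 0x00, 0x00 } },
    { BOM_UCS4_2143, 4, { 0x00, 0x00, 0xFF, 0xFE } },
    { BOM_UCS4_3412, 4, { 0xFE, 0xFF, 0x00, 0x00 } },
    { BOM_UTF8,      3, { 0xEF, 0xBB, 0xBF } },
    { BOM_UTF16_BE,  2, { 0xFE, 0xFF } },
    { BOM_UTF16_LE,  2, { 0xFF, 0xFE } },
};

/* Inspect the first bytes of a buffer and report which BOM, if any,
 * it starts with. */
static bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < sizeof bom_table / sizeof bom_table[0]; i++) {
        if (len >= bom_table[i].len &&
            memcmp(buf, bom_table[i].sig, bom_table[i].len) == 0)
            return bom_table[i].kind;
    }
    return BOM_NOT_FOUND;
}
```

Only the first four bytes of the file are needed, so the check could run on the first chunk of data handed to the detector.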
And, getting greedy, it would be nice to know what kind of newline the file uses:
Unix/Mac  // LF     $0A
Windows   // CR+LF  $0D $0A
Old Mac   // CR     $0D
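A rough C sketch of newline detection could look like the following. It assumes an ASCII-compatible single-byte or UTF-8 buffer (UTF-16 and UCS-4 would need wider code units), takes LF as 0x0A and CR as 0x0D, and adds a "mixed" result for files that combine styles. `newline_kind` and `detect_newline` are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; not part of the library. */
typedef enum {
    NEWLINE_NONE,  /* no line break in the buffer */
    NEWLINE_LF,    /* Unix / modern Mac: 0x0A */
    NEWLINE_CRLF,  /* Windows: 0x0D 0x0A */
    NEWLINE_CR,    /* old Mac: 0x0D */
    NEWLINE_MIXED  /* more than one style present */
} newline_kind;

/* Count each newline style in an ASCII-compatible buffer and report
 * the single style used, or NEWLINE_MIXED if several appear. */
static newline_kind detect_newline(const unsigned char *buf, size_t len)
{
    size_t i, lf = 0, crlf = 0, cr = 0;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == 0x0D) {
            if (i + 1 < len && buf[i + 1] == 0x0A) {
                crlf++;
                i++; /* skip the LF that belongs to this CR+LF pair */
            } else {
                cr++;
            }
        } else if (buf[i] == 0x0A) {
            lf++;
        }
    }
    if (lf + crlf + cr == 0) return NEWLINE_NONE;
    if (crlf == 0 && cr == 0) return NEWLINE_LF;
    if (lf == 0 && cr == 0)   return NEWLINE_CRLF;
    if (lf == 0 && crlf == 0) return NEWLINE_CR;
    return NEWLINE_MIXED;
}
```

Returning the three counts instead of a single enum value would be another option, since it lets the caller pick the dominant style in a mixed file.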
Sorry for asking too much! Thanks in advance.