#============================================================================
# Enca v1.19 (2016-09-05)  guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti) <yeti@physics.muni.cz>
# Copyright (C) 2009-2016 Michal Cihar <michal@cihar.com>
#============================================================================

List of user-visible changes in Enca
A more detailed log can be obtained from older changelogs or git log.

Legend: + new feature
        * change of behaviour (including disappearing of a feature)
        - bugfix
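
The legend and layout used in this file are regular enough to be read
mechanically. The following is a minimal sketch, not part of Enca: the
function name parse_news and the regular expressions are assumptions made
for illustration. It assumes release headers start at column 0 with "enca-"
and that entries are indented and begin with one of the three markers.

```python
import re

# Marker characters taken from the legend above.
KINDS = {"+": "new feature", "*": "change of behaviour", "-": "bugfix"}

# A release header starts at column 0, e.g. "enca-1.17 2016-01-04"
# or "enca-0.9.4: 2002-03-03" (older entries carry a trailing colon).
HEADER = re.compile(r"^enca-(\S+?):?(?:\s+(.*))?$")
# An entry is indented and begins with one of the legend markers.
ENTRY = re.compile(r"^\s+([+*-])\s+(.*)$")


def parse_news(text):
    """Return {version: [(kind, entry_text), ...]} for NEWS-style text."""
    releases = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = releases.setdefault(header.group(1), [])
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            current.append((KINDS[entry.group(1)], entry.group(2).strip()))
        elif line.startswith(" ") and line.strip() and current:
            # Indented line without a marker: continuation of the last entry.
            kind, body = current[-1]
            current[-1] = (kind, body + " " + line.strip())
    return releases
```

Unmarked indented lines that precede the first entry of a release (such as
sub-headings) are ignored by this sketch.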

enca-1.19-dev
  - fix possible memory leak
  - make utf-8 detection work even on one character

enca-1.18 2016-01-07
  - fix installation of devhelp documentation

enca-1.17 2016-01-04
  - Fixed conversion of GB2312 encoding with iconv
  - Fixed iconv conversion on OSX
  - Documentation improvements
  - Fixed execution of external converters with ACLs
  - Improved test coverage to 80%

enca-1.16 2014-10-20
  - Fixed typo in Belarusian language name
  - Added aliases for Chinese and Yugoslavian languages

enca-1.15 2013-09-30
  - Documentation improvement
  - Development moved to GitHub
  - Do not use deprecated autoconf macros

enca-1.14 2012-09-11
  - Allow standard names for Belarusian and Slovenian languages, thanks
    to Branislav Geržo for the suggestion.
  - Reset strictness when the check buffer is smaller than the file size,
    thanks to Sam Liao.
  - Fixed typos in man page, thanks to A. Costa.

enca-1.13 2010-02-09
  - Reverse usage of temp file while converting using recode to prevent
    file truncation (bug #1135).

enca-1.12 2009-10-29
  - Fixes some minor memory leaks.
  - Fixes little problems in autoconf scripts.

enca-1.11 2009-09-25
  - Dropped scanf configure test which is not used at all.
  - Fixes some wrong format strings.

enca-1.10 2009-08-25
  + Enca is back alive or at least in maintenance mode.
  * Enca now lives in git repository, see <http://gitorious.org/enca>.
  - Add missing charset koi8u to belarusian language.
  - Fixed some typos in program and documentation.

enca-1.9 2005-12-18
  + support for HZ encoding
  * Big5 and GBK detection improved
  - enca.spec no longer installs docs to world-unreadable directory

enca-1.8 2005-11-24
  + Chinese (Big5 and GBK) support (thanks to Zuxy)
  * deb/ subdirectory is gone as there is finally an Enca package in Debian
    (thanks to Michal Cihar)
  - manual page clean-up (thanks to Michal Cihar)

enca-1.7 2005-02-27
  + new name type: preferred MIME name (option -m)
  - broken iconv detection on some systems was fixed

enca-1.6 2004-09-01
  * English language names (--list=languages, enca_language_english_name())
    were changed to lowercase to match common locale aliases
  - Win32, i.e. MinGW and Cygwin, build problems were fixed

enca-1.5 2004-05-30
  - crash on impossible recovery after iconv failure in pipe was fixed
  - rpm building problems on Mandrake Linux were fixed

enca-1.4 2004-05-12
  - dependency of guessing API on locales (via ctype functions) was fixed
  - --help text generation failure on some systems was fixed

enca-1.3 2003-12-24
  + [libenca] it's possible to get analyser option values, not just set them
  * a good BOM (byte order mark) increases the chance of being recognized for
    UCS-4 and UTF-8 too
  * external converter wrappers were moved from bin to libexec and the b-
    prefix was removed (though it still works)
  * external converters are no longer searched in PATH, nonstandard ones
    have to be specified with full path
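
As an illustration of the BOM entry above, a byte order mark can be checked
with a simple prefix test. This is a sketch only and does not reproduce
Enca's analyser: the function name bom_hint and the returned labels are
assumptions, and the result is meant as a hint that raises confidence, not
as a final verdict.

```python
import codecs

# Longer marks come first: the UTF-32 little-endian mark (FF FE 00 00)
# begins with the UTF-16 little-endian mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UCS-4 (big endian)"),
    (codecs.BOM_UTF32_LE, "UCS-4 (little endian)"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UCS-2 (big endian)"),
    (codecs.BOM_UTF16_LE, "UCS-2 (little endian)"),
]


def bom_hint(data):
    """Return the encoding family suggested by a leading BOM, or None."""
    for mark, name in BOMS:
        if data.startswith(mark):
            return name
    return None
```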

enca-1.2 2003-11-26
  - fixed segfault in language detection for some locale setups

enca-1.1 2003-11-17
  - fixed losing data at the end of file when using external converters in a
    pipe (and maybe in other situations)
  - [libenca] enca_analyser_free() not freeing analyser completely was fixed

enca-1.0 2003-11-06
  * deprecated options -T, -R, -S, -u, -U, -m, and -M were finally removed
  * default HTML API docs installation path changed to the new gtk-doc style
    (DATADIR/gtk-doc/html/enca)
  * debian/ subdir moved to deb/ to allow official deb creation w/o too much
    hassle

enca-0.99.4 2003-07-15
  - several race conditions in librecode and iconv interfaces were fixed
  - temporary file names are much less predictable now

enca-0.99.3 2003-06-30
  * Debian package is back from death
  * failure to find external converter is now fatal
  - fixed build problems on FreeBSD (and probably other Unices)
  - libiconv is not used for `conversion to ASCII' since it never does the
    Right Thing, whatever it is
  - when conversion with libiconv fails, the file should now survive intact
  - fixed build problems on systems w/o libiconv (hopefully)
  - fixed distclean and uninstall targets to really clean and uninstall
    everything
  - fixed builds with separate source (read-only) and build directories
  - fixed builds with --without-libiconv and --without-librecode on GNU/Linux
  - external converter is not checked when it's not going to be used

enca-0.99.2 2003-06-25
  + EOL type is used to decide ambiguous cases, e.g. CP1250 is reported
    instead of ISO-8859-2/CRLF
  * --list languages by default prints English names, instead of ISO-639a
    codes, use -e or -r to get the old listing
  * if LC_CTYPE is something like en_US, more locale categories are examined
    to detect the language
  * cork charset was modified to contain \n, \r and \t in the same places as
    ASCII
  * some heuristics tuning

enca-0.99.1 2003-06-22
  + libenca pkg-config support
  * all libenca tuning parameters (-T, -R, -S, -u, -U, -m, and -M) were
    marked deprecated and are noop, Enca should DWIM
  * ambiguity is now always OK when the sample has the same meaning in all the
    charsets
  * deprecated `built-in-encodings' and `encodings' lists were removed
  * PAGER feature was removed
  - exchanged `latvian' and `lithuanian' language names were fixed (`lv' and
    `lt' were always OK)
  - missing tests for the new languages were added to the test suite

enca-0.99.0 2003-06-14
  + added some support for: Bulgarian, Croatian, Estonian, Hungarian, Latvian,
    Lithuanian, Slovene
  + a new algorithm for 8bit-dense languages (cyrillics), the old one is used
    as a fallback
  * removed support for non-transitive iconv (such a thing should not exist)
  * auxiliary tools in data are no longer built in regular builds,
    use --enable-maintainer-mode to rebuild them, create dists, etc.
  - fixed iconv interface surface check pickier than iconv itself inhibiting
    some otherwise possible conversions
  - fixed u+x permissions on temporary files (from 0.10.7)
  - fixed not deleting temporary files in iconv interface
  - fixed broken iconv interface behaviour in pipes
  - fixed iconvcap misdetecting Latin5 as ISO-8859-5
  - fixed casual `make distclean' failures

enca-0.10.7 2003-01-28
  - fixed interchanged iconv and cstocs encoding names
  - corrected(?) librecode surface interaction
  - fixed a temporary file creation race condition
  * added tex and utf8 to cstocs (names and b-cstocs)

enca-0.10.6 2002-10-22
  + enconv uses DEFAULT_CHARSET variable, exactly as recode
  - ENCAOPT works everywhere, albeit imperfectly
  - options -P and -p no longer imply -M too
  - ambiguous mode (-M) works again
  - pager is run so that help text doesn't disappear
  - standard input is printed as STDIN with -d, not as null
  - make check works again
  - it compiles without recode again

enca-0.10.5 2002-10-13
  + UTF-8 recognition in binary and otherwise messy files
  + detection of double-encoding from some 8bit charset to UTF-8
  + Cork encoding conversion
  * librecode interaction was (hopefully) improved
  - fixed some build-time problems
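
Double-encoding, as mentioned above, happens when bytes that are already
UTF-8 are misread as an 8bit charset and encoded to UTF-8 a second time.
The following sketch shows one way to test for it and is not Enca's
algorithm: the function name looks_double_encoded and the default charset
latin-1 are assumptions. Genuine text that happens to contain such
character pairs would be reported too, so the result is only a hint.

```python
def looks_double_encoded(data, charset="latin-1"):
    """Guess whether data is UTF-8 text that was encoded to UTF-8 twice.

    The check undoes one layer (decode as UTF-8, re-encode in the 8bit
    charset) and tests whether the result is again valid, non-ASCII UTF-8.
    """
    try:
        once = data.decode("utf-8")
        inner = once.encode(charset)
        inner.decode("utf-8")
    except UnicodeError:
        # Not UTF-8 at all, not representable in the 8bit charset,
        # or the inner layer is not UTF-8: no sign of double-encoding.
        return False
    # Pure ASCII survives any number of encodings unchanged, so it
    # carries no evidence either way.
    return any(byte >= 0x80 for byte in inner)
```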

enca-0.10.4 2002-10-10
  + added Cork encoding support for Czech, Slovak and Polish
  - empty files are now considered convertible to any encoding
  - removed the so-called faster (in fact slower) I/O
  - fixed some more compile-time search path issues

enca-0.10.3 2002-09-22
  * added support for perl umap as external converter
  - fixed external converter wrappers to work with standard sh
  - fixed some compile-time library search path issues

enca-0.10.2 2002-09-15
  + target charset is automatically obtained from locales when called as
    enconv, new options --guess, --auto-convert
  + English language names can be used instead of ISO-639 codes everywhere
  - cs_SK and ru_UA locales are properly recognised as Slovak and Ukrainian

enca-0.10.1 2002-08-29
  + faster I/O
  * external converters can be disabled at build time
  - `-' is accepted for standard input
  - fixed broken built-in converter
  - fixed crashing on an unknown language
  - trivial (identity) conversions are not performed any more
  - help is now printed when input is a terminal and no argument specified
  - changed braindamaged <STDIN>, <STDOUT> to STDIN, STDOUT in messages
  - various small fixes and build-time improvements

enca-0.10.0 2002-08-26
  + added support for Ukrainian (CP1251, IBM855, ISO-8859-5, KOI8-U, maccyr,
    CP1125), Belarusian (CP1251, IBM866, ISO-8859-5, KOI8-UNI, maccyr,
    IBM855) and Polish (ISO-8859-2, ISO-8859-13, ISO-8859-16, Baltic, macce,
    IBM852, CP1250)
  + Enca library introduced
  * dropped native Debian package
  * --details no longer prints guessing details (now is mostly like --human)
  * --list=encodings, --list=built-in-encodings corrected to --list=charsets,
    --list=built-in-charsets (old names supported with a warning)
  * improved Czech and Slovak charsets detection

enca-0.9.4: 2002-03-03
  - built-in converter didn't convert more than first 64kB of a file

enca-0.9.3: 2001-07-16
  + a native Debian package
  - fixed random reporting of nonsense results
  - fixed self-contradictory --details output when file was quoted-printable
    encoded
  - fixed poor performance on non-GNU/Linux
  - made pager less intrusive (instead of intrusive `less' ;-)
  - --list=encodings prints only `known' encodings
  - fixed several compile-time/portability problems

enca-0.9.2: 2001-07-13
  * --help and --license are displayed through pager (when possible)
  - fixed broken language hooks--they were never activated (from 0.9.1)
  - fixed reporting ASCII when a 7bit encoding was detected
  - fixed boundary-case behaviour when recovering from librecode failures

enca-0.9.1: 2001-06-25
  + support for Macintosh Cyrillic, including conversion
  + support for unusual UCS-4 byte orders (3412 and 2143)
  + new option --license printing full enca license
  * exit codes now make sense (0, 1, 2; where 2 means serious troubles)
  - temporary files are no longer world-readable
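
The byte order names used above (3412 and 2143) list, for each stored byte,
which position it holds in the big-endian form 1234. The sketch below shows
how such data could be decoded; it is not taken from Enca, and the function
name decode_ucs4 is an assumption.

```python
def decode_ucs4(data, order="1234"):
    """Decode UCS-4 bytes stored in the given byte order.

    "1234" is big endian, "4321" is little endian, and "2143" and "3412"
    are the unusual orders with swapped byte pairs.
    """
    if sorted(order) != list("1234"):
        raise ValueError("order must be a permutation of 1234")
    if len(data) % 4:
        raise ValueError("UCS-4 data length must be a multiple of 4")
    chars = []
    for start in range(0, len(data), 4):
        unit = data[start:start + 4]
        big_endian = bytearray(4)
        # Put each stored byte back at its big-endian position.
        for stored_pos, digit in enumerate(order):
            big_endian[int(digit) - 1] = unit[stored_pos]
        chars.append(chr(int.from_bytes(big_endian, "big")))
    return "".join(chars)
```

For example, the letter A (U+0041) is stored as 00 00 41 00 in order 2143
and as 00 41 00 00 in order 3412.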

enca-0.9.0: 2001-03-26
  Serious incompatibilities:
  * -E and -C option letters exchanged (much better mnemonics)
  * converter wrappers renamed to b-cstocs and b-recode
  * finding only 7bit ASCII is no longer considered failure
  * need to use --language to set language (sometimes)
  * dull converter behaviour no longer supported, -x syntax changed
  * option -g removed (try --name=aliases)
  * option -c changed to --list=converters, listing format changed
  * option -l changed to --list=encodings, listing format changed
  * converter names are no longer case insensitive
  * no longer uses cstocs names as canonical
  * external converters are called with Enca's names, not cstocs's

  Other changes:
  + support for slovak and russian (and `none') language
  + support for CP1251, IBM866, ISO-8859-5 and KOI8-R, including conversion
  + UCS-2, UCS-4, UTF-8, UTF-7 and LaTeX encoding recognition
  + many more encoding aliases accepted
  + long `GNU style' command line options
  + new output types: --enca-name, --iconv-name
  + output type --name=WORD allows selecting the output type by name
  + ENCAOPT environment variable
  + language detection from locales
  + support for surfaces (experimental)
  + new option --list printing various listings
  + new converter wrapper b-map (for perl `map')
  + new option -m to reset -M back
  + new language filters
  + new options -u and -U to control multibyte encoding checks
  + included [generated] enca.spec into the tarball to allow `rpm -tb'
  * -d output improved
  * read limit changed to 16MB
  * librecode now run with flags diacritics_only and ascii_graphics
  - fixed broken -P options
  - fixed several build problems on non-GNU/Linux systems
  - fixed some missing and wrong characters in Unicode data
  - temporary copy of damaged original file is not deleted when rescue fails

enca-0.8.x: Since features planned for 0.8 and 0.9 happened to be developed
  simultaneously, this version number has been skipped.

enca-0.7.7: 2001-01-01
  + ability to use UNIX98 iconv conversion functions
  + the word `none' can be used as -E parameter causing clearing of converter
    list
  - fixed disarranged help text, misspelled word `European' in macce long
    name, obsolete statements in manual page and other stuff of this kind

enca-0.7.6: 2000-11-20
  + any converter combination/order can be now specified with -E, old -E
    meaning is no longer valid
  + new option -c (list all valid converter names)
  * cork encoding not supported anymore
  * better verbosity
  * `/' is added to recode recoding requests thus partially solving the
    surface problem---surface never changes
  * some errors like specifying invalid value of threshold are no longer fatal,
    the bad values are ignored instead
  * handling of some exotic characters in built-in converter slightly changed
  - fixed several fatal bugs regarding stdin to stdout conversion
  - stdin is copied to stdout in case of failure whenever possible/applicable

enca-0.7.5: 2000-10-25
  * license changed to GNU GPL Version 2 (i.e. license version is explicitly
    specified)
  * prints error message when conversion is impossible
  * binary data filter improved/changed
  - falls back to external converter when GNU recode library cannot convert
    due to erroneous request
  - '' no longer causes enca to read from stdin
  - tries to restore files damaged by GNU recode library

enca-0.7.4: 2000-10-12
  + box-drawing characters are (carefully) filtered out when guessing
  - fixed intermixed behaviour in SMS/nonSMS modes

enca-0.7.3: 2000-10-09
  + blocks of probably binary data are filtered out when guessing
  * standard input is copied to standard output when its encoding is unknown
  - fixed reading only 4096 bytes from pipe (from 0.7.1)

enca-0.7.2: was never released
  + GNU recode recoding chains made possible by starting -x (convert) parameter
    with `..'
  + second best guess is marked with `-' in -d (print details) output

enca-0.7.1: 2000-10-02
  * in case of nonfatal i/o failure enca continues processing remaining files

enca-0.7.0: 2000-09-26
  + standard input to standard output conversion
  + short message mode -M
  + ability to use GNU recode library
  + new output type -r (encoding name after RFC1345)
  + ability to convert cork internally
  + new external converter brecode (recode wrapper)
  + new output type -g (list of aliases)
  + new option -V (verbose)
  * -x (convert) parameter syntax changed to in_enc..out_enc (old syntax still
    supported, will be removed in 0.8.x)
  * option -e (disable external) no longer supported, empty string as -C
    (external converter) parameter can be used instead
  * encoding names specified as -x (convert) parameters are case insensitive
  * ascii is not considered unknown encoding (i.e. failure) so enca returns 0
  * -d (print details) output improved/changed/updated
  * -p (prefix result with file name) no longer prints conversion details
  * by default result is prefixed by file name when enca is run on more than
    one file

enca-0.6.2: 2000-08-17
  + help texts (-h and -v) made usable (thanx to Halef)

enca-0.6.1: 2000-08-15
  - tarball bugfix

enca-0.6.0: 2000-07-20
  + built-in converter
  + -x (convert) can now take form -x in_enc,out_enc causing enca to behave
    like a dull converter
  + new options -e and -E (disable internal/external converter)
  + new option -l (print internally-convertible encodings)

enca-0.5.0: 2000-07-17
  * -p (prefix result with file name) causes enca to print what is converted
    and how
  * iso8859-2/cp1250 recognition improved
  - doesn't spawn external converters as fast as is possible, but waits for
    them to return
  - fixed `Unrecognized encoding' when winner is 1250 (from 0.4.3)
  - corrected -d (print details) table alignment

enca-0.4.3: 2000-07-14
  * -d (print details) prints encodings alphabetically sorted
  - corrected short encoding name t1 -> cork
  - division-by-zero bugfixes

enca-0.4.2: was never released
  * options -m/-M ([don't] use iso8859-2/cp1250 hack) no longer supported
  - fixed showing standard input as empty string (<STDIN> is printed now)

enca-0.4.1: 2000-07-12
  * default of 60 significant characters changed to 10

enca-0.4.0: 2000-07-10
  + first public release