
Developing programs utilizing libenca

  • Read the libenca API documentation in devel-docs/html.
  • See the enca source for how it uses libenca. enca is quite a simple application: practically all libenca interaction is in src/enca.c, it is single-threaded, and it uses one language and one analyser throughout. Provided each thread has its own analyser, libenca should be thread-safe (untested).
  • Take names starting with ENCA, Enca, enca, _ENCA, _Enca, and _enca as reserved.
  • pkg-config is supported: you can use PKG_CHECK_MODULES to check for libenca in your configure scripts.

How to add a new charset/encoding

(optional steps are marked [optional]):

  • iconvcap.c:
    • Add a new test (even if you are 100% sure iconv will never support it). See the top of iconvcap.c for documentation on how it works.
  • tools/encodings.dat:
    • Add a new entry.
    • Use @ICONV_NAME_<name>@ (as it will appear in iconvcap output) for iconv names.
  • tools/iconvenc.null:
    • Add the new encoding (with NULL).

Specifically, for regular 8bit (language dependent) charsets:

  • lib/unicodemap.c:
    • Add a new map to Unicode (UCS-2) unicode_map_...[].
    • Add a new UNICODE_MAP[] entry.
  • lib/filters.c: [optional]
    • Create a new filter or make an alias of an existing filter.
  • lib/lang_??.c:
    • Add the new encoding to some existing language(s).
    • Add appropriate filters or hooks [optional].
  • data/maps/??.map:
    • Add a new map to Unicode (UCS-2)
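
To illustrate what a map from an 8bit charset to UCS-2 looks like, here is a minimal, self-contained sketch. It is not the layout used in lib/unicodemap.c: the names demo_map_latin2_high, demo_to_ucs2, and NO_CHR are hypothetical, and only a handful of ISO-8859-2 entries are filled in. A real map has to list every byte of the charset.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CHR 0xFFFFu  /* hypothetical "no mapping" marker */

/* Upper half (bytes 0x80..0xFF) of a partial ISO-8859-2 map,
 * indexed by byte - 0x80.  Entries not listed default to 0 and are
 * reported as NO_CHR by demo_to_ucs2(). */
static const unsigned short demo_map_latin2_high[128] = {
    [0xA0 - 0x80] = 0x00A0,  /* NO-BREAK SPACE */
    [0xA1 - 0x80] = 0x0104,  /* LATIN CAPITAL LETTER A WITH OGONEK */
    [0xA3 - 0x80] = 0x0141,  /* LATIN CAPITAL LETTER L WITH STROKE */
    [0xB1 - 0x80] = 0x0105,  /* LATIN SMALL LETTER A WITH OGONEK */
    [0xB3 - 0x80] = 0x0142,  /* LATIN SMALL LETTER L WITH STROKE */
    [0xC8 - 0x80] = 0x010C,  /* LATIN CAPITAL LETTER C WITH CARON */
    [0xD8 - 0x80] = 0x0158,  /* LATIN CAPITAL LETTER R WITH CARON */
    [0xE8 - 0x80] = 0x010D,  /* LATIN SMALL LETTER C WITH CARON */
    [0xF8 - 0x80] = 0x0159,  /* LATIN SMALL LETTER R WITH CARON */
};

/* Map one byte to its UCS-2 code point.  ASCII maps to itself. */
unsigned int demo_to_ucs2(unsigned char byte)
{
    size_t idx;
    unsigned short u;

    if (byte < 0x80)
        return byte;

    idx = (size_t)byte - 0x80;
    assert(idx < sizeof demo_map_latin2_high / sizeof demo_map_latin2_high[0]);
    u = demo_map_latin2_high[idx];
    return u ? u : NO_CHR;
}
```

With this table, demo_to_ucs2(0xF8) gives 0x0159, while a byte that the sketch leaves out, such as 0x81, gives NO_CHR.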

Specifically, for multibyte encodings:

  • lib/multibyte.c:
    • Create a new check function.
    • Put it into the appropriate ASCII/8bit/binary test group: ENCA_MULTIBYTE_TESTS_ASCII[], ENCA_MULTIBYTE_TESTS_8BIT[], or ENCA_MULTIBYTE_TESTS_BINARY[].
    • Put strict tests (i.e. tests that may fail) first, and looks-like tests last.
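
The ordering rule can be shown with a small, self-contained sketch. It does not use the real types or functions from lib/multibyte.c: DemoCheckFunc, demo_check_utf8_strict, demo_looks_like_8bit, DEMO_TESTS_8BIT, and demo_first_match are hypothetical names, and the sketch assumes that a check returns nonzero on a match and that the group is tried in array order. The UTF-8 check is deliberately simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check signature: nonzero means "this test matched". */
typedef int (*DemoCheckFunc)(const unsigned char *buf, size_t size);

/* Strict test: rejects the sample on the first malformed sequence, and
 * also rejects pure ASCII.  Simplified: it does not catch every
 * overlong or out-of-range form. */
static int demo_check_utf8_strict(const unsigned char *buf, size_t size)
{
    size_t i = 0, multibyte = 0;

    while (i < size) {
        unsigned char c = buf[i++];
        size_t need;

        if (c < 0x80) continue;
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c >= 0xE0 && c <= 0xEF) need = 2;
        else if (c >= 0xF0 && c <= 0xF4) need = 3;
        else return 0;                    /* invalid lead byte */

        if (size - i < need) return 0;    /* truncated sequence */
        while (need--)
            if ((buf[i++] & 0xC0) != 0x80) return 0;
        multibyte++;
    }
    return multibyte > 0;
}

/* Looks-like test: matches anything containing a byte >= 0x80. */
static int demo_looks_like_8bit(const unsigned char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++)
        if (buf[i] >= 0x80) return 1;
    return 0;
}

/* Strict tests first, looks-like tests last. */
static const DemoCheckFunc DEMO_TESTS_8BIT[] = {
    demo_check_utf8_strict,
    demo_looks_like_8bit,
};

/* Return the index of the first matching test, or -1 if none matches. */
int demo_first_match(const unsigned char *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    for (size_t i = 0; i < sizeof DEMO_TESTS_8BIT / sizeof DEMO_TESTS_8BIT[0]; i++)
        if (DEMO_TESTS_8BIT[i](buf, size)) return (int)i;
    return -1;
}
```

If the two entries were swapped, the looks-like test would match every sample containing a high byte and the strict test would never get a chance to run. That is the reason for the ordering.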

How to add a new surface

  • Try to ask the author what to do, since this may be complicated, or
  • Hack. Basically:
    • Add it to the EncaSurface enum in lib/enca.h.
    • Add it to SURFACE_INFO[] in lib/encnames.c.
    • Add a detection method to lib/guess.c.
    • Now the most complicated part: use this new method in the right places in make_guess() in lib/guess.c.
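
The three pieces involved (a flag in an enum, an entry in an info table, and a detection method) can be sketched as follows. This is an illustration only, not code from libenca: DemoSurface, DEMO_SURFACE_INFO, demo_surface_name, and demo_detect_eol are hypothetical names, and the sketch assumes surfaces are single-bit flags that can be combined. Wiring the detection into make_guess() is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical surface flags, one bit each so they can be OR-ed. */
typedef enum {
    DEMO_SURFACE_EOL_LF   = 1 << 0,
    DEMO_SURFACE_EOL_CRLF = 1 << 1
} DemoSurface;

/* Hypothetical info table: one entry per flag. */
typedef struct {
    DemoSurface bit;
    const char *name;
} DemoSurfaceInfo;

static const DemoSurfaceInfo DEMO_SURFACE_INFO[] = {
    { DEMO_SURFACE_EOL_LF,   "LF line terminators" },
    { DEMO_SURFACE_EOL_CRLF, "CRLF line terminators" },
};

/* Look up the descriptive name of a single flag, or NULL if unknown. */
const char *demo_surface_name(DemoSurface bit)
{
    for (size_t i = 0; i < sizeof DEMO_SURFACE_INFO / sizeof DEMO_SURFACE_INFO[0]; i++)
        if (DEMO_SURFACE_INFO[i].bit == bit) return DEMO_SURFACE_INFO[i].name;
    return NULL;
}

/* Detection method: return the OR of all line-end flags seen in buf. */
unsigned int demo_detect_eol(const unsigned char *buf, size_t size)
{
    unsigned int found = 0;

    assert(buf != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        if (buf[i] != '\n') continue;
        if (i > 0 && buf[i - 1] == '\r') found |= DEMO_SURFACE_EOL_CRLF;
        else found |= DEMO_SURFACE_EOL_LF;
    }
    return found;
}
```

A sample with mixed line ends yields both flags at once, which is why each surface gets its own bit in this sketch.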

How to add a new language

  • Create a new language file:
    • Create new lib/lang_....c files by copying existing ones (use the locale code for names).
    • Fill in all encoding and occurrence data, and create filters and hooks (see filters.c too). You can do it manually, but look at how it is done for existing languages in data/* and read data/README.
  • lib/internal.h:
    • Add new ENCA_LANGUAGE_....
  • src/lang.c:
    • Add a new LANGUAGE_LIST[] entry pointing to the ENCA_LANGUAGE_....
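
The registration pattern in the last two steps (declare a language descriptor, then add a pointer to it to a list) can be sketched like this. The sketch is self-contained and does not use libenca's real structures: DemoLanguageInfo, DEMO_LANGUAGE_CS, DEMO_LANGUAGE_XX, DEMO_LANGUAGE_LIST, and demo_find_language are hypothetical, and "xx" stands for the locale code of the new language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for a language descriptor. */
typedef struct {
    const char *name;       /* locale code */
    const char *humanname;  /* name shown to users */
} DemoLanguageInfo;

/* In a real tree each descriptor would be defined in its own
 * lang_....c file and declared in a shared internal header. */
const DemoLanguageInfo DEMO_LANGUAGE_CS = { "cs", "Czech" };
const DemoLanguageInfo DEMO_LANGUAGE_XX = { "xx", "Example (new)" };

/* The list of known languages: adding one is a single new line. */
static const DemoLanguageInfo *const DEMO_LANGUAGE_LIST[] = {
    &DEMO_LANGUAGE_CS,
    &DEMO_LANGUAGE_XX,  /* the new entry */
};

/* Find a language by locale code, or return NULL if it is unknown. */
const DemoLanguageInfo *demo_find_language(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof DEMO_LANGUAGE_LIST / sizeof DEMO_LANGUAGE_LIST[0]; i++)
        if (strcmp(DEMO_LANGUAGE_LIST[i]->name, name) == 0)
            return DEMO_LANGUAGE_LIST[i];
    return NULL;
}
```

A descriptor that is defined but never added to the list is simply not found by the lookup, so forgetting the list entry is an easy mistake to make.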

Automake, autoconf, libtool, ... note

If you run ./autogen.sh and it finishes OK, you are lucky and can expect things to work.

You have to pass --enable-maintainer-mode to ./configure (or ./autogen.sh) to build dists and/or the strange stuff in tools/, data/, tests/, and devel-docs/.

Repository and continuous integration

The git repository is located at GitHub:

http://github.com/nijel/enca

There is also continuous integration on Travis:

https://travis-ci.org/nijel/enca