#============================================================================
# Enca v1.19 (2016-09-05) guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti) <yeti@physics.muni.cz>
# Copyright (C) 2009-2016 Michal Cihar <michal@cihar.com>
#============================================================================

TO THE NEXT RELEASE:
(this list must be empty at the time of release)

IN FUTURE:
(should be done, but maybe not right now)

* LCUC check for Cyrillic charsets.
* Backups -- like cp, mv, etc. This will be hard to get right with all the
  silly converters.
* More tests
* Structured documentation (the manual page is ugly)
  - keep a reasonably brief manual page
  - put all the boring doc stuff somewhere else, there are possibilities:
    info: searchable, has links, partly portable, has console viewers
    HTML: poorly searchable, has links, most portable, has console viewers
    TeX (ps): not searchable, no links, portable, most pleasant to read,
              no console viewers
    => use SGML (or info itself?) and generate the others


MAYBE SOMEDAY:
(when I am in the mood for it; items are freely moved here and removed again)

* Detect all-caps texts correctly.
  After several experiments it seems we have to
  - use pair occurrences, at least, with specifically computed
    difference-maximising weights
  - guess in two steps
    - first with uncapitalization and pair weights, and check whether the
      sample looks like natural text (garbageness test, but better)
    - if the first approach fails, do it as we do it now
* design better levels of verbosity/warnings (or: remove the --verbose option,
  keep important messages and remove all others?)
  0: only messages followed by exit(EXIT_FAILURE) (or abort()) are printed
     plus `cannot convert...'
  1: all nonfatal errors/warnings
  2: what converters are tried, what language gets detected (do not duplicate
     --details)
  >2: debug
* _real_ paranoiac behaviour assuring that nothing gets lost and that
  conversion output is either correctly converted text or untouched original
  (requires major redesign of all the conversion stuff)
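
The two-step guess for all-caps texts could look roughly like the sketch
below. Everything in it is hypothetical: PairModel, pair_score(),
guess_two_step() and the demo_* helpers are not part of libenca, the pair
weights are a toy, and a plain score threshold (min_score) stands in for the
"looks like natural text" test.

```c
#include <ctype.h>
#include <stddef.h>

#define GUESS_UNKNOWN (-1)

/* Hypothetical per-charset model; none of these names exist in libenca. */
typedef struct {
    const char *name;
    unsigned char (*to_lower)(unsigned char c);              /* uncapitalization */
    double (*pair_weight)(unsigned char a, unsigned char b); /* weight of pair ab */
} PairModel;

/* Mean pair weight of the uncapitalized sample under one model. */
double pair_score(const PairModel *m, const unsigned char *buf, size_t len)
{
    double sum = 0.0;
    size_t i;

    if (len < 2)
        return 0.0;
    for (i = 1; i < len; i++)
        sum += m->pair_weight(m->to_lower(buf[i - 1]), m->to_lower(buf[i]));
    return sum / (double)(len - 1);
}

/*
 * Step 1: pick the model with the best pair score and accept it only if the
 * score reaches min_score (an assumed stand-in for the naturalness test).
 * Step 2: otherwise hand the sample to the existing guesser (fallback).
 * Returns the index of the accepted model or whatever fallback returns.
 */
int guess_two_step(const PairModel *models, size_t nmodels,
                   const unsigned char *buf, size_t len, double min_score,
                   int (*fallback)(const unsigned char *, size_t))
{
    int best = GUESS_UNKNOWN;
    double best_score = 0.0;
    size_t i;

    for (i = 0; i < nmodels; i++) {
        double s = pair_score(&models[i], buf, len);
        if (best == GUESS_UNKNOWN || s > best_score) {
            best = (int)i;
            best_score = s;
        }
    }
    if (best != GUESS_UNKNOWN && best_score >= min_score)
        return best;
    return fallback(buf, len);
}

/* Toy ASCII model and fallback, for demonstration only. */
unsigned char demo_lower(unsigned char c)
{
    return (unsigned char)tolower(c);
}

double demo_weight(unsigned char a, unsigned char b)
{
    return ((a == 't' && b == 'h') || (a == 'h' && b == 'e')) ? 1.0 : 0.0;
}

int demo_fallback(const unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return GUESS_UNKNOWN;
}
```

With the toy model, an all-caps sample such as "THE THE" is accepted in the
first step, while a sample without any weighted pairs falls through to the
fallback. Real weights would have to be computed per language and charset.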
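
The verbosity levels 0, 1, 2 and >2 proposed in this list could be expressed
as a single threshold check. This is only a sketch: the VERB_* constants,
verbosity_level, should_print() and report() are made-up names, not existing
enca code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical names; the values mirror the 0, 1, 2, >2 scheme. */
enum {
    VERB_FATAL = 0,   /* messages followed by exit()/abort(), `cannot convert...' */
    VERB_WARNING = 1, /* all nonfatal errors and warnings */
    VERB_INFO = 2,    /* converters tried, language detected */
    VERB_DEBUG = 3    /* anything above 2 */
};

/* Current verbosity; assumed to be set from the command line. */
int verbosity_level = VERB_FATAL;

/* Nonzero when a message of the given level should be printed. */
int should_print(int level)
{
    return level <= verbosity_level;
}

/* Print to stderr if the level is enabled; return 1 if printed, else 0. */
int report(int level, const char *fmt, ...)
{
    va_list ap;

    if (!should_print(level))
        return 0;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}
```

A call such as report(VERB_INFO, "trying converter %s\n", name) then prints
only when the verbosity is 2 or higher, and fatal messages are always shown.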
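
For the "either correctly converted text or untouched original" guarantee,
one common POSIX approach is to write the result to a temporary file in the
same directory and rename() it over the original only after everything
succeeded. The sketch below assumes a converter that can be called as a C
function (the hypothetical ConvertFunc); it does not cover external
converters, and it does not preserve file modes or call fsync().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and fdopen() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical converter callback: returns 0 on success, nonzero on failure. */
typedef int (*ConvertFunc)(FILE *in, FILE *out);

/*
 * Convert `path' in place, all or nothing.  On any failure the temporary
 * file is removed and the original is left untouched.
 * Returns 0 on success, -1 on failure.
 */
int convert_file_safely(const char *path, ConvertFunc convert)
{
    char tmp[4096];
    FILE *in = NULL, *out = NULL;
    int fd, n, ok = 0;

    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || n >= (int)sizeof tmp)
        return -1;
    fd = mkstemp(tmp);
    if (fd == -1)
        return -1;
    out = fdopen(fd, "wb");
    if (out == NULL)
        close(fd);
    else
        in = fopen(path, "rb");

    if (in != NULL) {
        ok = (convert(in, out) == 0);
        fclose(in);
    }
    if (out != NULL && fclose(out) != 0)   /* a failed flush means lost data */
        ok = 0;
    if (ok && rename(tmp, path) == 0)
        return 0;
    unlink(tmp);
    return -1;
}

/* Toy converters, for demonstration only. */
int demo_copy_upper(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF)
        if (putc((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c, out) == EOF)
            return 1;
    return ferror(in) ? 1 : 0;
}

int demo_always_fail(FILE *in, FILE *out)
{
    (void)in;
    fputs("partial garbage", out);
    return 1;
}
```

With demo_always_fail the original file keeps its content even though the
converter already wrote partial output; with demo_copy_upper the file is
replaced in one step by the converted text.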

NEVER:
(you can do anything GNU GPL v2 allows, but I'll restrain myself)

* features that nobody needs (mm, well, ... ok, let it be)
* duplicate other tools' functionality more than necessary, use them instead
* dependency on anything that is not ISO C and/or POSIX (moreover do not use
  braindead features of both); important functionality must be present
  everywhere; nevertheless, enca can be smaller, faster or cleverer on some
  (GNU) systems
* localization; please correct my English instead ;->
* converter calling generalization (would require including the whole wordexp
  thing in enca, and: launching an external converter is a Bad Thing(TM)
  anyway)
* data in run-time files (needs parser (could live with) and disallows hooks
  (can't live without))
* loadable module support (it's not very portable)
-------------


KNOWN ISO C CONFLICTS:
(perhaps to be solved someday)

All constants and typedefs. They start with ENCA_ and Enca, but:

  Names beginning with a capital `E' followed by a digit or uppercase
  letter may be used for additional error code names. [errno.h]

And additionally inside libenca (i.e. not so serious):
* libenca.h:   #define EPSILON    [errno.h]
* filters.c:   isvbox[]           [ctype.h]
* guess.c:     #define isbinary   [ctype.h]
* guess.c:     #define istext     [ctype.h]
* multibyte.c: is_valid_utf7()    [ctype.h]
* multibyte.c: is_valid_utf8()    [ctype.h]

Some probably can't conflict.
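
The two reservation rules behind this list can be checked mechanically. The
sketch below uses made-up helper names (clashes_with_errno_h(),
clashes_with_ctype_h()). The errno.h rule is the one quoted above; the
ctype.h rule is the ISO C reservation of names that begin with `is' or `to'
followed by a lowercase letter.

```c
#include <ctype.h>
#include <string.h>

/* `E' followed by a digit or an uppercase letter: reserved for <errno.h>. */
int clashes_with_errno_h(const char *name)
{
    return name[0] == 'E'
        && (isdigit((unsigned char)name[1]) || isupper((unsigned char)name[1]));
}

/* `is' or `to' followed by a lowercase letter: reserved for <ctype.h>. */
int clashes_with_ctype_h(const char *name)
{
    return (strncmp(name, "is", 2) == 0 || strncmp(name, "to", 2) == 0)
        && islower((unsigned char)name[2]);
}
```

Under this reading EPSILON and anything starting with ENCA_ match the
errno.h pattern while names starting with Enca do not; isvbox, isbinary and
istext match the ctype.h pattern, and is_valid_utf7 and is_valid_utf8 do not,
because the third character is an underscore.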