Information about dos2unix's implementation choices.

1. Smart conversion
===================

  Some dos2unix implementations automatically convert all types of line
breaks, for instance converting both DOS and Mac line breaks to Unix line
breaks at once, or automatically detecting the line break type and converting
it to the other side.

  Smart conversions can lead to unexpected behaviour. For instance, when such a
dos2unix is run on a file with only Unix line breaks, the line breaks may get
flipped to the other side. This dos2unix implementation does exactly what you
tell it to do. When you run 'dos2unix', only DOS line breaks (CR LF) are
converted to Unix line breaks (LF). Unix line breaks stay in the file. Seen from
a DOS or Unix perspective, a Mac line break (a lone CR) is not a line break, so
Mac line breaks also stay untouched. The same applies to mac2unix: it leaves
Unix and DOS line breaks untouched.


2. Unix filter
==============

  When a standard Unix filter, such as sed or awk, reads input from a file, it
sends its output to standard output by default. This implementation of
dos2unix converts in place by default (overwriting the input file), which
seems out of line with that convention.

  Dos2unix is not part of the Unix standard. Most Unixes have their own
implementation of dos2unix, and there is a lot of variation in command names,
options, and behaviour. The SunOS version of dos2unix, on which this version
was modeled, does paired conversion by default: the converted text is written
to a separate output file.
  This implementation of dos2unix has too much legacy to change its default
behaviour; changing it would have more disadvantages than advantages. Most
people expect dos2unix to convert in place, and the majority of other open
source implementations also convert in place by default. In-place conversion
also makes it very easy to convert multiple files at once by using wildcards,
for example 'dos2unix *.txt'.
  This implementation of dos2unix does send its output to standard output when
the input comes from standard input, so you can use it as a filter. Note that
dos2unix/unix2dos is also used a lot on non-Unix operating systems, where the
filter idea is less well known.


3. Recursive conversion of files
================================

  Some implementations have built-in functionality for recursively converting
all files in a directory tree.

  This functionality is not needed in dos2unix. An external program such as
Unix 'find' can do recursive conversion of directory trees, for example
'find . -name "*.txt" -exec dos2unix {} +', so there is no need to duplicate
it.


4. Encoding conversion
======================

  Dos2unix can do several encoding conversions. First, there are conversions
of several DOS code pages to and from ISO-8859-1. These conversions are also
part of the SunOS dos2unix implementation on which this implementation was
modeled. Although they are not much used these days, they have been added for
the sake of completeness. Conversion of CP1252 was added because it is used a
lot in the Western world; it is almost identical to ISO-8859-1. There is no
intention to add other conversions to and from ISO-8859-1.

  Conversion from UTF-16 was added because the world is moving towards
Unicode. Microsoft Windows uses UTF-16 by default for Unicode. UTF-16 is part
of the core design of Windows for historical reasons: Microsoft standardized
on UCS-2, a predecessor of UTF-16, at a time when UTF-8 did not exist yet.
However, a lot of Windows software is able to read UTF-8 files. In Windows,
"Unicode" usually means UTF-16. For instance, saving a file in Notepad with
"Unicode" encoding means saving it in UTF-16 encoding. When you work in
Windows PowerShell and echo some text to a file, you get a UTF-16 encoded text
file. UTF-16 is here to stay, although many people would like it otherwise and
dream of a UTF-8 only world. The Unix/Linux world is moving towards UTF-8
because it is backwards compatible with ASCII. Unix programs typically do not
support UTF-16.

  One end of the encoding spectrum is an ASCII-only world, where the only
difference between DOS and Unix text files is the line breaks. In
English-speaking regions this is a good working environment, because in
practice ASCII is sufficient for the English language: diacritics are hardly
used and can be omitted. The other end of the spectrum is a Unicode-only world,
in which all languages of the world are supported. Dos2unix aims to support
these two ends of the spectrum: ASCII and Unicode. The Chinese GB18030 encoding
is also treated as a Unicode transformation format, because it can encode every
Unicode code point. UTF-32 is not supported, because in practice it is only
used as an internal format. Other encoding conversions are left to specialized
programs such as iconv and recode. The few conversion modes to and from
ISO-8859-1 are only there for legacy reasons.

  In the ASCII days, converting text files from DOS to Unix and vice versa
meant only converting line breaks. In the Unicode era it also means converting
between Unicode transformation formats (e.g. UTF-16 to UTF-8) and removing or
adding a Byte Order Mark (BOM).

  Conversion to UTF-16 is not supported, and there is no intention to support
it in the future. UTF-8 encoded files are well supported on Windows, so
conversion to UTF-16 is not needed. And we keep on dreaming of a UTF-8 only
world...