.\" Copyright (c) 2001-2003 Leon Bottou, Yann Le Cun, Patrick Haffner,
.\" Copyright (c) 2001 AT&T Corp., and Lizardtech, Inc.
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual. Otherwise check the web site
.\" of the Free Software Foundation at http://www.fsf.org.
.TH DJVUTXT 1 "10/11/2001" "DjVuLibre-3.5" "DjVuLibre-3.5"
.de SS
.SH \\0\\0\\0\\$*
..
.SH NAME
djvutxt \- Extract the hidden text from DjVu documents.
.SH SYNOPSIS
.BI "djvutxt [" options "] " "inputdjvufile" " [" outputtxtfile "]"
.SH DESCRIPTION
Program
.B djvutxt
decodes the hidden text layer of a DjVu document
.I inputdjvufile
and writes it to the file
.I outputtxtfile
or, when no output file is given, to the standard output.
The hidden text layer is usually generated with
the help of optical character recognition software.

Without options
.B --detail
and
.BR --escape ,
this program simply outputs the UTF-8 text.
Option
.B --detail
causes the output of S-expressions
describing the text and its location.
Option
.B --escape
uses C-style escape sequences to represent
non-ASCII and nonprintable characters.
.SH OPTIONS
.TP
.BI "--page=" "pagespec"
Specify which pages should be processed.
When this option is not specified,
the text of all pages of the document is
concatenated into the output file.
The page specification
.I pagespec
contains one or more comma-separated page ranges.
A page range is either a page number,
or two page numbers separated by a dash.
For instance, specification
.B "1-10"
outputs pages 1 to 10, and specification
.B "1,3,99999-4"
outputs pages 1 and 3, followed by the document
pages in reverse order, from the last page down to page 4.
.TP
.BI "--detail=" "keyword"
This option causes
.B djvutxt
to output S-expressions
specifying the position of the text on the page.
See the manual page
.BR djvused (1)
for a description of the output format.
Argument
.I keyword
specifies the maximum level of detail
for which text location is reported.
The recognized values are:
.BR page ", " column ", " region ", " para ", "
.BR line ", " word ", and " char "."
All other values are interpreted as
.BR char .
.TP
.BI "--escape"
Output escape sequences of the form
.BI \e "ooo"
for all non-ASCII or nonprintable characters
and for the backslash character.
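.PP
The page range rules described for option
.B --page
can be sketched in a few lines of Python.
This is only an illustration and not the code used by
.BR djvutxt .
The function name
.B expand_pagespec
and its
.B page_count
parameter are hypothetical, and clamping out-of-range page numbers
to the bounds of the document is an assumption drawn from the
.B "1,3,99999-4"
example.
.PP
.nf
```python
def expand_pagespec(pagespec, page_count):
    """Expand a page specification into an ordered list of page numbers.

    Hypothetical helper: it mirrors the rules stated in this manual page
    and assumes page numbers are clamped to the range 1..page_count.
    """
    def clamp(number):
        # Assumption: a number past the end stands for the last page.
        return max(1, min(number, page_count))

    pages = []
    for item in pagespec.split(","):
        # A range is either "N" or "N-M"; partition leaves dash empty for "N".
        first, dash, last = item.strip().partition("-")
        start = clamp(int(first))
        end = clamp(int(last)) if dash else start
        # A range whose start exceeds its end is walked in reverse order.
        step = 1 if start <= end else -1
        pages.extend(range(start, end + step, step))
    return pages
```
.fi
.PP
Under these assumptions, for a six-page document the specification
.B "1,3,99999-4"
expands to pages 1, 3, 6, 5 and 4, in that order.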
.SH REMARKS
Use program
.BR djvused (1)
for more control over the text layer.
.SH CREDITS
This program was initially written by
Andrei Erofeev <andrew_erofeev@yahoo.com> and
was then improved by Bill Riemers <docbill@sourceforge.net>
and many others. It was then rewritten to use the
ddjvuapi by Leon Bottou <leonb@sourceforge.net>.
.SH SEE ALSO
.BR djvu (1),
.BR djvused (1)