=head1 NAME

perlfilter - Source Filters

=head1 DESCRIPTION

This article is about a little-known feature of Perl called
I<source filters>. Source filters alter the program text of a module
before Perl sees it, much as a C preprocessor alters the source text of
a C program before the compiler sees it. This article tells you more
about what source filters are, how they work, and how to write your
own.

The original purpose of source filters was to let you encrypt your
program source to prevent casual piracy. This isn't all they can do, as
you'll soon learn. But first, the basics.

=head1 CONCEPTS

Before the Perl interpreter can execute a Perl script, it must first
read it from a file into memory for parsing and compilation. If that
script itself includes other scripts with a C<use> or C<require>
statement, then each of those scripts will have to be read from their
respective files as well.

Now think of each logical connection between the Perl parser and an
individual file as a I<source stream>. A source stream is created when
the Perl parser opens a file; it continues to exist as the source code
is read into memory, and it is destroyed when Perl is finished parsing
the file. If the parser encounters a C<require> or C<use> statement in
a source stream, a new and distinct stream is created just for that
file.

The diagram below represents a single source stream, with the flow of
source from a Perl script file on the left into the Perl parser on the
right. This is how Perl normally operates.

    file -------> parser

There are two important points to remember:

=over 5

=item 1.

Although there can be any number of source streams in existence at any
given time, only one will be active.

=item 2.

Every source stream is associated with only one file.

=back

A source filter is a special kind of Perl module that intercepts and
modifies a source stream before it reaches the parser. A source filter
changes our diagram like this:

    file ----> filter ----> parser

If that doesn't make much sense, consider the analogy of a command
pipeline. Say you have a shell script stored in the compressed file
I<trial.gz>. The simple pipeline command below runs the script without
needing to create a temporary file to hold the uncompressed file.

    gunzip -c trial.gz | sh

In this case, the data flow from the pipeline can be represented as follows:

    trial.gz ----> gunzip ----> sh

With source filters, you can store the text of your script compressed
and use a source filter to uncompress it for Perl's parser:

     compressed           gunzip
    Perl program ---> source filter ---> parser

=head1 USING FILTERS

So how do you use a source filter in a Perl script? Above, I said that
a source filter is just a special kind of module. Like all Perl
modules, a source filter is invoked with a use statement.

Say you want to pass your Perl source through the C preprocessor before
execution. As it happens, the source filters distribution comes with a C
preprocessor filter module called Filter::cpp.

Below is an example program, C<cpp_test>, which makes use of this filter.
Line numbers have been added to allow specific lines to be referenced
easily.

    1: use Filter::cpp;
    2: #define TRUE 1
    3: $a = TRUE;
    4: print "a = $a\n";

When you execute this script, Perl creates a source stream for the
file. Before the parser processes any of the lines from the file, the
source stream looks like this:

    cpp_test ---------> parser

Line 1, C<use Filter::cpp>, includes and installs the C<cpp> filter
module. All source filters work this way. The use statement is compiled
and executed at compile time, before any more of the file is read, and
it attaches the cpp filter to the source stream behind the scenes. Now
the data flow looks like this:

    cpp_test ----> cpp filter ----> parser

As the parser reads the second and subsequent lines from the source
stream, it feeds those lines through the C<cpp> source filter before
processing them. The C<cpp> filter simply passes each line through the
real C preprocessor. The output from the C preprocessor is then
inserted back into the source stream by the filter.

                   .-> cpp --.
                   |         |
                   |         |
                   |       <-'
    cpp_test ----> cpp filter ----> parser

The parser then sees the following code:

    use Filter::cpp;
    $a = 1;
    print "a = $a\n";

Let's consider what happens when the filtered code includes another
module with use:

    1: use Filter::cpp;
    2: #define TRUE 1
    3: use Fred;
    4: $a = TRUE;
    5: print "a = $a\n";

The C<cpp> filter does not apply to the text of the Fred module, only
to the text of the file that used it (C<cpp_test>). Although the use
statement on line 3 will pass through the cpp filter, the module that
gets included (C<Fred>) will not. The source streams look like this
after line 3 has been parsed and before line 4 is parsed:

    cpp_test ---> cpp filter ---> parser (INACTIVE)

    Fred.pm ----> parser

As you can see, a new stream has been created for reading the source
from C<Fred.pm>. This stream will remain active until all of C<Fred.pm>
has been parsed. The source stream for C<cpp_test> will still exist,
but is inactive. Once the parser has finished reading Fred.pm, the
source stream associated with it will be destroyed. The source stream
for C<cpp_test> then becomes active again and the parser reads line 4
and subsequent lines from C<cpp_test>.

You can use more than one source filter on a single file. Similarly,
you can reuse the same filter in as many files as you like.

For example, if you have a uuencoded and compressed source file, it is
possible to stack a uudecode filter and an uncompression filter like
this:

    use Filter::uudecode; use Filter::uncompress;
    M'XL(".H<US4''V9I;F%L')Q;>7/;1I;_>_I3=&E=%:F*I"T?22Q/
    M6]9*<IQCO*XFT"0[PL%%'Y+IG?WN^ZYN-$'J.[.JE$,20/?K=_[>
    ...

Once the first line has been processed, the flow will look like this:

    file ---> uudecode ---> uncompress ---> parser
               filter         filter

Data flows through filters in the same order they appear in the source
file. The uudecode filter appeared before the uncompress filter, so the
source file will be uudecoded before it's uncompressed.

=head1 WRITING A SOURCE FILTER

There are three ways to write your own source filter. You can write it
in C, use an external program as a filter, or write the filter in Perl.
I won't cover the first two in any great detail, so I'll get them out
of the way first. Writing the filter in Perl is most convenient, so
I'll devote the most space to it.

=head1 WRITING A SOURCE FILTER IN C

The first of the three available techniques is to write the filter
completely in C. The external module you create interfaces directly
with the source filter hooks provided by Perl.

The advantage of this technique is that you have complete control over
the implementation of your filter. The big disadvantage is the
increased complexity required to write the filter - not only do you
need to understand the source filter hooks, but you also need a
reasonable knowledge of Perl guts. One of the few times it is worth
going to this trouble is when writing a source scrambler. The
C<decrypt> filter (which unscrambles the source before Perl parses it)
included with the source filter distribution is an example of a C
source filter (see Decryption Filters, below).

=over 5

=item B<Decryption Filters>

All decryption filters work on the principle of "security through
obscurity." Regardless of how well you write a decryption filter and
how strong your encryption algorithm is, anyone determined enough can
retrieve the original source code. The reason is quite simple - once
the decryption filter has decrypted the source back to its original
form, fragments of it will be stored in the computer's memory as Perl
parses it. The source might only be in memory for a short period of
time, but anyone possessing a debugger, skill, and lots of patience can
eventually reconstruct your program.

That said, there are a number of steps that can be taken to make life
difficult for the potential cracker. The most important: Write your
decryption filter in C and statically link the decryption module into
the Perl binary. For further tips to make life difficult for the
potential cracker, see the file I<decrypt.pm> in the source filters
distribution.

=back

=head1 CREATING A SOURCE FILTER AS A SEPARATE EXECUTABLE

An alternative to writing the filter in C is to create a separate
executable in the language of your choice. The separate executable
reads from standard input, does whatever processing is necessary, and
writes the filtered data to standard output. C<Filter::cpp> is an
example of a source filter implemented as a separate executable - the
executable is the C preprocessor bundled with your C compiler.

The source filter distribution includes two modules that simplify this
task: C<Filter::exec> and C<Filter::sh>. Both allow you to run any
external executable. Both use a coprocess to control the flow of data
into and out of the external executable. (For details on coprocesses,
see Stevens, W.R., "Advanced Programming in the UNIX Environment."
Addison-Wesley, ISBN 0-201-56317-7, pages 441-445.) The difference
between them is that C<Filter::exec> spawns the external command
directly, while C<Filter::sh> spawns a shell to execute the external
command. (Unix uses the Bourne shell; NT uses the cmd shell.) Spawning
a shell allows you to make use of the shell metacharacters and
redirection facilities.

Here is an example script that uses C<Filter::sh>:

    use Filter::sh 'tr XYZ PQR';
    $a = 1;
    print "XYZ a = $a\n";

The output you'll get when the script is executed:

    PQR a = 1

Writing a source filter as a separate executable works fine, but a
small performance penalty is incurred. For example, if you execute the
small example above, a separate subprocess will be created to run the
Unix C<tr> command. Each use of the filter requires its own subprocess.
If creating subprocesses is expensive on your system, you might want to
consider one of the other options for creating source filters.

=head1 WRITING A SOURCE FILTER IN PERL

The easiest and most portable option available for creating your own
source filter is to write it completely in Perl. To distinguish this
from the previous two techniques, I'll call it a Perl source filter.

To help understand how to write a Perl source filter we need an example
to study. Here is a complete source filter that performs rot13
decoding. (Rot13 is a very simple encryption scheme used in Usenet
postings to hide the contents of offensive posts. It moves every letter
forward thirteen places, so that A becomes N, B becomes O, and Z
becomes M.)

    package Rot13;

    use Filter::Util::Call;

    sub import {
       my ($type) = @_;
       my ($ref) = [];
       filter_add(bless $ref);
    }

    sub filter {
       my ($self) = @_;
       my ($status);

       tr/n-za-mN-ZA-M/a-zA-Z/
          if ($status = filter_read()) > 0;
       $status;
    }

    1;

All Perl source filters are implemented as Perl classes and have the
same basic structure as the example above.

First, we include the C<Filter::Util::Call> module, which exports a
number of functions into your filter's namespace. The filter shown
above uses two of these functions, C<filter_add()> and
C<filter_read()>.

Next, we create the filter object and associate it with the source
stream by defining the C<import> function. If you know Perl well
enough, you know that C<import> is called automatically every time a
module is included with a use statement. This makes C<import> the ideal
place to both create and install a filter object.

In the example filter, the object (C<$ref>) is blessed just like any
other Perl object. Our example uses an anonymous array, but this isn't
a requirement. Because this example doesn't need to store any context
information, we could have used a scalar or hash reference just as
well. The next section demonstrates context data.

The association between the filter object and the source stream is made
with the C<filter_add()> function. This takes a filter object as a
parameter (C<$ref> in this case) and installs it in the source stream.

Finally, there is the code that actually does the filtering. For this
type of Perl source filter, all the filtering is done in a method
called C<filter()>. (It is also possible to write a Perl source filter
using a closure. See the C<Filter::Util::Call> manual page for more
details.) It's called every time the Perl parser needs another line of
source to process. The C<filter()> method, in turn, reads lines from
the source stream using the C<filter_read()> function.

If a line was available from the source stream, C<filter_read()>
returns a status value greater than zero and appends the line to C<$_>.
A status value of zero indicates end-of-file, less than zero means an
error. The filter function itself is expected to return its status in
the same way, and put the filtered line it wants written to the source
stream in C<$_>. The use of C<$_> accounts for the brevity of most Perl
source filters.

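The closure style mentioned above can be sketched as follows. This is a
minimal illustration rather than code from the source filters
distribution: the package name C<Rot13Closure> and the helper C<decode()>
are made up for the example, while C<filter_add()> and C<filter_read()>
are the same functions exported by C<Filter::Util::Call>. Instead of a
blessed object with a C<filter()> method, C<filter_add()> is handed an
anonymous sub that does the work.

```perl
package Rot13Closure;    # hypothetical name, chosen for this sketch

use strict;
use warnings;
use Filter::Util::Call;

# Decode a chunk of rot13 text. Kept as a separate sub (an assumption
# of this sketch) so the transformation can be exercised on its own.
sub decode {
    my ($text) = @_;
    $text =~ tr/n-za-mN-ZA-M/a-zA-Z/;
    return $text;
}

sub import {
    # Passing a code reference to filter_add() installs a closure filter.
    filter_add(sub {
        my $status = filter_read();    # appends the next line to $_
        $_ = decode($_) if $status > 0;
        return $status;                # >0 line read, 0 EOF, <0 error
    });
}

1;
```

A closure filter needs no C<filter()> method; any state it wants to keep
between calls can live in lexical variables captured by the anonymous
sub.
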
In order to make use of the rot13 filter we need some way of encoding
the source file in rot13 format. The script below, C<mkrot13>, does
just that.

    die "usage: mkrot13 filename\n" unless @ARGV;
    my $in  = $ARGV[0];
    my $out = "$in.tmp";
    open(my $in_fh,  '<', $in)  or die "Cannot open file $in: $!\n";
    open(my $out_fh, '>', $out) or die "Cannot open file $out: $!\n";

    # The encoded file starts by loading the filter that decodes it.
    print $out_fh "use Rot13;\n";
    while (<$in_fh>) {
       tr/a-zA-Z/n-za-mN-ZA-M/;
       print $out_fh $_;
    }

    close $in_fh;
    close $out_fh or die "Cannot close file $out: $!\n";
    unlink $in;
    rename $out, $in or die "Cannot rename $out to $in: $!\n";

If we encrypt this with C<mkrot13>:

    print "hello fred\n";

the result will be this:

    use Rot13;
    cevag "uryyb serq\a";

Running it produces this output:

    hello fred

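Because rot13 shifts by half the alphabet, the encoding C<tr> used by
C<mkrot13> and the decoding C<tr> used by the filter describe the same
mapping, and applying either one twice returns the original text. The
short sketch below checks this on a single line; the sub names
C<rot13_encode> and C<rot13_decode> are illustrative only and do not
appear in the source filters distribution.

```perl
use strict;
use warnings;

# Same translation as in mkrot13.
sub rot13_encode {
    my ($text) = @_;
    $text =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return $text;
}

# Same translation as in the Rot13 filter.
sub rot13_decode {
    my ($text) = @_;
    $text =~ tr/n-za-mN-ZA-M/a-zA-Z/;
    return $text;
}

my $source  = qq{print "hello fred\\n";\n};
my $encoded = rot13_encode($source);

print $encoded;                  # cevag "uryyb serq\a";
print rot13_decode($encoded);    # print "hello fred\n";
```

Note that punctuation, digits, and whitespace pass through untouched,
which is why the quotes and the backslash survive encoding.
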
=head1 USING CONTEXT: THE DEBUG FILTER

The rot13 example was a trivial example. Here's another demonstration
that shows off a few more features.

Say you wanted to include a lot of debugging code in your Perl script
during development, but you didn't want it available in the released
product. Source filters offer a solution. In order to keep the example
simple, let's say you wanted the debugging output to be controlled by
an environment variable, C<DEBUG>. Debugging code is enabled if the
variable exists, otherwise it is disabled.

Two special marker lines will bracket debugging code, like this:

    ## DEBUG_BEGIN
    if ($year > 1999) {
       warn "Debug: millennium bug in year $year\n";
    }
    ## DEBUG_END

The filter ensures that Perl parses the code between the C<DEBUG_BEGIN>
and C<DEBUG_END> markers only when the C<DEBUG> environment variable
exists. That means that when C<DEBUG> does exist, the code above
should be passed through the filter unchanged. The marker lines can
also be passed through as-is, because the Perl parser will see them as
comment lines. When C<DEBUG> isn't set, we need a way to disable the
debug code. A simple way to achieve that is to convert the lines
between the two markers into comments:

    ## DEBUG_BEGIN
    #if ($year > 1999) {
    #   warn "Debug: millennium bug in year $year\n";
    #}
    ## DEBUG_END

Here is the complete Debug filter:

    package Debug;

    use strict;
    use warnings;
    use Filter::Util::Call;

    use constant TRUE => 1;
    use constant FALSE => 0;

    sub import {
       my ($type) = @_;
       my (%context) = (
         Enabled => defined $ENV{DEBUG},
         InTraceBlock => FALSE,
         Filename => (caller)[1],
         LineNo => 0,
         LastBegin => 0,
       );
       filter_add(bless \%context);
    }

    sub Die {
       my ($self) = shift;
       my ($message) = shift;
       my ($line_no) = shift || $self->{LastBegin};
       die "$message at $self->{Filename} line $line_no.\n"
    }

    sub filter {
       my ($self) = @_;
       my ($status);
       $status = filter_read();
       ++ $self->{LineNo};

       # deal with EOF/error first
       if ($status <= 0) {
           $self->Die("DEBUG_BEGIN has no DEBUG_END")
               if $self->{InTraceBlock};
           return $status;
       }

       if ($self->{InTraceBlock}) {
          if (/^\s*##\s*DEBUG_BEGIN/ ) {
              $self->Die("Nested DEBUG_BEGIN", $self->{LineNo})
          } elsif (/^\s*##\s*DEBUG_END/) {
              $self->{InTraceBlock} = FALSE;
          }

          # comment out the debug lines when the filter is disabled
          s/^/#/ if ! $self->{Enabled};
       } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
          $self->{InTraceBlock} = TRUE;
          $self->{LastBegin} = $self->{LineNo};
       } elsif ( /^\s*##\s*DEBUG_END/ ) {
          $self->Die("DEBUG_END has no DEBUG_BEGIN", $self->{LineNo});
       }
       return $status;
    }

    1;

The big difference between this filter and the previous example is the
use of context data in the filter object. The filter object is based on
a hash reference, and is used to keep various pieces of context
information between calls to the filter function. All but two of the
hash fields are used for error reporting. The first of those two,
Enabled, is used by the filter to determine whether the debugging code
should be given to the Perl parser. The second, InTraceBlock, is true
when the filter has encountered a C<DEBUG_BEGIN> line, but has not yet
encountered the following C<DEBUG_END> line.

If you ignore all the error checking that most of the code does, the
essence of the filter is as follows:

    sub filter {
       my ($self) = @_;
       my ($status);
       $status = filter_read();

       # deal with EOF/error first
       return $status if $status <= 0;
       if ($self->{InTraceBlock}) {
          if (/^\s*##\s*DEBUG_END/) {
             $self->{InTraceBlock} = FALSE
          }

          # comment out debug lines when the filter is disabled
          s/^/#/ if ! $self->{Enabled};
       } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
          $self->{InTraceBlock} = TRUE;
       }
       return $status;
    }

Be warned: just as the C preprocessor doesn't know C, the Debug filter
doesn't know Perl. It can be fooled quite easily:

    print <<EOM;
    ##DEBUG_BEGIN
    EOM

Such things aside, you can see that a lot can be achieved with a modest
amount of code.

=head1 CONCLUSION

You now have a better understanding of what a source filter is, and you
might even have a possible use for them. If you feel like playing with
source filters but need a bit of inspiration, here are some extra
features you could add to the Debug filter.

First, an easy one. Rather than having debugging code that is
all-or-nothing, it would be much more useful to be able to control
which specific blocks of debugging code get included. Try extending the
syntax for debug blocks to allow each to be identified. The contents of
the C<DEBUG> environment variable can then be used to control which
blocks get included.

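One possible starting point for that exercise is sketched below. Both
the marker syntax (C<## DEBUG_BEGIN name>) and the format of the
C<DEBUG> variable (a comma-separated list of block names) are
assumptions made for this sketch, as are the helper names
C<enabled_blocks> and C<block_name>; the article itself leaves these
choices open.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the value of the DEBUG environment variable,
# e.g. "parser,net", into a lookup table of enabled block names.
sub enabled_blocks {
    my ($debug) = @_;
    return {} unless defined $debug;
    my %enabled = map { $_ => 1 } grep { length } split /\s*,\s*/, $debug;
    return \%enabled;
}

# Hypothetical marker syntax: "## DEBUG_BEGIN <name>".
# Returns the block name, or undef if the line is not a named begin marker.
sub block_name {
    my ($line) = @_;
    return $line =~ /^\s*##\s*DEBUG_BEGIN\s+(\w+)/ ? $1 : undef;
}

# Example: decide whether a block should be passed through unchanged.
my $enabled = enabled_blocks('parser,net');
my $name    = block_name("## DEBUG_BEGIN parser\n");
print "block '$name' is ",
      ($enabled->{$name} ? "enabled" : "disabled"), "\n";
```

Inside the Debug filter, the C<Enabled> field could then be set per
block when a begin marker is seen, instead of once in C<import>.
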
Once you can identify individual blocks, try allowing them to be
nested. That isn't difficult either.

Here is an interesting idea that doesn't involve the Debug filter.
Currently Perl subroutines have fairly limited support for formal
parameter lists. You can specify the number of parameters and their
type, but you still have to manually take them out of the C<@_> array
yourself. Write a source filter that allows you to have a named
parameter list. Such a filter would turn this:

    sub MySub ($first, $second, @rest) { ... }

into this:

    sub MySub($$@) {
       my ($first) = shift;
       my ($second) = shift;
       my (@rest) = @_;
       ...
    }

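The core of such a filter is a line-level rewrite, which might be
sketched like this. The sub C<expand_named_params> is a made-up name,
and the sketch assumes the whole C<sub NAME (LIST) {> header sits on one
line with nothing after the opening brace; it is purely textual, so it
shares the limitations of every other source filter.

```perl
use strict;
use warnings;

# Rewrite a one-line "sub NAME (LIST) {" header into a prototype followed
# by "my" declarations. Lines that do not match are returned unchanged.
sub expand_named_params {
    my ($line) = @_;
    return $line
        unless $line =~ /^(\s*)sub\s+(\w+)\s*\(\s*([\$\@%]\w+(?:\s*,\s*[\$\@%]\w+)*)\s*\)\s*\{\s*$/;
    my ($indent, $name, $list) = ($1, $2, $3);

    my @params = split /\s*,\s*/, $list;

    # The prototype is just the sigil of each parameter, e.g. "$$@".
    my $proto = join '', map { substr $_, 0, 1 } @params;

    my $body = '';
    for my $param (@params) {
        # Scalars are shifted off; an array or hash takes whatever is left.
        my $source = $param =~ /^\$/ ? 'shift' : '@_';
        $body .= "$indent   my ($param) = $source;\n";
    }
    return "${indent}sub $name($proto) {\n" . $body;
}

print expand_named_params('sub MySub ($first, $second, @rest) {' . "\n");
```

In a real filter this sub would be called on C<$_> after each successful
C<filter_read()>.
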
Finally, if you feel like a real challenge, have a go at writing a
full-blown Perl macro preprocessor as a source filter. Borrow the
useful features from the C preprocessor and any other macro processors
you know. The tricky bit will be choosing how much knowledge of Perl's
syntax you want your filter to have.

=head1 LIMITATIONS

Source filters work only at the string level, so they are highly limited
in their ability to change source code on the fly. They cannot detect
comments, quoted strings, or heredocs, and they are no replacement for
a real parser.
The only stable uses for source filters are encryption, compression,
and the byteloader, which translates binary code back to source code.

See, for example, the limitations in L<Switch>, which uses source
filters and thus does not work inside a string eval. Regexes with
embedded newlines that are specified with raw C</.../> delimiters and
don't have a modifier C<//x> are indistinguishable from code chunks
beginning with the division operator C</>. As a workaround you must use
C<m/.../> or C<m?...?> for such patterns. Also, the presence of regexes
specified with raw C<?...?> delimiters may cause mysterious errors. The
workaround is to use C<m?...?> instead. See
L<http://search.cpan.org/perldoc?Switch#LIMITATIONS>

Currently the content of the C<__DATA__> block is not filtered.

Currently internal buffer lengths are limited to 32-bit only.

=head1 THINGS TO LOOK OUT FOR

=over 5

=item Some Filters Clobber the C<DATA> Handle

Some source filters use the C<DATA> handle to read the calling program.
When using these source filters you cannot rely on this handle, nor expect
any particular kind of behavior when operating on it. Filters based on
Filter::Util::Call (and therefore Filter::Simple) do not alter the C<DATA>
filehandle, but on the other hand totally ignore the text after C<__DATA__>.

=back

=head1 REQUIREMENTS

The Source Filters distribution is available on CPAN, in

    CPAN/modules/by-module/Filter

Starting from Perl 5.8 Filter::Util::Call (the core part of the
Source Filters distribution) is part of the standard Perl distribution.
Also included is a friendlier interface called Filter::Simple, by
Damian Conway.

=head1 AUTHOR

Paul Marquess E<lt>Paul.Marquess@btinternet.comE<gt>

Reini Urban E<lt>rurban@cpan.orgE<gt>

=head1 Copyrights

The first version of this article originally appeared in The Perl
Journal #11, and is copyright 1998 The Perl Journal. It appears
courtesy of Jon Orwant and The Perl Journal. This document may be
distributed under the same terms as Perl itself.