The Duct Tape of the Internet
Randal L. Schwartz
When you're a Perl programmer, you never fret about those
little ugly tasks that creep up. Perl can deal with file wrangling,
text manipulation, and process management in a way unequaled by
any other single language, whether open source or proprietary.
For example, in this column, I'll take a simple file and
text-wrangling task and show how I solved it with Perl. I was a
systems administrator for many years, and I'd say that this
task is representative of those niggling little things that I faced,
typically daily, in the course of my job.
Nearly all Perl modules contain embedded documentation, called
"POD" (described by perldoc perlpod). When I install
a module from the Comprehensive Perl Archive Network (the "CPAN":
see http://www.cpan.org for further information), the module
is usually installed into a place where my Perl binary can find
it (along Perl's @INC path). By default, the installation
process also creates an nroff -man page, so that the man
command can display a nicely formatted version (presuming you extend
your MANPATH or equivalent). Thus, for most modules, you
can say either perldoc Some::Module (to convert the embedded
POD into text) or man Some::Module (to display the preprocessed
man page).
However, the server that runs http://www.stonehenge.com
runs OpenBSD (mostly so I can sleep at night knowing that security
is a top priority for the OpenBSD developers). The default Perl installation
of OpenBSD is configured in such a way that the man pages are not
generated for non-core Perl modules. I'm expected to type perldoc
Some::Module to get the documentation for the module, instead
of the more familiar man Some::Module; however, I can
use man for the core modules. Because I found this rather
confusing, I faced two alternatives:
1. I could hack the core installation of Perl so that it would
install man pages, thereby risking breakage if the Perl installation
were upgraded during a minor or major release.
2. I could write a simple tool to take all the embedded POD and
generate man pages into my private area.
I decided to write a simple tool, mostly because I'm opposed
to touching anything in the core distribution, since I have no idea
if someone at OpenBSD headquarters is likely to change things out
from under me.
And a simple tool it is, although it's about 80 lines of
Perl code. So, looking at a few lines at a time, here's what
I wrote, in about the order that I created the lines. To begin,
I started with my normal header:
#!/usr/bin/perl -w
use strict;
$|++;
With these three lines, I've turned on warnings, enabled the
common compiler restrictions (undeclared variables, soft references,
and barewords are all disabled), and turned off the buffering for
STDOUT.
Next, I put in a few configuration lines that I might change,
based on where I'm running the program:
## BEGIN configuration
my $MAN3DIR = "/home/merlyn/man/man3";
my $MAN3EXT = "3p";
## END configuration
Here I've defined a location below my home directory where I've
placed other personal manpages, and an extension for the specific
Perl module pages. Traditionally, Perl modules have the 3p
extension and are placed in section 3 of the UNIX manual. I've
added /home/merlyn/man to my MANPATH, so the man
command finds this directory just fine.
use Pod::Man;
use File::Find;
use Config;
Following that, I bring in the three modules (all in the Perl core
distribution) that I'll need to wander through the installed
directories and find the POD files. The Pod::Man module can
convert POD into manpages. The File::Find module recurses through
subdirectories. The Config module provides a hash interface
to the configuration parameters for the installed Perl. In fact, the
next two lines use that hash to locate two specific directories:
my $SITELIB = $Config{sitelib};
my $SITEARCH = $Config{sitearch};
The value for $SITELIB gives the path in which local Perl modules
are installed. $SITEARCH provides a similar path for architecture-specific
modules -- those which contain binary files resulting from compiling
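C (or other languages). Generally, the $SITEARCH directory
will be within the $SITELIB directory, and this program presumes
that: the directory search below starts only at $SITELIB, so any
modules under a $SITEARCH located elsewhere would be missed.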
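Because the program relies on that layout, it may be worth checking it on any new machine. The sketch below is not part of the original program: it prints both values from %Config and warns when $SITEARCH is not inside $SITELIB. The helper name is_within is hypothetical, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true if $inner is $outer itself or lies below it.
# Comparing against "$outer/" avoids treating /a/bc as inside /a/b.
sub is_within {
    my ($inner, $outer) = @_;
    return $inner eq $outer || index($inner, "$outer/") == 0;
}

# Either value could be missing on an unusual build, so default to ''.
my $sitelib  = $Config{sitelib}  // '';
my $sitearch = $Config{sitearch} // '';
print "sitelib:  $sitelib\n";
print "sitearch: $sitearch\n";

# Warn (rather than die) so the check is informational only.
warn "sitearch is outside sitelib; a search from sitelib will miss it\n"
    unless is_within($sitearch, $sitelib);
```

On a system where this warning fires, the find call would need a second starting directory.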
Next, I'll create a Pod::Man object configured for
the task:
my $podmanparser = Pod::Man->new(section => $MAN3EXT);
The section value sets the manual section named in the page's
header line (the .TH macro); it's mostly cosmetic, but nice to get right.
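To see what the converter produces without touching any files, the following sketch feeds a tiny POD document to a Pod::Man object and captures the output in a string. It assumes a Pod::Man built on Pod::Simple (true of current podlators releases), which supplies output_string and parse_string_document; Some::Module is a placeholder, passed through the name option because there is no input file to take a name from.

```perl
use strict;
use warnings;
use Pod::Man;

# A minimal POD document, built from single-quoted lines so that none of
# the =directives start at column 0 of this source file.
my $pod = join "\n",
    '=head1 NAME',
    '',
    'Some::Module - a placeholder module for this demonstration',
    '',
    '=cut',
    '';

# Same section setting as the column; name is supplied explicitly.
my $parser = Pod::Man->new(section => '3p', name => 'Some::Module');
my $man;
$parser->output_string(\$man);      # collect output in $man, not a file
$parser->parse_string_document($pod);

# The .TH line carries the name and section used in the page header.
my ($th) = $man =~ /^(\.TH .*)$/m;
print "$th\n";
```

Running it should print the generated .TH line, showing where the 3p section ends up.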
Now comes the task of finding the existing POD documentation.
So, after a few tries, I came up with the following loop with File::Find:
my %pods;
find sub {
  return unless /\.p(m|od)$/;
  my $package = $File::Find::name;
  for ($package) {
    s{^\Q$SITEARCH/}{}
      or s{^\Q$SITELIB/}{}
      or die "Cannot remove $SITEARCH or $SITELIB from $File::Find::name\n";
    s/\.p(m|od)$//
      or die "What happened to the ext in $package?\n";
    s{/}{::}g;
  }
  push @{$pods{$package}}, $File::Find::name;
}, $SITELIB;
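The prefix-stripping logic is easy to test on its own once it's pulled out of the callback. This sketch is a hypothetical refactoring (the name path_to_package and the sample paths are made up): it applies the same substitutions as the loop above but returns undef for a path under neither tree instead of dying.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the find callback: turn a full path
# below $sitearch or $sitelib into a package name, or return undef.
sub path_to_package {
    my ($path, $sitearch, $sitelib) = @_;
    my $package = $path;
    # Try the arch-specific prefix first, since it usually lives inside
    # $sitelib and is therefore the longer match.
    $package =~ s{^\Q$sitearch\E/}{}
        or $package =~ s{^\Q$sitelib\E/}{}
        or return;                                # under neither tree
    $package =~ s/\.p(?:m|od)$// or return;       # not .pm or .pod
    $package =~ s{/}{::}g;                        # Some/Module -> Some::Module
    return $package;
}

# Made-up directories for illustration only.
my $lib  = '/usr/local/libdata/perl5/site_perl';
my $arch = "$lib/i386-openbsd";
my $pkg  = path_to_package("$arch/Some/Module.pm", $arch, $lib);
print +($pkg // '(none)'), "\n";
```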
There's a lot going on here, and it's best to work from
the outside in. The find subroutine has been imported from
File::Find and is presented with a subroutine reference (here,
an anonymous subroutine) and a starting path, $SITELIB. The
find routine starts at the top directory, recursing down, calling
the subroutine for each found entry (even ones in which we're
not interested). The line:
return unless /\.p(m|od)$/;
rejects the filenames that are neither Perl modules nor Perl POD files
by looking at $_, which contains the basename (no directory
part) of the file or directory being examined. The next few lines
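extract the package name for the filename into $package. It
takes the full path from $File::Find::name, then removes either
the $SITEARCH or $SITELIB prefix from the path. $SITEARCH is tried
first: because it normally sits inside $SITELIB, stripping the
shorter $SITELIB prefix first would leave the architecture directory
in the package name. If neither substitution succeeds, then something
has gone terribly wrong, so the program aborts.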
Next, these lines:
s/\.p(m|od)$//
or die "What happened to the ext in $package?\n";
s{/}{::}g;
turn the remainder of the name into a module name by stripping off the
extension and replacing the slashes with double-colon package
delimiters. Finally, the subroutine pushes the full pathname onto an
array referenced from the %pods hash, keyed by the package name. Why a list?
Because many modules have a separate POD file, so we'll see both
Some/Module.pm and Some/Module.pod (and a module can also turn up
in both the $SITEARCH and $SITELIB trees). We'll later
sort out which of these to use for the manpage, but we'll record
them all for now.
When this loop has completed, we have a hash %pods, keyed
by package name, with each entry comprising a list of one or more
files that may contain the documentation for that module.
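The push onto @{$pods{$package}} relies on Perl's autovivification: the first push for a given package creates the array reference automatically. This small standalone sketch, using made-up package/path pairs in place of real File::Find results, shows the structure that ends up in %pods.

```perl
use strict;
use warnings;

# Hypothetical (package, path) pairs, as the find callback might produce.
my @found = (
    [ 'Some::Module', '/site/Some/Module.pm'  ],
    [ 'Some::Module', '/site/Some/Module.pod' ],
    [ 'Other::Thing', '/site/Other/Thing.pm'  ],
);

my %pods;
for my $pair (@found) {
    my ($package, $path) = @$pair;
    # The first push for a package autovivifies its array reference.
    push @{ $pods{$package} }, $path;
}

for my $package (sort keys %pods) {
    printf "%s: %s\n", $package, join ', ', @{ $pods{$package} };
}
```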
When I showed this program to a friend, the response
(only after I'd toiled over this part) was, "Why didn't you
just use Pod::Find?". Ah, yes. If I'd only known,
I could have reduced this part of the program to a few lines of
code. I'll have to file that away for use in a future program.
The lesson here is "always check the CPAN first, because any
interesting task is likely already written".
The next step is to wander through the hash and do whatever it
takes to update the manpages if needed. I'll start with a loop
like this:
POD: for my $pod (sort keys %pods) {
  my @files = @{$pods{$pod}};
  # ... more code here ...
}
I labeled the loop POD because later I'll want to execute a next
against this loop from inside a nested loop. So, $pod contains a
package name, and @files contains one or more source files for that
package. Next, if there's more than one source file, we need to figure
out which one to use:
if (@files > 1) { # more than one? must sort
  @files = sort {
    ## primary: prefer arch-specific over non-arch-specific
    to_boolean($b =~ m{^\Q$SITEARCH}) <=>
      to_boolean($a =~ m{^\Q$SITEARCH})
    ## secondary: prefer .pod to .pm
    or to_boolean($b =~ /\.pod$/) <=> to_boolean($a =~ /\.pod$/);
  } @files;
}
my $file = shift @files; # first one is always best now
Again, a lot of stuff going on here. If there's more than one
file, we sort the list, preferring architecture-specific files over
generic files, and .pod files over .pm files. The first
entry in the list after sorting (or the only entry in the list if
there was only one to start with) is now the most likely candidate
for our manpage.
The to_boolean routine maps any false value to 0 and any true value
to 1, so the <=> comparisons always see clean numbers:
sub to_boolean {
  $_[0] ? 1 : 0;
}
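Here is the same two-key ordering pulled into a standalone sketch so it can be run on a few paths. The wrapper rank_sources and the /lib paths are hypothetical; the comparator follows the column's sort block. Note that a match such as $b =~ m{...} used as a subroutine argument is evaluated in list context, returning (1) on success and an empty list on failure, which to_boolean then turns into 1 or 0.

```perl
use strict;
use warnings;

sub to_boolean { $_[0] ? 1 : 0 }

# Hypothetical wrapper: return the candidate files in preference order.
sub rank_sources {
    my ($sitearch, @files) = @_;
    return sort {
        # primary: arch-specific paths sort first
        to_boolean($b =~ m{^\Q$sitearch\E}) <=> to_boolean($a =~ m{^\Q$sitearch\E})
        # secondary: .pod sorts before .pm
        or to_boolean($b =~ /\.pod$/) <=> to_boolean($a =~ /\.pod$/)
    } @files;
}

my @ranked = rank_sources('/lib/arch',
    '/lib/Foo.pm', '/lib/Foo.pod', '/lib/arch/Foo.pm');
print "$_\n" for @ranked;
```

With these inputs the arch-specific file comes first, then the .pod, then the plain .pm.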
Next, we'll figure out the name of the manfile, and determine
whether we have any work to do:
my $manfile = "$MAN3DIR/$pod.$MAN3EXT";
next if
  -e $manfile and
  -M $manfile < -M $file; # skip if exists and newer
If the manpage file exists, and is newer than our source file, we've
got nothing to do, so we continue to the next entry.
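The -M operator returns a file's age in days, measured from when the script started, so a smaller value means a more recently modified file; that's why "newer" is written as "less than." Below is a standalone sketch of the same test wrapped in a hypothetical needs_update helper, exercised on two temporary files whose timestamps are set with utime.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true when $manfile is missing or not newer than $src.
sub needs_update {
    my ($manfile, $src) = @_;
    return 1 unless -e $manfile;
    # Smaller -M means more recently modified.
    return !(-M $manfile < -M $src);
}

my $dir = tempdir(CLEANUP => 1);
my ($src, $man) = ("$dir/Module.pm", "$dir/Module.3p");
for ($src, $man) { open my $fh, '>', $_ or die "$_: $!"; close $fh }

my $now = time;
utime $now - 200, $now - 200, $src;   # source edited earlier
utime $now - 100, $now - 100, $man;   # manpage generated later
print needs_update($man, $src) ? "stale\n" : "up to date\n";
```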
At this point, we have a source file (either POD or Perl file)
whose manpage is missing or out of date. However, the file
may still contain no POD directives. We need to look for some POD
in the file. The easiest way is to look for =head at the
beginning of a line. This isn't entirely accurate, but it's
the same rule that the perldoc command uses, so I figure
it's close enough. And that code came out like this (after
a few tries):
open IN, $file
  or warn("Cannot open $file: $!, skipping\n"), next POD;
while (<IN>) {
  if (/^=head/) { # POD sign!
    print "pod2man $file $manfile\n";
    not -e $manfile or unlink $manfile
      or warn("Cannot remove $manfile: $!\n");
    open OUT, ">$manfile"
      or warn("Cannot create $manfile: $!\n"), next POD;
    seek IN, 0, 0;
    $podmanparser->parse_from_filehandle(\*IN, \*OUT);
    close OUT;
    next POD;
  }
}
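The meat is in the middle: once we've determined we have a decent
POD file, we remove any stale manpage, create a fresh one, seek the
source file back to the beginning, and then call parse_from_filehandle
to generate the manpage. If the while loop reaches the end of the file
without seeing =head, nothing is written, and the outer loop simply
moves on to the next package.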
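The =head heuristic and the rewind can be tried on their own. This sketch is a hypothetical has_pod helper, not code from the column: it uses a lexical filehandle and reads two in-memory "files" opened on scalar references, so no disk files are needed.

```perl
use strict;
use warnings;

# Hypothetical helper using the column's heuristic: a file "has POD" if
# some line starts with =head. The handle is rewound afterwards so the
# caller could hand it straight to a converter.
sub has_pod {
    my ($fh) = @_;
    my $found = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^=head/) { $found = 1; last }
    }
    seek $fh, 0, 0 or die "Cannot rewind: $!";
    return $found;
}

my $code_only = "package Foo;\nsub bar { 42 }\n1;\n";
my $with_pod  = "package Foo;\n1;\n__END__\n\n=head1 NAME\n\nFoo - demo\n\n=cut\n";

for my $text ($code_only, $with_pod) {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    print has_pod($fh) ? "has POD\n" : "no POD\n";
}
```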
So, any time I suspect that there's been a new module added
to my local install, I can run this program, and my local manpage
collection is updated, with minimal effort.
A simple task, simply executed by Perl, but it solves an important
problem: letting me get at Perl's documentation with either
perldoc or man, working around a vendor limitation.
Most of those "gotta get it done now with no time to do it"
systems administration tasks seem to be about this large, and as
you can see, Perl fits the task nicely. So, until next time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration,
security, technical writing, and training. He has coauthored the
"must-have" standards: Programming Perl, Learning
Perl, Learning Perl for Win32 Systems, and Effective
Perl Programming. He's also a frequent contributor to the
Perl newsgroups, and has moderated comp.lang.perl.announce since
its inception. Since 1985, Randal has owned and operated Stonehenge
Consulting Services, Inc.