Speeding up Your Perl Programs
Randal L. Schwartz
How fast does your Perl run? I hope that, most of the time, the
answer is "fast enough". But sometimes it may not be.
How do we speed up a Perl program? How do we know what's making
it unacceptably slow?
The first rule of speed optimization is "don't".
Certainly, don't write things that waste time, but first aim
for clarity and maintainability. With today's processor speeds
far exceeding the "supercomputers" of yesterday, sometimes
that's "fast enough".
But how do we know whether something is "fast enough"?
The first easy cut is just to check total run time. That's
usually as simple as putting "time" in front of your program
invocation, and looking at the result (which may vary depending
on your operating system and choice of shell). For example, here's
a run of one of my frequently invoked programs, preceding the command
(get.mini) with time:
[localhost:~] merlyn% time get.mini
authors/01mailrc.txt.gz ... up to date
modules/02packages.details.txt.gz ... up to date
modules/03modlist.data.gz ... up to date
67.540u 3.290s 1:40.62 70.3% 0+0k 1585+566io 0pf+0w
So, this tells me that this program took about 70 CPU seconds over
a 100-second timespan, essentially doing nothing useful for me except
verifying that my local mirror of the CPAN is up to date. (Generally,
to get sensible numbers, I have to do this on an otherwise idle system.)
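When the interesting part is only a chunk of a larger program, a quick alternative to timing the whole run is to wrap just that chunk using the Time::HiRes module (bundled with recent Perls, otherwise on the CPAN). Here's a minimal sketch; the summing loop is only a stand-in for whatever code you actually want to measure:
use Time::HiRes qw(gettimeofday tv_interval);

my $start = [gettimeofday];
my $sum = 0;
$sum += $_ for 1 .. 1_000_000;    # stand-in for the interesting chunk
printf "that took %.3f wall-clock seconds\n", tv_interval($start);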
Hmm. I wonder where it's actually spending all of its time?
I can get the next level of information by invoking the built-in
Perl profiler on the script. Simply include -d:DProf as a
Perl command switch, and Perl will automatically record where the
program is spending its time in a file called tmon.out in
the current directory:
[localhost:~] merlyn% perl -d:DProf `which get.mini`
authors/01mailrc.txt.gz ... up to date
modules/02packages.details.txt.gz ... up to date
modules/03modlist.data.gz ... up to date
[localhost:~] merlyn% ls -l tmon.out
-rw-r--r-- 1 merlyn staff 44991232 Sep 8 08:46 tmon.out
[localhost:~] merlyn%
(I have to use which here because perl won't look
in my shell's search path for the script.) The execution under
the profiler slows the program down a bit, but otherwise doesn't
affect any functionality. I now need to summarize the raw data of
tmon.out with the dprofpp command (also included with Perl):
[localhost:~] merlyn% dprofpp
Total Elapsed Time = 99.50810 Seconds
User+System Time = 70.79810 Seconds
Exclusive Times
%Time ExclSec CumulS  #Calls sec/call Csec/c Name
 14.9   10.60 12.739  545807   0.0000 0.0000 URI::_generic::authority
 14.8   10.54 12.002  461833   0.0000 0.0000 URI::_generic::path
 13.1   9.296 19.853  293900   0.0000 0.0001 URI::new
 12.8   9.108 39.082   83971   0.0001 0.0005 URI::_generic::abs
 9.82   6.953 12.677  461842   0.0000 0.0000 URI::_scheme
 8.16   5.778  6.024  293900   0.0000 0.0000 URI::_init
 7.58   5.368 91.054   41987   0.0001 0.0022 main::my_mirror
 6.39   4.522  4.521  293900   0.0000 0.0000 URI::implementor
 5.78   4.093 32.267   41984   0.0001 0.0008 URI::_generic::rel
 5.07   3.588  3.588  251910   0.0000 0.0000 URI::_generic::_check_path
 4.26   3.016  3.016   83976   0.0000 0.0000 File::Spec::Unix::canonpath
 3.92   2.775 15.390   83968   0.0000 0.0002 URI::http::canonical
 3.48   2.465 14.186  377874   0.0000 0.0000 URI::scheme
 3.09   2.185  9.973   83968   0.0000 0.0001 URI::_server::canonical
 2.25   1.596  3.419   41988   0.0000 0.0001 File::Spec::Unix::catdir
[localhost:~] merlyn%
Now, this is information I can actually use. The run again took
about 70 CPU seconds, just like the unprofiled run, so the profiling
didn't significantly distort the execution. The first routine listed,
URI::_generic::authority, was called over half a million times,
contributing about 10 CPU seconds to the total. (There are other switches
to dprofpp, but the default output seems to be the most useful
most of the time.)
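For reference, a few of the documented switches that slice the data differently (check the dprofpp manpage shipped with your Perl, since the exact set varies a bit between releases):
dprofpp -l       # sort by number of calls rather than CPU time
dprofpp -I       # show inclusive (cumulative) times instead of exclusive
dprofpp -r       # report wall-clock (real) times
dprofpp -O 30    # list the top 30 subroutines instead of the default 15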
If we could make URI::_generic::authority twice as fast,
we'd save about 5 CPU seconds per run. Making URI::new
twice as fast would also save about 5 seconds, even though it's
called roughly half as often. Generally, the important thing to
note here is the total CPU time for a routine, not the number of
times it is invoked, or the CPU time spent on an individual invocation.
However, I do find it interesting that I had to create nearly
294,000 URI objects -- perhaps some redesign of my algorithm
could avoid that. Good algorithm design is important, and is both a
science and an art.
The other thing to note is that most of the routines that are
burning CPU are in libraries, not in my code. So, without changing
the code of a library, I need to call library routines more efficiently
or less frequently if I want to speed this up.
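As one hypothetical illustration of "less frequently" (this is a sketch of mine, not code from get.mini): if the same URI strings keep showing up, caching the constructed objects means each distinct string goes through URI->new only once.
use URI;

my %uri_cache;

sub cached_uri {
  my $string = shift;
  # construct each distinct URI once, then hand back the same object
  return $uri_cache{$string} ||= URI->new($string);
}
The usual caching caveat applies: callers now share one object per string, so they would have to treat it as read-only (or clone it before modifying it).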
We have the bigger picture to consider. I invoke this program
maybe a dozen times a day. If I worked hard to cut it from 70 CPU
seconds down to 35, I'd save about seven CPU minutes a day. How
long would I have to work to optimize it to that level? And how
many runs would it take before the value of the (usually otherwise
idle) CPU time saved equaled my billing rate for the hours spent
optimizing?
This is why we usually don't worry about speed. CPUs are
cheap. Programmer time is expensive.
But let's push it the other way for a minute. Suppose this
program had to be invoked once a minute, continuously. At 70 CPU
seconds per run, we've just run out of processor (never mind that
the wall-clock time is even longer than that). So we're forced to
optimize the program, buy a bigger machine, or run the program
on multiple machines (if possible).
Armed with the data from the profiling run, we might want to tackle
the URI::new routine. But rather than tweaking the code a
bit, and re-running the program overall, it's generally more
effective to benchmark various versions of the code for given inputs,
in isolation from the rest of the program.
For example, suppose we wanted to speed up the part of the routine
that separates the URI's scheme from the rest of
the URL. Three possible strategies come to mind initially: a regular
expression match, a split, or an index/substr,
implemented roughly as follows:
sub re_match {
  my $str = "http://www.stonehenge.com/perltraining/";
  my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
}

sub split_it {
  my $str = "http://www.stonehenge.com/perltraining/";
  my ($scheme, $rest) = split /:/, $str, 2;
}

sub index_substr {
  my $str = "http://www.stonehenge.com/perltraining/";
  my $pos = index($str, ":");
  my $scheme = substr($str, 0, $pos);    # everything before the colon
  my $rest = substr($str, $pos + 1);     # everything after it
}
We now have three candidates. Which one is faster? To determine that,
we can use the built-in Benchmark module, as follows:
use Benchmark qw(cmpthese);

my $URI = "http://www.stonehenge.com/perltraining/";

sub re_match {
  my $str = $URI;
  my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
}

sub split_it {
  my $str = $URI;
  my ($scheme, $rest) = split /:/, $str, 2;
}

sub index_substr {
  my $str = $URI;
  my $pos = index($str, ":");
  my $scheme = substr($str, 0, $pos);
  my $rest = substr($str, $pos + 1);
}

cmpthese(-1, {
  re_match => \&re_match,
  split_it => \&split_it,
  index_substr => \&index_substr,
});
When run, this results in:
                  Rate     re_match     split_it index_substr
re_match      131022/s           --         -37%         -46%
split_it      208777/s          59%           --         -14%
index_substr  242811/s          85%          16%           --
The "-1" on the cmpthese call says "run each
of these for roughly 1 CPU second". Note that index_substr
seems to be the clear winner here, even though there are more things
for me to type and get correct.
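As an aside, a positive first argument to cmpthese asks for a fixed number of iterations rather than a time budget; this variant (just a sketch of the call, reusing the subroutines above) performs the same comparison with exactly 500,000 runs of each candidate:
cmpthese(500_000, {
  re_match => \&re_match,
  split_it => \&split_it,
  index_substr => \&index_substr,
});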
One thing to watch out for in benchmarking is optimizing for atypical
data. For example, the URI I picked is probably typical, but what
happens if the URL is longer?
my $URI = "http://www.stonehenge.com/merlyn/" .
"Pictures/Trips/2003/03-06-PerlWhirlMacMania/" .
"Day-0-Pearl-Harbor/?show=14";
Now the results show index_substr pulling even further ahead of the regex match:
                  Rate     re_match     split_it index_substr
re_match      117108/s           --         -41%         -49%
split_it      199110/s          70%           --         -13%
index_substr  229682/s          96%          15%           --
Given those two extremes, I'd say that index_substr is
probably the right tool for this particular task, all other things
being equal.
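If you want to make that kind of check more systematic, one approach (a sketch of my own, not from the program being profiled) is to loop the comparison over several representative inputs and see whether the ranking holds:
use Benchmark qw(cmpthese);

my @samples = (
  "http://www.stonehenge.com/perltraining/",
  "http://www.stonehenge.com/merlyn/Pictures/Trips/2003/" .
    "03-06-PerlWhirlMacMania/Day-0-Pearl-Harbor/?show=14",
);

for my $uri (@samples) {
  print "== $uri ==\n";
  cmpthese(-1, {
    re_match     => sub { my ($scheme, $rest) = $uri =~ /(.*?):(.*)/ },
    split_it     => sub { my ($scheme, $rest) = split /:/, $uri, 2 },
    index_substr => sub {
      my $pos    = index($uri, ":");
      my $scheme = substr($uri, 0, $pos);
      my $rest   = substr($uri, $pos + 1);
    },
  });
}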
Armed with Perl's built-in profiler and the Benchmark
module, I can usually fine-tune my routines and get the speed I
want out of my Perl programs. If that's not enough, I could
take those frequently called expensive routines and code them in
a lower-level language instead. But usually that won't be needed.
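Should it come to that, one well-worn route is the Inline::C module from CPAN, which compiles the C the first time the script runs (caching the result) and binds it in as an ordinary Perl subroutine. A minimal sketch, where hot_spot is a made-up stand-in for whatever routine the profiler fingered:
use Inline C => <<'END_C';
/* a toy stand-in for some CPU-hungry inner routine */
int hot_spot(int x, int y) {
  return x * y + 42;
}
END_C

print hot_spot(6, 7), "\n";   # now callable like any other Perl sub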
Until next time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration,
security, technical writing, and training. He has coauthored the
"must-have" standards: Programming Perl, Learning
Perl, Learning Perl for Win32 Systems, and Effective
Perl Programming. He's also a frequent contributor to the
Perl newsgroups, and has moderated comp.lang.perl.announce since
its inception. Since 1985, Randal has owned and operated Stonehenge
Consulting Services, Inc.