Swimming with the Spammers
Phillip Tiburcio
For a long time, my various email accounts did their work quietly
and effectively; then I began receiving a large volume of spam every
day. I was being crushed under the weight of spam, so I fought back
with a personal email server built entirely with free tools and
reduced my spam intake by 95%. Although this article describes a
personal email setup, the tools mentioned can be applied to other
environments.
Bill of Materials
My plan was to build a server that would download all my email
accounts, scrub them for spam, sort them into appropriate mail folders,
then make everything available via imap/ssl or encrypted Webmail.
The integration of several free tools makes this possible on my
personal Linux server.
SpamAssassin and Vipul's Razor scrub mail for spam using a distributed
spam-signature Internet hive mind. A free Perl script called Gotmail
can download messages from a Hotmail account via HTTP. Another script
called Fetchyahoo does the same for any Yahoo account. Fetchmail
rounds out the bunch by downloading mail from various POP accounts
I've collected through the years, and Squirrelmail provides a Webmail
gateway using Apache and PHP. A few procmail rules sort the incoming
mail into appropriate subfolders and relegate spam suspects to a
separate folder. An old Pentium III machine running Red Hat Linux
was pressed into service to act as the foundation for this host
of utilities. Any of the various BSD flavors or Solaris would have
worked as well.
Network Setup
A DSL connection provides the Internet connection to my home LAN
via a Linksys DSL Router. I configured the DSL router to allow inbound
SMTP, ssh, http, and https traffic to the Linux machine. Note that
some revisions of the Linksys firmware have terrible upload performance
unless you tune the Maximum Transmission Unit (MTU) size parameter
in its configuration. A size of 1492 worked well for me. I suggest
you not configure these inbound services until your server is installed,
tested, and patched for security purposes.
Domain Setup
I decided to go with my own domain name, registered via DomainDiscover.
ZoneEdit.com will host up to five domains on their DNS servers free
of charge. ZoneEdit also supports dynamic DNS updates via a Perl
script they provide. This is key if you do not have a static IP
address from your ISP. Note that ZoneEdit is not a domain registrar;
it merely provides DNS servers and a decent Web administration interface.
Once my domain was registered and properly auto-updating in ZoneEdit,
I created an MX record. Thus, I can receive email directly into
my own domain, while still downloading all my legacy accounts and
scrubbing everything for spam.
Email Setup
The overall message flow goes like this:
1. Email comes into various Internet accounts and is collected
via command-line utilities on a cron schedule.
2. Messages are forwarded to a local user account via SMTP.
3. Courier SMTP (http://www.courier-mta.org) picks up these
messages (as well as messages sent directly to my domain) and passes
them to Procmail for local delivery.
4. Procmail pipes messages through SpamAssassin, which modifies
messages that are detected as spam.
5. Modified messages are flagged by procmail and sorted into a
"spam" folder.
6. Clean email gets sorted into folders according to its original
destination.
7. Courier IMAP serves up the email for local LAN clients.
8. Squirrelmail running on Apache/PHP presents a Webmail interface
over ssl and accesses mail accounts via local IMAP.
I installed the Courier email system to serve SMTP and IMAP on
my server. There are several freely available alternatives for mail
servers (e.g., qmail and sendmail), but I selected Courier because
it comes with an integrated IMAP server and uses Maildirs for mail
storage instead of the traditional MBOX format. Courier also comes
with a Webmail component that is a bit ugly and short on features
compared to the SMTP and IMAP components. Each Courier subsystem
can be installed independently and each communicates via standard
protocols, so it was easy to substitute a different Webmail
system.
Courier can be compiled from source, but this is not recommended.
Compiling and configuring Courier is a long, tedious process, and
the developers suggest you find a binary distribution and use that
instead. After going through the process myself, I definitely agree.
Once Courier is installed, set "procmail" as the local Mail Delivery
Agent (MDA). You can do this by editing the courierd configuration
file and setting:
DEFAULTDELIVERY="| /usr/lib/courier/bin/preline /usr/bin/procmail"
This will cause Courier to hand over all incoming email to procmail
for final delivery, a fact we will make extensive use of shortly.
Your paths may vary depending on how Courier was installed.
A Note on Courier Subfolders
You need to make sure that each user on your system has a valid
~/Maildir directory with the right structure. Courier comes with
a program called "maildirmake" that will create a proper Maildir
directory structure. On a Red Hat system, you can add a Maildir
in /etc/skel and all new users created with the adduser command
will be set up properly.
Note that if you want to use command-line tools to create Inbox
subfolders that appear via IMAP, you must create another Maildir
format directory structure in your ~/Maildir folder, named with
a leading ".". Thus, if you want an IMAP subfolder called "MailingLists",
you must use maildirmake to create ~/Maildir/.MailingLists/. To
create a sub-subfolder (e.g., a "BugTraq" mail folder that is a
subfolder of "MailingLists"), you must prepend the name of the parent
folder, chained with a ".". Therefore, to make a BugTraq subfolder
of MailingLists, you must use maildirmake to create ~/Maildir/.MailingLists.BugTraq/.
Do not make actual subdirectory structures.
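The naming rule above can be sketched with Python's standard mailbox module, as a stand-in for running maildirmake by hand. This is only an illustration, not a Courier tool: the helper name make_imap_folder is hypothetical, and the example paths are placeholders for a real ~/Maildir.

```python
import mailbox
import os


def make_imap_folder(maildir_root, *names):
    """Create a flat, dot-prefixed Maildir folder and return its path.

    make_imap_folder(root, "MailingLists", "BugTraq") creates
    root/.MailingLists.BugTraq/ rather than nested directories.
    """
    flat_name = ".".join(names)
    # Opening with create=True builds cur/, new/ and tmp/ if they are missing.
    inbox = mailbox.Maildir(maildir_root, create=True)
    # add_folder() creates root/.<flat_name>/ with its own cur/, new/ and tmp/.
    inbox.add_folder(flat_name)
    return os.path.join(maildir_root, "." + flat_name)
```

Calling make_imap_folder("/home/user/Maildir", "MailingLists", "BugTraq") would produce /home/user/Maildir/.MailingLists.BugTraq/, with the parent and child folders sitting side by side in the same directory.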
Mail Collection
Once I verified that I was receiving email at my new domain and
that IMAP was working on the local LAN, I installed gotmail into
/usr/local/bin. Gotmail (http://ssl.usu.edu/paul/gotmail/)
is a single-file Perl script that accesses Hotmail via HTTP (using
curl, another command-line utility), reads each mail message, and
forwards that message to another address that you specify. All configuration
for gotmail is stored in ~/.gotmailrc. Take care that file permissions
are restrictive enough on .gotmailrc, since it must contain your
Hotmail username and password. I configured gotmail to forward incoming
Hotmail to my local account on the server. However, gotmail has
several useful options to archive all your Hotmail into an MBOX.
Fetchyahoo (http://fetchyahoo.twizzler.org/) operates on
a similar concept but is geared to download Yahoo mail via http.
Fetchmail is another venerable UNIX command-line utility that fetches
mail via POP. I use fetchmail to download mail from my mac.com account.
Each of these utilities can be configured on a per-user basis, and
can be set to run via cron at regular intervals. All of them forward
incoming messages to the local UNIX account via SMTP. Hotmail tends
to change its Web page layout every few months, so be sure to stay
informed of any updates to Gotmail. If you subscribe to these services'
POP offerings, you can simply use fetchmail for everything and avoid
any incompatibilities.
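The cron-driven collection step, together with the earlier warning about rc file permissions, might be wrapped in a small script along these lines. It is a sketch only: is_private and collect_all are hypothetical helper names, and the idea that each tool can be run with no arguments and reads the rc file listed beside it is an assumption to check against each tool's documentation.

```python
import os
import stat
import subprocess


def is_private(path):
    """Return True if neither group nor other has any access to the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def collect_all(jobs):
    """Run each collector in turn and report what happened.

    jobs is a list of (command, rc_path) pairs, where command is an
    argument list for subprocess.run(). A collector is skipped when its
    rc file (which holds a password) is accessible to other users.
    """
    results = []
    for command, rc_path in jobs:
        try:
            if not is_private(rc_path):
                results.append((command[0], "skipped: rc file is not private"))
                continue
            finished = subprocess.run(command)
        except OSError as err:
            # Missing rc file or missing program: report it and move on.
            results.append((command[0], "skipped: %s" % err))
            continue
        results.append((command[0], "exit %d" % finished.returncode))
    return results
```

A cron job could call collect_all with pairs such as (["gotmail"], "/home/user/.gotmailrc") and (["fetchmail"], "/home/user/.fetchmailrc"), then log the returned statuses.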
Spam Filtering
At this point, I had mail from many different sources being collected
on a regular basis into my account on the Linux machine. All told,
I was receiving more than 90 spam messages per day. This is where
SpamAssassin (http://spamassassin.taint.org) came in to save
the day. SpamAssassin is a Perl script that analyzes mail according
to many rules. Each rule "hit" has a score value associated with
it. If the overall score for a particular message goes over a configurable
threshold, the email is tagged as spam. This system works incredibly
well. In my experience, the default configuration is 99% accurate
with very few false positives. Rules test for text commonly found
in commercial email, for whether the source domain is listed in the
Realtime Blackhole List (RBL), and for many other characteristics.
Each tagged message gets modified in two ways: the subject line
gets the string "*****SPAM*****" prepended to it, and the body of
the message is prepended with a report indicating which rules were
triggered. For example, here's the epitaph on a recently "assassinated"
message:
Subject: *****SPAM***** Automated Credit Repair that Works! Free Sign Up!
SPAM: ------------------- Start SpamAssassin results --------------------
SPAM: This mail is probably spam. The original message has been altered
SPAM: so you can recognise or block similar unwanted mail in future, using
SPAM: the built-in mail filtering support in your mail reader.
SPAM:
SPAM: Content analysis details: (10.4 hits, 5 required)
SPAM: Hit! (0.8 points) Subject has an exclamation mark
SPAM: Hit! (0.8 points) BODY: /^[^<]{199,}$/m
SPAM: Hit! (2.5 points) BODY: Link to a URL containing "remove"
SPAM: Hit! (3.3 points) BODY: /click here.{0,100}<\/a>/is
SPAM: Hit! (3 points) Listed in Razor, see http://razor.sourceforge.net/
SPAM:
SPAM: ------------------- End of SpamAssassin results -------------------
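The score-and-threshold idea behind that report can be illustrated with a toy scorer. This is not SpamAssassin's code: the three rules below are simplified guesses at tests that appear in the report, the point values are copied from it, and the function names are hypothetical. Treating a score exactly equal to the threshold as spam is also an assumption.

```python
import re
from email import message_from_string

# Toy rules: (description, points, test). Each test receives the parsed
# message and its body text. Assumes a single-part, plain-text message.
RULES = [
    ("Subject has an exclamation mark", 0.8,
     lambda msg, body: "!" in (msg["Subject"] or "")),
    ('Link to a URL containing "remove"', 2.5,
     lambda msg, body: re.search(r'href="[^"]*remove', body, re.I) is not None),
    ("'click here' inside a link", 3.3,
     lambda msg, body: re.search(r"click here.{0,100}</a>", body,
                                 re.I | re.S) is not None),
]

THRESHOLD = 5.0  # "5 required" in the report above
TAG = "*****SPAM***** "


def score_message(msg):
    """Return (total points, list of (description, points) for rules hit)."""
    body = msg.get_payload()
    hits = [(desc, pts) for desc, pts, test in RULES if test(msg, body)]
    return sum(pts for _, pts in hits), hits


def tag_if_spam(raw_text):
    """Parse a message and prepend the spam tag to its subject if needed."""
    msg = message_from_string(raw_text)
    total, hits = score_message(msg)
    if total >= THRESHOLD:
        subject = msg["Subject"] or ""
        del msg["Subject"]
        msg["Subject"] = TAG + subject
    return msg, total, hits
```

Because the tag lands in the Subject header, any downstream filter only needs to match on that one string to separate suspect mail from the rest.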
Vipul's Razor (http://razor.sourceforge.net) is a set of Perl
scripts that can scan mail as a plug-in to SpamAssassin. Vipul's Razor
creates a fingerprint of each incoming message and checks it against
an Internet database of spam message fingerprints. Users can report
spam mail that is not yet in the database to help other users identify
similar mail in the future. By this collective effort, widely distributed
commercial email can be detected. Although this mechanism is not foolproof,
it can increase SpamAssassin's accuracy when used as a plug-in. Details
on how to configure both tools are available at their respective Web
sites.
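The fingerprint-and-lookup idea can be sketched in a few lines. This is a conceptual illustration only: a plain SHA-1 of whitespace-normalized text stands in for Razor's real signature scheme, which is not reproduced here, and SharedCatalog is a hypothetical in-memory substitute for the networked database.

```python
import hashlib


def fingerprint(body):
    """Hash a message body after collapsing whitespace and case.

    Assumption: this simple normalization is a placeholder for whatever
    Razor actually does to make near-identical copies hash alike.
    """
    normalized = " ".join(body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SharedCatalog:
    """In-memory stand-in for the shared database of spam fingerprints."""

    def __init__(self):
        self._known = set()

    def report(self, body):
        """Record a message that a user has identified as spam."""
        self._known.add(fingerprint(body))

    def is_listed(self, body):
        """Check whether a matching message has already been reported."""
        return fingerprint(body) in self._known
```

Once one user reports a message, every later copy with the same normalized text is recognized, which is why the approach works best against mail sent in bulk with little variation.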
Both SpamAssassin and Vipul's Razor require several supporting
Perl modules. The best way to install these is to run:
perl -MCPAN -e shell
to start an interactive installation shell prompt. Then type:
install <perl module>
The CPAN shell will locate the desired Perl module on the Internet,
then download and install it along with any dependencies it requires.
To have SpamAssassin start tagging mail, you need to pipe all
your mail through it using Procmail. If you are scrubbing a lot
of mail, instantiating Perl for each incoming message could degrade
performance. Luckily, SpamAssassin also features a "daemon" mode,
where you load up a single instance of Perl/SpamAssassin in server
mode, then use a small stub program to submit text for evaluation.
This can drastically improve performance on busy servers. There
are many configuration options available; however, my experience
is that SpamAssassin works nearly flawlessly "out of the box."
Mail Sorting with Procmail
To activate SpamAssassin for all users on your system, put the
following block in your /etc/procmailrc file:
PATH=/bin:/usr/bin:/usr/local/bin
SHELL=/bin/sh
VERBOSE=on
LOGFILE=$HOME/proc.`date +%y-%m-%d`
LOGABSTRACT=all
INCLUDERC=$HOME/.procmail-preprocess
:0fw:spam.lock
| spamassassin -P
:0e
{
EXITCODE=$?
}
:0:
* ^Subject:.*\*\*\*\*SPAM\*\*\*\*
$HOME/Maildir/.spam/
INCLUDERC=$HOME/.procmail
:0
*
$HOME/Maildir/
Keep in mind that the ~/Maildir/.spam/ directory must already exist
and be in proper Maildir format.
Messages sent to certain mailing lists tend to trigger false positives
in SpamAssassin due to the envelope structure, so certain exceptions
must be made. Procmail reads /etc/procmailrc upon startup for global
mail-handling rules. Typically the /etc/procmailrc references a
.procmail in the user's home directory for custom rules. I changed
the procmailrc flow to run a ~/.procmail-preprocess file in each
user's home directory (the first INCLUDERC line), followed by the
global SpamAssassin block, then to execute ~/.procmail (the last
INCLUDERC line). Rules that tightly match certain mailing lists
to which I belong are put in ~/.procmail-preprocess so that SpamAssassin
does not see them. Procmail is a powerful mail-handling tool, and
a complete discussion of how to configure it is beyond the scope
of this article. However, if you like to learn by example, take
a look at these excerpts from my configuration. The following block
in my .procmail-preprocess will sort any mail that has the string
"LISTSERV@beethoven.us.checkpoint.com" anywhere in the header,
and pipe it into my Inbox/Lists/Security mail folder:
:0 H
* .*LISTSERV@beethoven\.us\.checkpoint\.com.*
$HOME/Maildir/.Lists.Security/
Rules that sort incoming scrubbed mail into per-original-account folders
go in ~/.procmail. This block in my ~/.procmail will stuff
any mail sent to my mac.com account (and subsequently fetched by fetchmail)
into its own subfolder:
:0:
* ^TO_.*tiburcio@mac\.com
$HOME/Maildir/.Accounts.Tiburcio@mac/
Any email from the local cron process is automatically trashed by
this code:
:0:
* ^From:.*Cron
/dev/null
Don't forget the trailing slash when indicating a Maildir-style destination
folder, otherwise procmail will assume you want MBOX output.
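The order in which these rule files are consulted matters more than any single recipe, so here is the same flow restated as a short Python sketch. It is not how procmail works internally: route, first_match, and the looks_like_spam callback are hypothetical names, and the folder strings are taken from the examples above.

```python
import re

SPAM_FOLDER = "Maildir/.spam/"
INBOX = "Maildir/"


def first_match(rules, headers):
    """Return the folder of the first rule whose pattern matches, else None."""
    for pattern, folder in rules:
        # Case-insensitive matching, as procmail does by default.
        if re.search(pattern, headers, re.I | re.M):
            return folder
    return None


def route(headers, pre_rules, post_rules, looks_like_spam):
    """Pick a destination folder using the same ordering as the rule files."""
    # 1. Trusted mailing lists are filed before any spam check runs.
    folder = first_match(pre_rules, headers)
    if folder:
        return folder
    # 2. Everything else goes through the spam check.
    if looks_like_spam(headers):
        return SPAM_FOLDER
    # 3. Clean mail is sorted by the per-account rules.
    folder = first_match(post_rules, headers)
    if folder:
        return folder
    # 4. Anything left over lands in the main inbox.
    return INBOX
```

The key design point is step 1: a message matched by a preprocess rule never reaches the spam check, so a mailing list that would otherwise trip the filter is delivered untouched.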
Beware that certain e-card messages will also get sorted as spam.
I had to make an exception in my .procmail-preprocess for users
who like to send me such mail. Refreshingly, SpamAssassin skillfully
catches chain letters and joke messages sent from people that I
do know. Procmail could be configured to immediately delete detected
spam, but I prefer to sort it all into a separate folder for manual
deletion later. Sometimes clean email messages (such as email receipts
for Web purchases) get tagged, so it's worthwhile to scan the subject
lines in the spam folder before emptying it.
Web Sites and Webmail
After the mail and spam system was debugged and working, the next
step was to serve up the mailboxes via Webmail. Squirrelmail (http://www.squirrelmail.org)
is an excellent, free collection of PHP scripts that get the job
done. I created a virtual host in Apache just for Webmail (i.e.,
mail.<domain>.com). Squirrelmail requires very little by way
of configuration, and comes with a small terminal-based control
panel. At minimum all you need to do is enter the IP address of
your IMAP instance into the control panel. Color and font schemes
are selectable on a per-user basis from the Web interface. If you
are handy with HTML and CSS, you can create your own "skins." The
Web pages it generates are largely free of JavaScript and images,
and download quickly.
I installed mod_ssl and created a homebrew SSL certificate per
instructions at http://www.modssl.org. This allowed me to
access my Webmail domain via https for greater security. The locally
generated certificate means that the first time a Web browser encounters
your site a pop-up message will appear, warning that the certificate
cannot be authenticated. Clicking "OK" to accept the certificate
will allow you into the site. You can alternatively purchase a real
certificate and avoid this error message entirely.
Remote Administration and Access
Remote administration is accomplished via ssh. While my DSL provider
occasionally changes my IP address, I do have a static hostname courtesy
of ZoneEdit. Squirrelmail is nice; however, I like to use a full-featured
mail client whenever
I can. To this end, I set up an ssh tunnel to access the SMTP and
IMAP ports on my server remotely. Since I run Mac OS X on my laptop,
I can use the following SSH command:
ssh -C -L 25:192.168.1.3:25 -L 143:192.168.1.3:143 <domain.com>
192.168.1.3 is the local Ethernet IP address of my Linux server, whereas
<domain.com> resolves to its public IP address. The "-C" option
indicates compression should be used, which results in a substantial
speed improvement on my 128K DSL uplink. Each "-L" option sets up one
forwarded port. The "-L 143:192.168.1.3:143" option tells the ssh client
running on my laptop to listen on local port 143 and send any data it
receives over the ssh link to the server, for delivery to port 143 on
192.168.1.3 (the server itself). My laptop's mail client is then directed
to connect to 127.0.0.1 for IMAP. The "-L 25:192.168.1.3:25" option does
the same for port 25 (outgoing SMTP). This
gives me an encrypted session for all email traffic, and I don't have
to open up IMAP ports on my Linksys. Outgoing email is also relayed
automatically, because from the server's perspective it appears to
be a local connection.
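The structure of each "-L" argument (local port, then target host, then target port) is easy to get wrong by hand, so a small helper that assembles the command may be useful. The function name tunnel_command is hypothetical. Note that on many Unix systems only root may listen on ports below 1024, so an unprivileged user might map a high local port (1143 here is an arbitrary example) to the server's port 143 and point the mail client at that port instead.

```python
def tunnel_command(gateway, target_host, ports, compress=True):
    """Build an ssh argument list with one -L forward per port.

    ports maps a local listening port to the port on target_host,
    for example {25: 25, 143: 143}.
    """
    args = ["ssh"]
    if compress:
        args.append("-C")
    # Sort so the generated command is the same on every run.
    for local_port, remote_port in sorted(ports.items()):
        args += ["-L", "%d:%s:%d" % (local_port, target_host, remote_port)]
    args.append(gateway)
    return args
```

The returned list can be passed directly to subprocess.run(), or joined with spaces to print the equivalent shell command.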
This mini-VPN works wherever ssh traffic is allowed and lets me
use a more robust email client with offline capabilities from my
laptop. Webmail is still useful for those situations where I am
at a guest computer or where outgoing ssh traffic is denied by a
firewall.
Conclusion
All told, this setup took about a weekend to complete. Most of
that time, however, was spent installing Red Hat and compiling Courier
from scratch (a wretched task if there ever was one). You can probably
save some time by using whatever SMTP and IMAP server solutions
come with your distribution, provided you can insert Procmail into
the local delivery chain. The reduction in spam was well worth my
efforts, however, and the bonus of having all my mail aggregated
and available in a multitude of formats has been very useful.
Phillip Tiburcio is an independent consultant in Chicago. He
graduated from Rensselaer Polytechnic Institute in 1995 with a degree
in Computer Systems Engineering. Phillip has been in the systems
administration/integration field for the past eight years, working
mainly with Solaris, Windows NT 4.0 (with Terminal Server Edition/Citrix
Metaframe), and Windows 2000. He has been working with Linux for
about eight years, starting with Slackware, and currently with Red
Hat. Phillip can be contacted at: phillip@tiburcio.info.