Load
Balancing Email
Frank Wiles
At some point, you will discover that your single server email
system can no longer keep up with the ever-increasing torrent of
spam and legitimate email. When this happens, you'll probably
research commercial email server software, expensive shared storage
like Fibre channel, email spools on NFS, and load-balancing hardware.
You may not realize that you can provide large-scale email on multiple
servers using readily available open source software that is familiar
to most Unix and Linux systems administrators.
Like many admins, I used to buy the biggest new server each year
to keep up with more users, more spam, and more email in general.
I soon found that my needs were growing faster than the new hardware
was becoming available. I needed to find a solution that addressed
all of the following criteria:
- Spread the load across multiple, less-expensive servers
- Avoided expensive (Fiber channel) or error-prone (NFS) shared
storage of email spools
- Allowed us to grow as little or as much as we needed each year,
without having to replace large parts of the system
- Reduced or eliminated single points of failure
- Did not increase the time on technical support calls
- Centralized spam filtering
- Centralized system logs
- Minimized any specialized knowledge a sys admin would have
to learn to maintain the system (beyond normal systems administration
knowledge of sendmail, DNS, POP3, and IMAP)
The solution my company chose was to load balance our user base
by assigning a user to a particular mail storage server and having
our primary incoming email servers (a.k.a., primary MX) redirect
all email for that user to the correct server. Rather than configuring
that redirection directly through a sendmail rule, a milter, or
a direct alias to the mail storage server, we used a generic alias
at the MX server and specified the storage server address through
a DNS CNAME record.
This solution let us implement load balancing at the MX servers
with minimal mess. It addressed all our design criteria without
the need for expensive software or hardware. Our solution also offered
a clean and simple way to move accounts between mail storage servers,
which provided a convenient method for tuning email system performance.
If you have ever administered a large email system, you are aware
that it is not the actual receiving and storing of the email that
is the most expensive operation; it is the repeated (sometimes constant)
POP3 and IMAP requests from the users' email client that is
most taxing to the server.
Some users check for new messages every 30 seconds, while others
only check once per day. Being able to move the more frequent users
to the faster hardware allowed us to tailor our load balancing to
fit the user base and the hardware, something that a round-robin
setup would not have allowed.
Here is an example from the point of view of the sending email
server. If the sending server wants to send email to john@domain.com,
it first does a DNS lookup for the MX records for that domain. In
our setup, we use two primary MX servers at a priority of 10 --
mx1.domain.com and mx2.domain.com, respectively. This configuration
effectively gives us the load-balancing properties of round robin,
as well as the hot failover of a heartbeat.
Assume the sending server chooses to use mx2.domain.com for this
transaction. On mx2.domain.com, there would exist a sendmail alias
of:
john:john@john.pop.domain.com
This alias would redirect all email for john@domain.com to john@john.pop.domain.com.
Assuming John is assigned to our first mail storage server (ms1.domain.com),
john.pop.domain.com would simply be a DNS CNAME entry pointing to
ms1.domain.com.
This solution is very simple from the user's perspective.
When John puts john.pop.domain.com in as his incoming email server,
he does a quick DNS lookup and his email client finds ms1.domain.com
as the server with which he is supposed to be talking. Obviously,
for this to work, POP3 and/or IMAP need to be running on the mail
storage servers.
This setup gave us many benefits over other systems in that we
could move a user to any mail storage server at any time without
having to inform the user. The user's email software could
continue to check for new messages at username.pop.domain.com and,
behind the scenes, we could move the user to a different server
simply by setting up the necessary Unix accounts and changing the
DNS record to point to the new location.
One setup we designed and ultimately rejected required users to
point their email software to a particular mail storage server (for
example, pop47.domain.com). This approach was not only confusing
to users, but also increased the work for the technical support
staff. The support staff had to verify that pop47 was indeed the
correct server for this user. This was further complicated for people
with multiple email accounts, each of which might live on a different
server. Furthermore, it made moving users between servers and posting
generalized instructions online nearly impossible. Our system elegantly
avoided all of these problems.
With our system, we were able to reduce our single points of failure
by having multiple primary MX servers and DNS servers for the pop.domain.com.
In the event a mail storage server crashed or was overwhelmed with
email, incoming messages would simply queue up at the primary MX
servers for later delivery. We added further protection by having
a secondary MX server set up in the unlikely event that all primary
MX servers were unavailable.
By spreading our users across multiple autonomous servers, we
further reduced the risk of having all of our users lose their email
service at once. If we experienced a hardware failure on a mail
storage server, the failure would affect the users whose spools
resided on that server. We also compartmentalized the servers with
respect to security. If a hacker were able to gain entry to one
server, there was a good chance he would not be able to gain entry
to the other servers.
We could also take advantage of sendmail's milter extensions
to centralize spam filtering at the mx servers. By reducing the
amount of spam messages the mail storage servers received, we further
reduced our hardware requirements.
The only downside we found to this system was that an admin could
no longer quickly ssh onto the mail server, run a simple
add user command, and be done. We overcame this by writing an in-house
system tied to our overall customer management application that
issued commands to each server with the proper information --
aliases on the MX servers, DNS entries for the DNS servers, and
normal Unix users for the mail storage system. We have been running
with this system for several months now and for the most part it
has behaved beyond our original expectations.
Frank Wiles is the Manager of Systems Administration and programming
for Sunflower Broadband (http://www.sunflower.com)
in Lawrence, Kansas. He also does systems administration and programming
consulting via his company Revolution Systems, LLC. Frank can be
reached at: frank@revsys.com.
|