An Apache
Load Balancing Cluster
Don Gourley
A cluster is a group of computers that have been interconnected
to share processing. Clustering technology, which coordinates the
computers as they work in parallel on computing tasks, comes in
a variety of flavors. Clusters can be built from custom hardware
and software that implement a fast interconnect and some kind of
shared memory or cache. Clustering technology can also be completely
implemented in software, making use of the power and low prices
of today's PCs, inexpensive high-speed networking like Fast
Ethernet, and a robust open operating system like Linux. An example
of this kind of cluster is the Linux Beowulf system, which consists
of a server node connected to multiple client nodes via some type
of private high-speed network.
In this article, I describe the Java Application Server Pseudo-cluster
(JASPer), a simple cluster built with commodity hardware and free
software from the Apache Foundation. The master server runs the
Apache HTTP daemon and the cluster nodes are running the Apache
JServ Java application server. Using this combination of open software
and inexpensive equipment, my team has been able to reap the benefits
of clustering while avoiding some of the disadvantages.
Advantages and Disadvantages of Clusters
There are three primary benefits of clustering, depending on the
computing tasks to be shared by a cluster. First is the ability
to scale capacity. Additional computers can be easily added to a
properly constructed cluster to gain additional capacity for processing.
This works only until any shared resources in the cluster (the master
node and cluster interconnect) begin to become saturated. Second
is fault tolerance. Assuming that the cluster is smart enough to
stop trying to use any nodes that are no longer available, it will
continue to run as long as the master node and at least one client
node can communicate. Finally, higher performance is often achieved
with a cluster by distributing work across multiple servers, although
the shared resources in a cluster will again place an upper limit
on how far this benefit can extend.
There are, however, some problems with clusters, which have limited
their use. The difficulty of setting up and administering a cluster
has prevented some from taking advantage of them. IT shops without
the appropriate expertise find the administration of a cluster very
complex and, depending on the type of clustering technology used,
support services can be expensive or unavailable.
Another limitation is the kind of applications that can benefit
from a clustered platform. To take advantage of the performance
of a cluster, computing tasks must be partitioned in some way to
distribute parts of the tasks across multiple processors. In parallel
computing, this partitioning and distribution is usually performed
in the application program, inserted by intelligent optimizing compilers
or based on directives included by the programmer. For example,
on a Beowulf system, the programmer decides which parts of the program
can be run concurrently on different client nodes and uses a messaging
software API or remote shell commands to run those procedures on
client nodes.
Load Balancing Clusters
However, certain computing tasks are inherently concurrent and
can be partitioned on a cluster without special programming. A high-volume
transaction processing system, in which multiple transaction requests
are submitted concurrently, can be naturally partitioned on the
transactions. A common example of such a system is a Web server,
and a simple and popular cluster technology is Web redirection,
where a central switch (the master node) redirects HTTP requests
to one of a set of Web servers based on server load or geographic
proximity to the client (or both).
Load balancing software can be used in a similar way to create
effective clusters for Web applications. My team has done this with
Apache JServ, which provides load-balancing across a set of Java
servlet engines. Methods similar to those described here can be
used to cluster any Web application server that supports load balancing.
This technique can minimize the primary disadvantages seen with
other kinds of clusters. By using your existing application server
software, administration of the cluster is only incrementally more
complex than administering a single Web server. And, the transaction-oriented
nature of Web applications means that no special programming is
required to partition the computing tasks.
On the other hand, a significant limitation of this kind of cluster
is that it only runs applications written for the application server.
It is not a general purpose cluster and that is one reason we named
our implementation a "pseudo-cluster" (although the main
reason was that it sounded better than "Cluster" in the
acronym JASPer).
In "Building a Web-Based Java Application Server with Apache JServ"
(Sys Admin, February 2000) my coauthor and I described how
to install, configure, and administer JServ as part of a distributed
application architecture. As described in that article, load balancing
is performed by the Apache HTTP server module mod_jserv.
When an initial request is received by mod_jserv on the master
node, it randomly selects a JServ cluster node from the defined
set of servers. The request is passed to that cluster node using
the Apache JServ Protocol (AJP), a network protocol for communicating
between the master and cluster nodes. mod_jserv adds a server
id to the JServ session id so that subsequent requests for that
session are recognized and routed to the same cluster node.
A weight can be added to each defined JServ server to have more
powerful cluster nodes process more of the requests. If a JServ
server goes down (i.e., fails to respond to mod_jserv), then
its requests will be routed to another server. In that case, a new
session needs to be started unless your application has implemented
some kind of persistent session (e.g., by saving session information
to a database accessible by all JServ servers). The previous Sys
Admin article describes the routing of requests in more detail.
There is also more information about JServ configuration and its
load balancing features on the Java Apache Project Web site (http://java.apache.org/).
In practice, we have found the JServ load balancing mechanism
to be quite effective. The load has been very closely balanced as
measured by the number of sessions on each server. During moderately
busy times, each JServ node in our cluster handles between 65 and
75 sessions. Actual load may not always correspond to the number
of sessions being handled, but in our portal application, it is
close because there is not a wide discrepancy in the processing
required to handle different kinds of requests.
JASPer Configuration
The three parts of a load balancing cluster comprise the master
node, which performs the load balancing; the cluster nodes, which
process the transactions; and the cluster interconnect, which allows
the master and cluster nodes to communicate. The components used
for JASPer are illustrated in Figure 1.
The master node can be the weak link in a cluster since all transactions
must go through it. A master node failure will bring down the whole
cluster, so the platform must be very robust. To enhance availability
and performance, a Web redirecting switch could be used to route
HTTP requests to redundant master nodes, which are configured to
use the same set of cluster nodes in a many-to-many cluster configuration.
However, there is still a potential single point of failure. It
has just been moved from the master node to the (presumably more
robust) Web redirecting switch.
For our cluster, we used a Sun Enterprise 4000 with six 250-Mhz
processors and 3.5-GB RAM as the master node. This is more processing
power than needed to receive HTTP requests and route them to the
cluster nodes, but this server hosts several Web sites and an online
library catalog application besides being the JASPer master node.
This platform was selected for its reliability more than its performance.
Cluster nodes can be any combination of servers that run Apache
JServ, including Linux, Solaris, and Windows NT platforms. If a
combination of different hardware or software platforms is used,
you should make some guesses about their relative performance so
you can weight them in the load balancing set defined for mod_jserv.
Benchmarking of the individual nodes, as described below, can be
used to make better estimates of their relative capacity. For example,
if you have one new high-speed node that can handle three times
as many requests per second as an old slower node, you might ask
mod_jserv to route three quarters of the requests to the
faster node with these directives:
ApJServBalance setname FASTNODE 3 ApJServBalance setname SLOWNODE 1
Since JASPer is supporting our primary enterprise Web portal, we
were able to get funding to buy four new Pentium III 600 Mhz Linux
servers to serve as the JServ cluster nodes. Memory requirements
for the cluster nodes depend on the Java applications that run on
them. In our case, we use lightweight sessions, saving the minimum
information about each session that we need, but load a lot of configuration
information in each servlet at startup to avoid having to read the
configuration database and files for each request. 256 MB RAM appears
to be sufficient to avoid swapping. Memory requirements of the Java
Virtual Machine can be determined at runtime by a Java monitoring
agent class that is presented below.
We also purchased a dedicated 100-Mbs network switch with a Gigabit
Ethernet upload module to connect to the master node. We initially
purchased more bandwidth than necessary to make sure we could scale
the cluster as usage increased. Like the master node, the cluster
interconnect is a shared resource and potential single point of
failure. For performance and fault tolerance, redundant interconnect
networks could be implemented. But in our experience, network switches
are more reliable than servers, so this redundancy is usually not
necessary. Although virtually all the interconnect traffic is between
the master node and a cluster node, this switch is also connected
to external networks via a router to support occasional connections
required for services such as DNS lookup and remote administration
with telnet. Alternatively, we could have had our master
node act as a router for the cluster's external network traffic.
The total cost for the four cluster nodes, network switch, and
Gigabit Ethernet adapters for JASPer was about $10,000. We can scale
up to 10 cluster nodes for just the cost of additional nodes before
reaching the capacity of our cluster interconnect. The cost of the
cluster could have been reduced significantly if we had used our
existing LAN for the interconnect and recycled old PCs as the cluster
nodes. However, we think the cost is justified when you factor in
the increased reliability of the new equipment along with the performance.
JASPer Administration Web Page
Maintaining a cluster complicates systems administration. Besides
having more servers to take care of, the administrator must combine
information from multiple sources to get an overall picture of the
status of a cluster. Fortunately, the Apache servers provide some
status information through dynamic Web pages. Additional information
about the state of a servlet engine can be obtained from a monitor
servlet. We have collected this information in a multi-frame Web
page to provide a quick picture of our cluster performance and status
(see Figure 2).
The top frame contains the Apache HTTP server status page. This
page lists some vital statistics for the HTTP server, such as uptime,
total traffic and throughput, and current requests. It also lists
each child server process and its most recent request processed.
To enable this page, turn on ExtendedStatus and define the
server-status handler in the httpd.conf file:
ExtendedStatus On <Location /server-status> SetHandler server-status Order deny,allow Deny from all Allow from 127.0.0.1 </Location>
The server status page is then available at: http://localhost/server-status.
The example directives only grant server status access to the server
host itself. As with all the status pages described here, you should
restrict access to the status page to prevent untrusted users from
accessing information that should remain secret. In our environment,
we allow access to a small LAN protected by a firewall that we can
dial into for remote administration and troubleshooting.
JServ provides a set of status pages displayed on the right side
of our frameset status page. The top-level page is created by mod_jserv
and includes links to the master node and each cluster node. Clicking
on the master node under "Configured Hosts" displays the
mod_jserv parameters and each servlet zone that has been
mounted. If a servlet zone is defined as part of a load balancing
set, then each cluster node in the set is listed. A "test"
button is provided to run the servlet engine on that node (JServ
is actually a servlet and can execute itself). This provides access
to the configuration parameters for each individual JServ server.
You enable the JServ status pages in the httpd configuration,
just as we did for the HTTP server status handler above (in this
example, we again grant access only to the HTTP server host):
<Location /jserv/> SetHandler jserv-status Order deny,allow Deny from all Allow from 127.0.0.1
</Location>
To let each JServ execute itself to display its configuration parameters,
you must also specify "security.selfservlet=true"
in the jserv.properties files.
On the left side of the frameset, we display a graph of the Web
server traffic over the past 30 hours. This graph is generated every
15 minutes by MRTG (http://ee-staff.ethz.ch/~oetiker/ \
webtools/mrtg/mrtg.html). MRTG was designed as a tool for graphing
traffic on network equipment, but can easily be extended to graph
one or two values representing any kind of resource usage by providing
an external program in the Target directive. When run, the external
program should return four lines containing, in order, the first
value to graph, the second value, a string specifying the uptime
of the server, and a string telling the name of the server.
Our external program for JASPer status is a simple script that
counts occurrences of GET and POST requests in the
HTTP server access log. One graph line plots total requests, and
the other plots portal requests for access to remote resources.
MRTG expects, by default, continuously incrementing counters, so
our script can count all the requests without worrying about how
many it counted last time. Still, it is not a very efficient process,
so we increased the interval from the default 5 minutes to 15 minutes.
We also roll the access log every night, so no more than a day's
worth of requests are counted each time the script is run.
The bottom frames of our combined status page contain brief information
that is extracted by a special servlet, MonitorAgent, running on
each cluster node. If a cluster node is down, then the corresponding
frame will not be displayed -- providing a quick way to see
whether a server has died. Java source for the MonitorAgent class
is shown in Listing 1.
The MonitorAgent information includes the version number of JServ
and the servlets to help ensure that each cluster node's configuration
is identical to the others. The version of each servlet is available
because our local coding standards specify that every servlet return
its version number from the getServletInfo() method. The
names of each servlet must be specified in the servlet zone properties
file, with each name separated by a special character:
servlet.MonitorAgent.initArgs=servletNames=NameOne|NameTwo
The MonitorAgent class also displays the total number of connections
and the total and free memory available to the Java Virtual Machine.
The class can be extended to display application-specific session
information. For our portal application we count sessions for each
client institution and further break it down by whether the user
is accessing the portal from on or off campus.
Benchmarking JASPer
After we built our cluster, we wanted to find out how well it performed.
In particular, we were interested in seeing how much additional
capacity we gained from the clustering. We also wanted to develop
a benchmark for determining when we might need to add cluster nodes
as usage increased, and to be able to recognize when JASPer began
to reach the point where additional nodes did not increase capacity.
There are several tools available that throw a bunch of requests
at a Web server and come back with a measure of how many requests
the server can handle. We chose ab, the Apache HTTP server
benchmarking tool, because it was easy to use and came free with
the Apache source distribution.
It's important to realize, however, that the resulting number
of requests per second (or similar measure) is artificial, and almost
certainly doesn't correspond to any real-life limit or load.
Most of these tools do not allow you to create a realistic sample
of URLs to simulate real user sessions, particularly when custom
authentication is used to create and maintain the sessions. Also,
different tools will give you different results for the same URL
samples because they have different ways of simulating multiple
users, none of which will correspond exactly to real loads. However,
the numbers are still useful for relative comparisons of different
configurations. For example, by running repeated tests after tweaking
the httpd configuration parameters, you can get a good idea
of what parameter values work best for certain kinds of loads.
Loads are defined in ab by specifying a concurrency level
(corresponding to a number of users who might be making requests
at the same time), a total number of requests, and a single URL
to be used for the requests. The URL can be requested using either
the GET or POST method, and a file can be specified
for the data to be POSTed with the request. ab also
supports HTTP basic authentication and the passing of cookies. However,
there is no way to receive a cookie from the first set of concurrent
requests to be used by subsequent requests to simulate a set of
user sessions that log in for the first request but not for subsequent
requests.
Thus, the URL you use with ab must be carefully chosen
to provide the most useful benchmarking data. For JASPer we wanted
to use a URL that would fully exercise the cluster, so we selected
one that did more work than average requests. We chose a request
that authenticated the benchmark program's IP address and created
a new user session each time, before building and returning a custom
menu page. Also, to minimize other factors that affect response
time, but are not directly related to cluster performance, we ran
ab on the master node while the system was not doing any
other work.
We ran that benchmark test with two load configurations. To simulate
light loads, we specified 10 requests with a concurrency level of
two as if two concurrent users were making 5 requests each. To simulate
a higher load, we specified 500 requests and concurrency level 50,
or 50 users making 10 requests each. These tests were run with JASPer
configured with one, two, three, and four cluster nodes.
The results are illustrated in Figure 3. Under the light load,
performance increased relatively little as the number of cluster
nodes increased. Two nodes could process about 30% more requests
per second than one, and the addition of the third (25% more requests/second)
and fourth (8%) nodes provided even less additional capacity. However,
under the higher load, JASPer capacity scaled very well as nodes
were added to the cluster.
In fact, the addition of each node in the higher load tests increased
capacity by about 110% of the requests per second that a single
node could handle. This "better than linear" scaling is
due to an unexpected benefit of clustering that was revealed by
these tests: When ab simulated 50 users making concurrent
requests, one- and two-node configurations could not handle all
the requests without errors. 62 out of 500 requests failed when
JASPer had one node, creating an abnormally low requests-per-second
rate compared to the other configurations. With two nodes, two out
of 500 requests failed and there were no failures with the three-
and four-node configurations. When the one-node baseline was normalized
to ignore errors, (6.4 requests per second instead of 5.61), scaling
with additional nodes was about 90%.
These benchmark tests demonstrate that adding additional cluster
nodes to JASPer results in higher capacity and throughput, particularly
under higher Web server loads. Additionally, they indicate that
the current configuration of four nodes has plenty of capacity for
the normal loads our portal application sees. And, as that usage
grows, we should be able to increase capacity by adding additional
nodes. This scalability and performance, along with the fault tolerance
that the JServ load balancing mechanism provides, are the key benefits
of clustering. With JASPer, we have been able to realize these benefits
without investing a lot of money and effort into setting up and
administering the cluster.
Don Gourley works at the Washington Research Library Consortium
where he develops and administers Web-based information systems
for academic libraries. Don can be reached at: gourley@wrlc.org.
|