Linux as an Application Server -- The Tomcat Way
Chris Bush
With the birth of the World Wide Web, systems administrators
suddenly found themselves with another kind of specialized system
to manage -- the Web server. In those early days, the Web server
was almost always a UNIX-based system. With the attention being
paid today to Open Source software like Linux and Apache, and despite
the market muscle of Microsoft, Web servers are still predominantly
UNIX-based. Previously, Web servers didn't always get the same
attention in the corporate data center as the mainframe, the database
servers, or even the file and print servers. However, Web servers
are here to stay and have added their own complexities to the job
of the systems administrator.
Early Web servers typically hosted "brochure-ware" corporate
Web sites or intranets whose most sophisticated functionality was
job-posting systems or corporate phone directories, implemented
using CGI (Common Gateway Interface) programs. As the Web grew,
and the technologies driving it matured and expanded, more Web servers
became host to increasingly critical applications, like e-storefronts,
information portals, customer service applications, B2B (business-to-business)
portals, and others. As these applications became more prevalent,
it became important to segregate application architecture into more
than just a front-office tier and a back-office tier. A third layer
in the client-server architecture -- a middle-office tier to
implement business logic separate from the database and the user
interface -- was necessary.
CGI seemed to offer a reasonable option for this middle-office
tier for Web-based applications, but the presentation code (HTML)
was always interspersed throughout the code that implemented the
business rules and data access central to the application's
functionality. This did not provide the clean separation between
application layers desirable in a multi-tiered client-server architecture.
Furthermore, Web servers and CGI scripts did not offer a number
of other features required for building scalable, robust, and secure
applications. There was no consistently available mechanism for
implementing session and state management, transaction management,
clustering, load balancing, or security and personalization. Applications
implemented with just a database, a Web server, and some CGI programs
simply could not support the demands of being used by thousands
of people. Nor did they offer the application developer a useful
way to build cleanly segregated three-tier applications.
To solve these issues, the concept of the application server arose.
Oh great, just add another server to the already vast array of machines
for which the systems administrator is responsible. Well, you can
relax (a little), because often the application server is a logical
layer in the three-tier application architecture, existing simply
as an additional suite of software and services right on the Web
server.
An application server (sometimes called middleware) is a set of
software and services that allow for the development and deployment
of the application code in three-tier applications. In a typical
three-tier architecture, the application tier implements the business
rules of the application, segregating them from the database and
the presentation code. A business rule is something that governs
the processing logic of a program. Examples might be something like
"the order total may not exceed the customer's available
credit", or "shipping charges are not applied to orders
over $200.00". The application server provides the development
and run-time environments for the application tier.
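As a rough illustration of what the application tier holds, the two example rules above could be written as plain Java methods, independent of any database or HTML code. The class and method names here are hypothetical, and amounts are kept in cents to avoid floating-point rounding.

```java
// Hypothetical business-rule class; names are illustrative only.
final class OrderRules {
    // Orders over $200.00 ship free (amounts are in cents).
    static final long FREE_SHIPPING_THRESHOLD_CENTS = 200_00L;

    // Rule: the order total may not exceed the customer's available credit.
    static boolean withinCredit(long orderTotalCents, long availableCreditCents) {
        return orderTotalCents <= availableCreditCents;
    }

    // Rule: shipping charges are not applied to orders over $200.00.
    static long shippingCharge(long orderTotalCents, long standardShippingCents) {
        return orderTotalCents > FREE_SHIPPING_THRESHOLD_CENTS ? 0L : standardShippingCents;
    }

    public static void main(String[] args) {
        System.out.println(withinCredit(150_00L, 100_00L));
        System.out.println(shippingCharge(250_00L, 7_95L));
    }
}
```

Because neither method knows anything about HTML or SQL, the same rules could be called from a servlet, a JSP, or a batch job.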
Ultimately, however, it's not so simple. An application server
means different things to different people. I will distinguish these
differences, and the technologies they introduce, by breaking them
into two categories -- the Web application server and enterprise
application server.
Web application servers offer an alternative to CGI scripting
for building dynamic, database-driven Web applications -- with
a few enhancements. They offer the ability to code application logic,
interface with databases, manage user sessions and application state,
and typically have functionality for implementing some basic security
and membership. The Web application server is usually tightly integrated
with the HTTP server, allowing scripts or other application code
to run as separate threads in the same process as the HTTP daemon.
This alleviates many of the scalability problems inherent to CGI
scripting, which required a separate process to be spawned for every
request to the script.
Another feature common to Web application server platforms is
the ability to embed blocks of code directly in specially named
HTML files. When the HTTP server gets a request for a file with
a specific extension, the file is first run through an interpreter
that executes those script blocks before a response is sent back
to the browser. These scripting environments provide a programming
model, or API, through which the script can interact directly with
the browser using the HTTP request/response model. Examples of this
kind of Web application server are the Open Source PHP, Microsoft's
ASP, Allaire's ColdFusion, and Java Server Pages (JSP). PHP,
ASP, and JSP offer the ability to embed script blocks into your
HTML files.
This type of programming model can be extremely beneficial during
application development, allowing designers to build the look and
feel, graphical layout, and design, all largely independent from
the coding of the application logic to be implemented. The coding
language in these blocks varies with each vendor's offering.
With ASP, it can be VBScript, JScript, or even PerlScript. PHP uses
its own language, while JSP scripts are written in Java. A technology
intimately related to JSP, Java servlets, will be discussed later
when I focus on the Tomcat Web application server. Allaire's
ColdFusion is a little different in that the program code resembles
an extension to HTML -- basically a whole suite of "markup"
tags that enable much of the same programming capabilities as the
others. All of these Web application server environments are currently
available for the Linux platform. In the case of ASP, though, you
must turn to non-Microsoft solutions from ChiliSoft, Halcyon, or
the Open Source Apache module, Apache::ASP. See the sidebar "Web
Application Servers" for details.
The second category of application servers is commonly referred
to as Enterprise Application Servers. These typically extend the
Web application server model by providing more robust security,
transaction management, message queuing, clustering with automated
failover, load balancing, and, most importantly, an architecture
for deploying distributed application components that implement
business logic. The ability to distribute these components to dedicated
servers is paramount to the scalability of applications, as dedicated
servers can be devoted to application processing, allowing the Web
server to do what it does best -- respond to HTTP requests from
the browser. I won't get into any further detail on enterprise
application servers in this article, except to say that this is
typically the realm of expensive, commercial solutions, which in
the past eliminated Linux as a potential platform. However, although
still not Open Source, many of the application servers are becoming
available for Linux. This is due in large part to their foundations
in the Java environment. The platform-independent nature of Java
lends itself to ease of porting these servers to Linux. The sidebar
"Web Application Servers" also lists a handful of Enterprise
Application Servers available for Linux.
In this article, I'll cover a specific Web application server
environment -- Tomcat. Tomcat is part of the Jakarta project,
which is part of the Apache Software Foundation. Tomcat is the reference
implementation of the Java Servlet 2.2 specification and JSP 1.1
specification. This implementation was recently turned over to the
Apache Software Foundation by Sun Microsystems -- the originators
of the Java language. The neat thing about Tomcat (and servlets
and JSP) is that the platform-independent nature of Java means your
servlets and JSP scripts are portable to any servlet/JSP implementation
of the appropriate specification. In fact, the Tomcat implementation,
itself written in Java, is platform-independent. I downloaded and
installed Tomcat to Red Hat Linux 6.1 and Windows 2000 from the
exact same distribution of the software. All of the servlets and
JSP scripts I've written to date work equally well on both.
In the case of the servlets, which are compiled Java class files,
I deploy the compiled class file to each platform, rather than recompiling
from the Java source. Powerful stuff indeed.
So What Is Tomcat, Exactly?
Tomcat is the official reference implementation of both the Java
servlet 2.2 specification and the JSP 1.1 specification.
Tomcat is freely available from the Apache Software Foundation's
Jakarta project at: http://jakarta.apache.org.
Basically, Tomcat implements a run-time environment, called a
container, in which Java servlets and JSPs can execute and
interact with the browser via HTTP. Tomcat can be integrated with
Web servers like Apache, and even IIS, allowing the Web server to
handle requests for static Web pages and images, and pass requests
for servlets and JSPs along to Tomcat for execution. A servlet (or
JSP) may be requested as the result of submitting an HTML form to
the Web server. The data submitted from the form may be used by
the servlet to query or update a database, generate an email, or
register a user on the Web site, to name just a few possibilities.
A hyperlink on a page may also be used to call a servlet or JSP
for similar purposes. Tomcat also implements a simple HTTP
server of its own, which comes in handy for development and testing
separately from your Web server.
Before we get into installing and configuring Tomcat, I'd
like to talk a little about what servlets and JSP scripts are, and
why you might use one or the other.
Servlets
Tomcat implements a runtime environment, called a container, for
Java servlet execution. A servlet is a Java program that bears some
similarity to a CGI program or script. It interacts with requests
from browsers, and with databases or other external applications,
then delivers results to the browser by formulating the HTTP response.
Like a CGI program, a servlet can accept data from the user, such
as that sent from an HTML form or in the query string data portion
of a URL. Other information that is part of a typical HTTP request,
particularly in the HTTP header, can be retrieved as well. This
might include such things as cookies, browser information, host
name or IP address of the connecting computer, and more. The servlet
implements a special kind of Java class that can work with all of
this information, process it, access databases and other applications,
and deliver an HTTP response back to the browser. The servlet can
also manipulate the header parameters of the initiated response,
to set cookie values, control caching, specify the MIME type of
the returned document, and more.
Servlets offer some definite advantages over CGI scripts for the
Web developer. For example, servlets, being Java classes, are platform-independent.
For all its portability across platforms, Perl cannot boast the
cross-platform portability of Java. As for CGI programs written
in C, the effort involved to achieve source-level portability can
be significant, and you can forget about cross-platform portability
of compiled object code. CGI's primary failing, though, is
an inability to scale on high-traffic sites. Every time a CGI program
is accessed, whether by the same user or a new user, a new process
is spawned by the Web server, and the HTTP request information is
sent to that process. If that program performs database accesses,
each process instance must establish its own database connection
-- an expensive operation. While the mod_perl Apache module,
and FastCGI, have done much to improve this situation, Java servlets
offer a much better alternative.
When a Java servlet runs, a single instance of that servlet class,
running within the Java Virtual Machine (JVM), is created to handle
all requests. The overhead of creating the object and starting a
thread of execution within the JVM doesn't even compare to
the overhead of CGI processing. With a servlet, each request made
to it simply results in a new thread of execution within a single
running instance of that servlet class, resulting in very little
additional overhead. If part of the servlet's processing needs
to be thread-safe (such as with critical database operations), Java
provides built-in support for thread synchronization. The servlet
container also implements a complete object model, or API, for interacting
with HTTP requests and responses, providing session state, cookie
management, pooling of database connections, URL manipulation, and
more.
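The single-instance, thread-per-request model can be imitated in plain Java without a servlet container. In the sketch below, one shared handler object is called from several threads, and a synchronized method protects the shared counter. The HitCounter class is hypothetical and merely stands in for a servlet; a real servlet would extend the classes of the servlet API instead.

```java
// Hypothetical stand-in for a servlet: one instance, many request threads.
final class HitCounter {
    private long hits = 0;

    // synchronized makes the read-modify-write on 'hits' thread-safe.
    synchronized long handleRequest() {
        hits++;
        return hits;
    }

    synchronized long getHits() {
        return hits;
    }

    // Simulates 'threads' concurrent clients, each making 'requestsPerThread' requests.
    static long simulate(int threads, int requestsPerThread) {
        HitCounter shared = new HitCounter();      // the single shared instance
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < requestsPerThread; j++) {
                    shared.handleRequest();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();                          // wait for all "requests" to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return shared.getHits();
    }

    public static void main(String[] args) {
        System.out.println(simulate(8, 1000));
    }
}
```

Without the synchronized keyword, some increments could be lost when two threads update the counter at the same moment, which is the kind of problem the built-in synchronization support is meant to prevent.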
Java also offers some security advantages, because it inherently
protects against common programming errors that can lead to security
problems. Java automatically provides for array bounds checking,
and doesn't allow arithmetic operations on pointers (references).
This helps prevent the kind of programming errors that lead to
buffer-overrun exploits. Java servlets are also free from a concern
that affects CGI scripts implemented in shell scripting languages, which
are vulnerable to shell meta-characters being passed as
part of the HTTP request.
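The bounds-checking behavior is easy to see in a few lines. In this sketch (the class name is made up for illustration), writing past the end of an array raises an exception instead of silently overwriting adjacent memory, as could happen in C.

```java
// Illustrative only: shows Java rejecting an out-of-range array write.
final class BoundsDemo {
    // Returns true if the write succeeded, false if Java refused it.
    static boolean tryWrite(byte[] buffer, int index, byte value) {
        try {
            buffer[index] = value;
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The JVM checked the index; nothing outside the array was touched.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];
        System.out.println(tryWrite(buffer, 3, (byte) 1));
        System.out.println(tryWrite(buffer, 8, (byte) 1));
    }
}
```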
Listing 1 shows an example of a Java servlet. When this code is
compiled into a Java class file, it can be deployed to a servlet
container and run. I'll show how to do that later. Over the
years, I have found myself repeatedly creating Web-based interfaces
to common systems administration tasks, such as monitoring disk
usage, managing services like lpd and network license managers,
and even DNS. Typically, I would have static HTML pages with a bunch
of links to CGI scripts that performed these tasks, and formatted
the results using HTML. In Listing 1, I am providing an example
of performing one such task using a servlet instead. This particular
servlet runs the Linux df command and formats the output
with HTML for presentation to the browser. Part of that reformatting
will cause file systems that have exceeded a defined threshold (percentage
used) to be highlighted in red; others will be highlighted in green.
(All listings for this article are available from the Sys Admin
Web site: http://www.sysadminmag.com.)
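Listing 1 itself is not reproduced here. As a rough idea of the formatting step only, the following sketch turns one line of df output into an HTML table row colored by a usage threshold. It is not the author's code: the class name, the method names, and the 90 percent threshold are assumptions, and it expects the usual six-column df layout with the percentage in the fifth column.

```java
// Hypothetical helper, not the article's Listing 1.
final class DfRow {
    // Extracts the "Use%" value (fifth column) from a df data line, or -1 if absent.
    static int percentUsed(String dfLine) {
        String[] cols = dfLine.trim().split("\\s+");
        if (cols.length < 6 || !cols[4].endsWith("%")) {
            return -1;
        }
        try {
            return Integer.parseInt(cols[4].substring(0, cols[4].length() - 1));
        } catch (NumberFormatException e) {
            return -1;   // e.g. the header line, where the column reads "Use%"
        }
    }

    // Builds one HTML table row: red above the threshold, green otherwise.
    // Note: column text is not HTML-escaped in this sketch.
    static String toHtmlRow(String dfLine, int thresholdPercent) {
        int used = percentUsed(dfLine);
        String color = used > thresholdPercent ? "red" : "green";
        StringBuilder row = new StringBuilder("<tr bgcolor=\"" + color + "\">");
        for (String col : dfLine.trim().split("\\s+")) {
            row.append("<td>").append(col).append("</td>");
        }
        return row.append("</tr>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtmlRow("/dev/hda1 1011928 950000 61928 94% /", 90));
        System.out.println(toHtmlRow("/dev/hda5 2016016 400000 1616016 20% /home", 90));
    }
}
```

In a servlet, rows like these would be written to the response between an opening and closing table tag, one per line read from the df process.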
Java Server Pages
Java Server Pages, or JSP "scripts", allow you to place
small blocks of Java code right into your formerly static HTML pages.
This makes JSP very similar to environments like PHP or ASP --
with the primary advantages being the portability of Java and the
robust API described in the JSP specification. In fact, JSP has
all the capabilities of Java servlets, but can be more convenient
to write for pages with small amounts of program code and large
amounts of static HTML. JSP scripts are actually converted automatically
into servlets by Tomcat when they are first referenced, and compiled
and instantiated in the servlet container.
Listing 2 shows an example of a JSP "script", or page,
that duplicates the functionality of the servlet in Listing 1. Even
if you know nothing about Java, you can see that the "meat"
of the code is unchanged.
Installing and Configuring Tomcat
Before you install Tomcat, you will need an appropriate Java run-time
environment (JRE). A full development environment is necessary if
you'll be writing and compiling your own Java servlets. The
Java Development Kit, available for free from Sun Microsystems (http://java.sun.com),
is sufficient for both needs. As of this writing, there is a JDK
version 1.3, which is in a Beta Refresh release for Linux. I have
used this with Tomcat with no problems. I won't discuss the
installation of the JDK here, but the Linux version is available
in a Red Hat "RPM" package, so installation is straightforward.
The Tomcat software can be found at: http://jakarta.apache.org.
The current release version is 3.1, with a 3.2 in beta. Version
3.2 fixes a number of problems from 3.1, and I have been using it
for my development. Because Tomcat itself is written in Java, it
is platform-independent, so there are no platform-specific distributions.
The available Tomcat version will likely have changed by the time
you read this, so just go to:
http://jakarta.apache.org/downloads/index.html
There you can download an archive of the current Tomcat release. Select
the link for downloading binaries. After downloading the archive (a gzipped
tar file in this example), change your directory to the parent directory
you'd like to contain Tomcat, and extract the archived distribution. For example:
% cd /usr/local
% gunzip -c jakarta-tomcat.tar.gz | tar -xvf -
Then, you simply need to set a pair of environment variables, start
the Tomcat services, and you're ready to start serving up Java
servlets and Java Server Pages. You will need to set an environment
variable TOMCAT_HOME to the root of the Tomcat installation:
% setenv TOMCAT_HOME /usr/local/jakarta-tomcat
and JAVA_HOME to the root of your JDK:
% setenv JAVA_HOME /usr/java/jdk1.3
To start Tomcat:
% cd /usr/local/jakarta-tomcat
% ./bin/startup.sh
Tomcat's default configuration starts up a standalone HTTP server
listening on TCP port 8080, along with the listener for the servlet
container on port 8007. To test your installation, start up your browser
and enter the URL:
http://localhost:8080
You should see the default Tomcat home page, which has links to some
servlet and JSP examples, as well as some useful documentation. Test
some of these to make sure everything is working well. The vast array
of configuration details for Tomcat is beyond the introductory scope
of this article. Please refer to:
http://jakarta.apache.org/tomcat/jakarta-tomcat/src/doc/uguide/tomcat-ug.html
for more complete information. This guide is also provided in the
Tomcat distribution under $TOMCAT_HOME/doc/uguide.
I will cover some of the basics of Tomcat configuration, including
configuring a new Tomcat Web application. I will also cover integrating
Tomcat with Apache, so that Apache can be used to serve up your
static HTML, while the Tomcat servlet container handles servlet
and JSP requests and responses. The Tomcat HTTP server is there
to facilitate development and testing of your installation, and
is not suitable as a large-scale standalone Web server. Apache is!
I'll start by setting up Apache and Tomcat to work together.
Java servlets and JSPs run in what is known as a container, which
is implemented by Tomcat. The goal is to configure Apache to pass
requests for servlets or JSP scripts along to the Tomcat servlet
container, and serve up requests for static HTML pages and image
files itself. Achieving this goal is fairly simple. Tomcat includes
a configuration file that you can include in your Apache configuration
file, httpd.conf. This can be found at: $TOMCAT_HOME/conf/tomcat-apache.conf.
I simply added an include directive to the end of my Apache configuration
file, /etc/httpd/conf/httpd.conf:
include /usr/local/jakarta-tomcat/conf/tomcat-apache.conf
I'll discuss what happens in this included file, line by line.
The tomcat-apache.conf file appears in Listing 3.
The first line loads a dynamically loadable Apache module, mod_jserv.so,
which provides the capability to map servlet and JSP requests to
the Tomcat servlet container. You'll need to download mod_jserv.so
from the Jakarta/Tomcat Web site. In the same directory from which
you downloaded your Tomcat distribution (i.e., jakarta-tomcat.tar.gz),
there should be a subdirectory called "linux", in which
there is another subdirectory, "i386". In there, you should
find the mod_jserv.so file.
After the LoadModule directive is a series of configurations
for the Jserv module. You should not need to change these. Next,
the ApJServDefaultPort sets the default TCP port that Jserv uses
to communicate with the Tomcat container. The AddType directive
associates the .jsp file extension with the MIME type text/jsp,
and the AddHandler directive tells Apache to use the Tomcat
servlet container for handling JSP pages. Following this are three
similar sections, but I'll cover the first one, which sets
up the Tomcat /examples Web application containing the servlet
and JSP examples we saw earlier. This Apache Alias
directive associates URLs that reference paths beginning with "/examples"
with the physical directory at /usr/local/jakarta-tomcat/webapps/examples.
The <Directory> section is another Apache directive,
telling Apache to allow directory indexing and following of symbolic
links. The ApJServMount directive is a Jserv-specific directive,
instructing Apache that URLs beginning with the path /examples/servlet
are references to Java servlets in the /examples Web application
context. Finally, the <Location> section tells Apache
to deny access to a directory called WEB-INF in the examples
Web application. This is a special directory that Tomcat uses to
store configurations and the Java class files implementing each
servlet. There's no need for this to be directly browsable
by the end user. The remainder of the tomcat-apache.conf
file is very similar to the section for the examples Web
application. Next, I'll show how to add your own Web application
to Tomcat.
When you develop Web sites using a Java application server like
Tomcat, you create Web applications. A Web application is a collection
of HTML files, images, sound files, or other media, along with Java
servlets and Java Server Pages, that are deployed together. When
you are using Apache, you also set up a virtual directory in your
Apache configuration corresponding with the location of your Web
application. With Tomcat, this is pretty simple, and some of the
configuration is automated.
You'll need to make a simple change to Tomcat's main
configuration file, $TOMCAT_HOME/conf/server.xml. This file
is explained in the user's guide, and I won't cover all
of the configuration settings here. Instead, I'll touch on
one key piece, the Context configuration. A Tomcat context specifies
a path where a Web application will exist. This makes it similar
to the Apache Alias directive. Looking at the server.xml
file, you will see things like:
<Context path="/examples"
docBase="webapps/examples"
debug="0"
reloadable="true" >
</Context>
The path attribute specifies the path in the URL that will
refer to this Web application. The docBase attribute specifies
a path in the file system where this application is found. This can
either be an absolute path or a path relative to the Tomcat Context
Manager's home directory (which by default is $TOMCAT_HOME). The debug
attribute specifies the level of debug logging messages, and the reloadable
attribute specifies whether Tomcat will reload a servlet automatically
when it is changed.
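The absolute-versus-relative rule for docBase can be sketched with the standard java.nio.file classes. This only illustrates the rule described above and is not Tomcat's own code; the class name and the base directory passed in are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: mimics how a relative or absolute docBase could be resolved.
final class DocBaseResolver {
    // An absolute docBase is used as-is; a relative one is resolved against baseDir.
    static String resolve(String baseDir, String docBase) {
        Path base = Paths.get(baseDir);
        // Path.resolve returns its argument unchanged when that argument is absolute.
        return base.resolve(docBase).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("/usr/local/jakarta-tomcat", "webapps/examples"));
        System.out.println(resolve("/usr/local/jakarta-tomcat", "/usr/local/webapps/windmill"));
    }
}
```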
To create the sample Web application, add the following Context
to the server.xml file:
<Context path="/windmill"
docBase="/usr/local/webapps/windmill"
debug="0"
reloadable="true" >
</Context>
My sample Web application, /windmill, exists in the absolute
path /usr/local/webapps/windmill. I will put all of my HTML,
images, servlets, and JSPs here. After adding this section to server.xml,
I restarted the Tomcat service. When Tomcat starts, it generates the
tomcat-apache.conf file automatically, in part from what's
found in server.xml. Specifically, Apache and Jserv directives
similar to those for the /examples Web application are created
for my new /windmill application. Looking at this newly generated
tomcat-apache.conf file, I see the following additional lines:
Alias /windmill "/usr/local/webapps/windmill"
<Directory "/usr/local/webapps/windmill">
Options Indexes FollowSymLinks
</Directory>
ApJServMount /windmill/servlet /windmill
<Location "/windmill/WEB-INF/">
AllowOverride None
deny from all
</Location>
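To make the relationship between the Context entry and these generated lines concrete, the sketch below builds the same block of directives from a context path and a docBase. It is a hypothetical illustration of the mapping, not the generator Tomcat actually uses.

```java
// Hypothetical illustration of how a Context maps to Apache/Jserv directives.
final class ApacheConfSketch {
    static String directivesFor(String contextPath, String docBase) {
        String nl = "\n";
        return "Alias " + contextPath + " \"" + docBase + "\"" + nl
             + "<Directory \"" + docBase + "\">" + nl
             + "Options Indexes FollowSymLinks" + nl
             + "</Directory>" + nl
             + "ApJServMount " + contextPath + "/servlet " + contextPath + nl
             + "<Location \"" + contextPath + "/WEB-INF/\">" + nl
             + "AllowOverride None" + nl
             + "deny from all" + nl
             + "</Location>" + nl;
    }

    public static void main(String[] args) {
        System.out.print(directivesFor("/windmill", "/usr/local/webapps/windmill"));
    }
}
```

Only two values vary between Web applications: the URL path and the directory on disk. Everything else in the block is boilerplate repeated for each Context.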
To complete creation of this Web application, I created the document
root directory, /usr/local/webapps/windmill, as specified in
the Alias directive. Under that directory, I created the special Tomcat
directory, WEB-INF, where the configuration file for my application,
web.xml, will be, along with another subdirectory, WEB-INF/classes,
which I created to hold my servlet class files. HTML pages, image
files, and JSPs (.jsp files) can go anywhere under /usr/local/webapps/windmill.
After creating this directory structure, you should restart both Tomcat
and Apache, so that they both recognize and can access the new application.
After setting up the new application, I wanted to compile and
install the example servlet from Listing 1. I typed the code into
a file called myDF.java, and compiled with the command:
% javac myDF.java
When compiled successfully, this creates a file called myDF.class.
I copy this file to /usr/local/webapps/windmill/WEB-INF/classes,
and I can call this servlet with the URL:
http://rocinante/windmill/servlet/myDF
I also typed the code for the JSP in Listing 2 into the file /usr/local/webapps/windmill/myDF.jsp,
which I can then access with the URL:
http://rocinante/windmill/myDF.jsp
That's a very simple example of setting up a new Web application
with Tomcat. There are many other configuration options, especially
with the application's configuration file, WEB-INF/web.xml.
Among the options are the ability to set initialization parameters
for servlets, configure security, set MIME types, and more. You can
also add index.jsp to the DirectoryIndex directive in
your Apache configuration. This directive may be found in /etc/httpd/conf/srm.conf,
depending upon your individual Apache setup. After doing this, my
DirectoryIndex configuration looks like:
DirectoryIndex index.html index.shtml index.cgi index.jsp
Restart Apache after making this change, and you can now use a JSP
called index.jsp as your default page in directories.
Deploying an application server in your environment, whether logically
or on a physically dedicated server, offers challenges beyond whether
it is Open Source or commercial software. You must be sure that
you deploy and configure the software in accordance with all the
policies and procedures you already have in place. This is true
even if you haven't formalized those policies and procedures,
and they exist simply as best practices and your own expertise in
your field.
As with any application that runs some form of network service,
one important step is to make sure that no unnecessary services
are used. By default, Tomcat runs both the servlet container connector
(listening on TCP port 8007) and an HTTP server (on port 8080).
If you install the server adapter for Apache, you don't need
Tomcat's HTTP server. Turn it off by removing or commenting
out the HTTP connection handler from $TOMCAT_HOME/conf/server.xml.
It should look something like this:
<!-- Normal HTTP -->
<Connector className="org.apache.tomcat.service.PoolTcpConnector">
<Parameter name="handler"
value="org.apache.tomcat.service.http.HttpConnectionHandler"/>
<Parameter name="port"
value="8080"/>
</Connector>
This is an XML file, so it uses the same comment delimiters as HTML.
Simply move the trailing comment delimiter (-->) from the
end of the first line above, to a line after the ending </Connector>
tag.
Another critical issue with any server daemon is to make sure
it runs with no more privileges than are necessary. Your Apache
HTTP daemons probably run as an unprivileged user, like "nobody".
This can help prevent users from accessing sensitive files if they
should happen to exploit some bug in the server (such as a buffer
overflow bug).
If you ran Tomcat's startup script as root, both the Tomcat
servlet connector and the HTTP daemon are running with root privileges.
Further, if you leave that HTTP service running and never use it,
you now have an unused service running as root. This could be exploited,
and you may not notice for a long time.
Since you'll probably want to start the Tomcat application
server at boot time, rather than have to start it manually every
time, I have included a simple script you can add to your run control
environment, typically in the /etc/rc.d/init.d directory.
(See Listing 4.) This script will start Tomcat as an unprivileged
user (e.g., user name "nobody"). By putting this script
in /etc/rc.d/init.d and creating a symbolic link such as
"S97tomcat" to it in /etc/rc.d/rc5.d,
you can start Tomcat services at boot time, without their assuming
root privileges. This configuration is for Red Hat 6.1. You should
adjust as necessary to suit your flavor of UNIX.
Because Tomcat needs to know some things about its environment
to run properly, you should also add the following environment variable
settings to the startup.sh and shutdown.sh files called
by the init script in Listing 4.
JAVA_HOME=/usr/java/jdk1.3
TOMCAT_HOME=/usr/local/jakarta-tomcat
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export JAVA_HOME TOMCAT_HOME CLASSPATH
Change the values of JAVA_HOME and TOMCAT_HOME to reflect
your installation and place these lines in startup.sh and shutdown.sh
before the lines that read:
BASEDIR=`dirname $0`
$BASEDIR/tomcat.sh start "$@"
Another task of the systems administrator is monitoring and managing
log files. It should come as no surprise that this raises security
implications as well. Monitoring log files allows you to separate
suspicious activity from normal activity and to distinguish system
or application failure-related activity. Log file rotation is also
critical. A rogue log file that fills up a file system can cripple
your mission-critical applications. Tomcat's log files are located
in $TOMCAT_HOME/logs. Make them a part of your routine, automated
or otherwise, of monitoring and rotation/archival of your systems
logs.
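One small piece of such a routine might be a check for oversized log files. The sketch below lists the regular files in a directory that exceed a size limit. The class name, the limit, and the idea of running it on a schedule are assumptions, and it does not rotate anything itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical log check: reports files in a directory larger than a limit.
final class LogSizeCheck {
    static List<String> oversized(Path logDir, long maxBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(logDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file) && Files.size(file) > maxBytes) {
                    names.add(file.getFileName().toString());
                }
            }
        }
        Collections.sort(names);   // predictable output order
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LogSizeCheck <log-directory> <max-bytes>");
            return;
        }
        Path dir = Paths.get(args[0]);
        long limit = Long.parseLong(args[1]);
        for (String name : oversized(dir, limit)) {
            System.out.println("WARNING: " + name + " exceeds " + limit + " bytes");
        }
    }
}
```

It could be pointed at the Tomcat log directory, for example with /usr/local/jakarta-tomcat/logs and a limit of 10485760 bytes as the two arguments.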
Conclusion
Application servers come in many flavors, and regardless of how
you look at it, the options for Linux environments are growing rapidly.
Most of this growth appears to be in the so-called Enterprise Application
Server market, which Java seems well positioned to dominate.
This article only scratches the surface of application
server technology, as it exposes just the reference implementation
of a Java servlet and JSP container.
Chris Bush is a Senior Consultant with marchFIRST, Inc., specializing
in middle-tier development on an endless variety of e-commerce projects.