Linux as an Application Server -- The Tomcat Way

Chris Bush

With the birth of the World Wide Web, systems administrators suddenly found themselves with another kind of specialized system to manage -- the Web server. In those early days, the Web server was almost always a UNIX-based system. With the attention being paid today to Open Source software like Linux and Apache, and despite the market muscle of Microsoft, Web servers are still predominantly UNIX-based. Previously, Web servers didn't always get the same attention in the corporate data center as the mainframe, the database servers, or even the file and print servers. However, Web servers are here to stay and have added their own complexities to the job of the systems administrator.

Early Web servers typically hosted "brochure-ware" corporate Web sites or intranets whose most sophisticated functionality was job-posting systems or corporate phone directories, implemented using CGI (Common Gateway Interface) programs. As the Web grew, and the technologies driving it matured and expanded, more Web servers became host to increasingly critical applications, like e-storefronts, information portals, customer service applications, B2B (business-to-business) portals, and others. As these applications became more prevalent, it became important to segregate application architecture into more than just a front-office tier and a back-office tier. A third layer in the client-server architecture was necessary -- a middle-office tier to implement business logic separate from the database and the user interface.

CGI seemed to offer a reasonable option for this middle-office tier for Web-based applications, but the presentation code (HTML) was always interspersed throughout the code that implemented the business rules and data access central to the application's functionality. This did not provide the clean separation between application layers desirable in a multi-tiered client-server architecture. Furthermore, Web servers and CGI scripts did not offer a number of other features required for building scalable, robust, and secure applications. There was no consistently available mechanism for implementing session and state management, transaction management, clustering, load balancing, or security and personalization. Applications implemented with just a database, a Web server, and some CGI programs simply could not support the demands of being used by thousands of people. Nor did they offer the application developer a useful way to build cleanly segregated three-tier applications.

To solve these issues, the concept of the application server arose. Oh great, just add another server to the already vast array of machines for which the systems administrator is responsible. Well, you can relax (a little), because often the application server is a logical layer in the three-tier application architecture, existing simply as an additional suite of software and services right on the Web server.

An application server (sometimes called middleware) is a set of software and services that allow for the development and deployment of the application code in three-tier applications. In a typical three-tier architecture, the application tier implements the business rules of the application, segregating them from the database and the presentation code. A business rule is something that governs the processing logic of a program. Examples might be something like "the order total may not exceed the customer's available credit", or "shipping charges are not applied to orders over $200.00". The application server provides the development and run-time environments for the application tier.
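To make the idea concrete, the two rules quoted above could be expressed in the application tier roughly as follows. This is only a minimal sketch: the class name OrderRules, the method names, the flat shipping charge, and the choice of integer cents for money are all assumptions made for illustration, not part of any particular application server.

```java
/** Hypothetical business rules, kept apart from HTML and database code. */
class OrderRules {
    // Assumed values, for illustration only.
    static final long FREE_SHIPPING_THRESHOLD_CENTS = 20_000; // $200.00
    static final long FLAT_SHIPPING_CENTS = 795;              // assumed $7.95 charge

    /** Rule: the order total may not exceed the customer's available credit. */
    static boolean withinCredit(long orderTotalCents, long availableCreditCents) {
        return orderTotalCents <= availableCreditCents;
    }

    /** Rule: shipping charges are not applied to orders over $200.00. */
    static long shippingCents(long orderTotalCents) {
        return orderTotalCents > FREE_SHIPPING_THRESHOLD_CENTS ? 0 : FLAT_SHIPPING_CENTS;
    }

    public static void main(String[] args) {
        System.out.println("within credit: " + withinCredit(15_000, 10_000));
        System.out.println("shipping (cents): " + shippingCents(25_000));
    }
}
```

Because the rules live in one place, neither the page layout nor the database schema has to change when, say, the free-shipping threshold does.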

Ultimately, however, it's not so simple. An application server means different things to different people. I will sort these meanings, and the technologies they introduce, into two categories -- the Web application server and the enterprise application server.

Web application servers offer an alternative to CGI scripting for building dynamic, database-driven Web applications -- with a few enhancements. They offer the ability to code application logic, interface with databases, manage user sessions and application state, and typically have functionality for implementing some basic security and membership. The Web application server is usually tightly integrated with the HTTP server, allowing scripts or other application code to run as separate threads in the same process as the HTTP daemon. This alleviates many of the scalability problems inherent to CGI scripting, which required a separate process to be spawned for every request to the script.

Another feature common to Web application server platforms is the ability to embed blocks of code directly in specially named HTML files. When the HTTP server gets a request for a file with a specific extension, the file is first run through an interpreter that executes those script blocks before a response is sent back to the browser. These scripting environments provide a programming model, or API, through which the script can interact directly with the browser using the HTTP request/response model. Examples of this kind of Web application server are the Open Source PHP, Microsoft's ASP, Allaire's ColdFusion, and Java Server Pages (JSP). PHP, ASP, and JSP offer the ability to embed script blocks into your HTML files.

This type of programming model can be extremely beneficial during application development, allowing designers to build the look and feel, graphical layout, and design, all largely independent from the coding of the application logic to be implemented. The coding language in these blocks varies with each vendor's offering. With ASP, it can be VBScript, JScript, or even Perlscript. PHP uses its own language, while JSP scripts are written in Java. A technology intimately related to JSP, Java servlets, will be discussed later when I focus on the Tomcat Web application server. Allaire's ColdFusion is a little different in that the program code resembles an extension to HTML -- basically a whole suite of "markup" tags that enable much of the same programming capabilities as the others. All of these Web application server environments are currently available for the Linux platform. In the case of ASP, though, you must turn to non-Microsoft solutions from ChiliSoft, Halcyon, or the Open Source Apache module, Apache::ASP. See the sidebar "Web Application Servers" for details.

The second category of application server is commonly referred to as the Enterprise Application Server. Servers in this category typically extend the Web application server model by providing more robust security, transaction management, message queuing, clustering with automated failover, load balancing, and most importantly, an architecture for deploying distributed application components that implement business logic. The ability to distribute these components to dedicated servers is paramount to the scalability of applications, as dedicated servers can be devoted to application processing, allowing the Web server to do what it does best -- respond to HTTP requests from the browser. I won't get into any further detail on enterprise application servers in this article, except to say that this is typically the realm of expensive, commercial solutions, which in the past eliminated Linux as a potential platform. However, although still not Open Source, many of the application servers are becoming available for Linux. This is due in large part to their foundations in the Java environment. The platform-independent nature of Java lends itself to ease of porting these servers to Linux. The sidebar "Web Application Servers" also lists a handful of Enterprise Application Servers available for Linux.

In this article, I'll cover a specific Web application server environment -- Tomcat. Tomcat is part of the Jakarta project, which is part of the Apache Software Foundation. Tomcat is the reference implementation of the Java Servlet 2.2 specification and JSP 1.1 specification. This implementation was recently turned over to the Apache Software Foundation by Sun Microsystems -- the originators of the Java language. The neat thing about Tomcat (and servlets and JSP) is that the platform-independent nature of Java means your servlets and JSP scripts are portable to any servlet/JSP implementation of the appropriate specification. In fact, the Tomcat implementation, itself written in Java, is platform-independent. I downloaded and installed Tomcat to Red Hat Linux 6.1 and Windows 2000 from the exact same distribution of the software. All of the servlets and JSP scripts I've written to date work equally well on both. In the case of the servlets, which are compiled Java class files, I deploy the compiled class file to each platform, rather than recompiling from the Java source. Powerful stuff indeed.

So What Is Tomcat, Exactly?

Tomcat is the official reference implementation of both the Java servlet 2.2 specification as well as the JSP 1.1 specification. Tomcat is freely available from the Apache Software Foundation's Jakarta project at: http://jakarta.apache.org.

Basically, Tomcat implements a run-time environment, called a container, in which Java servlets and JSPs can execute and interact with the browser via HTTP. Tomcat can be integrated with Web servers like Apache, and even IIS, allowing the Web server to handle requests for static Web pages and images, and pass requests for servlets and JSPs along to Tomcat for execution. A servlet (or JSP) may be requested as the result of submitting an HTML form to the Web server. The data submitted from the form may be used by the servlet to query or update a database, generate an email, or register a user on the Web site, to name just a few possibilities. A hyperlink on a page may also be used to call a servlet or JSP for similar purposes. Tomcat also implements a simple HTTP server of its own, which comes in handy for development and testing separately from your Web server.

Before we get into installing and configuring Tomcat, I'd like to talk a little about what servlets and JSP scripts are, and why you might use one or the other.

Servlets

Tomcat implements a runtime environment, called a container, for Java servlet execution. A servlet is a Java program that bears some similarity to a CGI program or script. It interacts with requests from browsers, and with databases or other external applications, then delivers results to the browser by formulating the HTTP response. Like a CGI program, a servlet can accept data from the user, such as that sent from an HTML form or in the query string data portion of a URL. Other information that is part of a typical HTTP request, particularly in the HTTP header, can be retrieved as well. This might include such things as cookies, browser information, host name or IP address of the connecting computer, and more. The servlet implements a special kind of Java class that can work with all of this information, process it, access databases and other applications, and deliver an HTTP response back to the browser. The servlet can also manipulate the header parameters of the initiated response, to set cookie values, control caching, specify the MIME type of the returned document, and more.

Servlets offer some definite advantages over CGI scripts for the Web developer. For example, servlets, being Java classes, are platform-independent. For all its portability across platforms, Perl cannot boast the cross-platform portability of Java. As for CGI programs written in C, the effort involved to achieve source-level portability can be significant, and you can forget about cross-platform portability of compiled object code. CGI's primary failing, though, is an inability to scale on high-traffic sites. Every time a CGI program is accessed, whether by the same user or a new user, a new process is spawned by the Web server, and the HTTP request information is sent to that process. If that program performs database accesses, each process instance must establish its own database connection -- an expensive operation. While the mod_perl Apache module, and FastCGI, have done much to improve this situation, Java servlets offer a much better alternative.

When a Java servlet runs, a single instance of that servlet class, running within the Java Virtual Machine (JVM), is created to handle all requests. The overhead of creating the object and starting a thread of execution within the JVM doesn't even compare to the overhead of CGI processing. With a servlet, each request made to it simply results in a new thread of execution within a single running instance of that servlet class, resulting in very little additional overhead. If part of the servlet's processing needs to be thread-safe (such as with critical database operations), Java provides built-in support for thread synchronization. The servlet container also implements a complete object model, or API, for interacting with HTTP requests and responses, providing session state, cookie management, pooling of database connections, URL manipulation, and more.
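The single-instance, many-threads model can be illustrated without any servlet API at all. The sketch below is a plain-Java stand-in, not a real servlet: SharedHandler, handleRequest, and simulate are hypothetical names, and the thread pool merely plays the role of the container's request threads. It shows one object serving every request and a synchronized method protecting the state those requests share.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Stand-in for a servlet: one shared instance, many request threads. */
class SharedHandler {
    private int hits = 0; // state shared by every request thread

    /** synchronized makes the read-modify-write on hits thread-safe. */
    synchronized void handleRequest() {
        hits++;
    }

    synchronized int getHits() {
        return hits;
    }

    /** Sends simulated requests to ONE handler instance from a pool of threads. */
    static int simulate(int threads, int requests) {
        SharedHandler handler = new SharedHandler(); // created once, reused for all requests
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.execute(handler::handleRequest);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handler.getHits();
    }

    public static void main(String[] args) {
        System.out.println("requests handled: " + simulate(8, 10_000));
    }
}
```

Without the synchronized keyword, concurrent increments could be lost and the final count could come up short, which is the kind of problem the built-in synchronization support is there to prevent.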

Java also offers some security advantages, because it inherently protects against common programming errors that can lead to security problems. Java automatically provides for array bounds checking, and doesn't allow arithmetic operations on pointers (references). This helps prevent the type of programming errors that lead to buffer-overrun exploits. Java servlets are also free from the concerns of CGI scripts implemented in shell scripting languages, which suffer from vulnerabilities due to shell meta-characters being passed as part of the HTTP request.
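A small illustration of the bounds-checking point: in the sketch below (BoundsDemo and copyInto are hypothetical names), input that is too long for a fixed-size buffer produces an exception the program can handle, rather than silently writing past the end of the array as unchecked C code might.

```java
/** Shows that the JVM refuses to write past the end of an array. */
class BoundsDemo {
    /** Copies input into buffer; returns false if the input does not fit. */
    static boolean copyInto(byte[] buffer, byte[] input) {
        try {
            for (int i = 0; i < input.length; i++) {
                buffer[i] = input[i]; // the JVM checks i against buffer.length on every write
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The overrun is reported; no memory beyond the buffer was touched.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fits:      " + copyInto(new byte[8], new byte[4]));
        System.out.println("oversized: " + copyInto(new byte[4], new byte[8]));
    }
}
```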

Listing 1 shows an example of a Java servlet. When this code is compiled into a Java class file, it can be deployed to a servlet container and run. I'll show how to do that later. Over the years, I have found myself repeatedly creating Web-based interfaces to common systems administration tasks, such as monitoring disk usage, managing services like lpd and network license managers, and even DNS. Typically, I would have static HTML pages with a bunch of links to CGI scripts that performed these tasks, and formatted the results using HTML. In Listing 1, I am providing an example of performing one such task using a servlet instead. This particular servlet runs the Linux df command and formats the output with HTML for presentation to the browser. Part of that reformatting will cause file systems that have exceeded a defined threshold (percentage used) to be highlighted in red; others will be highlighted in green. (All listings for this article are available from the Sys Admin Web site: http://www.sysadminmag.com.)
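For readers without the listing at hand, the formatting step described above might look something like the following. This is not the code from Listing 1; it is a plain-Java sketch with no servlet API, and the class name DfFormatter, the column layout assumed for df output, and the bgcolor markup are all assumptions. In a servlet, the input text would come from running df and the result would be written to the HTTP response.

```java
/** Sketch: turn df-style output into HTML rows, colored by percentage used. */
class DfFormatter {
    /**
     * Converts one line of df output into a table row.
     * Returns null for the header line or any line that does not parse.
     * Assumed columns: Filesystem 1k-blocks Used Available Use% Mounted-on
     */
    static String toRow(String line, int thresholdPercent) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 6 || !f[4].endsWith("%")) {
            return null;
        }
        int used;
        try {
            used = Integer.parseInt(f[4].substring(0, f[4].length() - 1));
        } catch (NumberFormatException e) {
            return null; // e.g. the header's "Use%" column
        }
        String color = used > thresholdPercent ? "red" : "green";
        return "<tr bgcolor=\"" + color + "\"><td>" + f[5] + "</td><td>" + f[4] + "</td></tr>";
    }

    /** Builds a complete HTML table from multi-line df output. */
    static String toTable(String dfOutput, int thresholdPercent) {
        StringBuilder html = new StringBuilder("<table>\n");
        for (String line : dfOutput.split("\n")) {
            String row = toRow(line, thresholdPercent);
            if (row != null) {
                html.append(row).append('\n');
            }
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) {
        // Sample text standing in for real df output.
        String sample =
            "Filesystem 1k-blocks Used Available Use% Mounted on\n"
            + "/dev/hda1 1011928 961332 50596 95% /\n"
            + "/dev/hda5 2016016 403204 1612812 20% /home\n";
        System.out.println(toTable(sample, 90));
    }
}
```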

Java Server Pages

Java Server Pages, or JSP "scripts", allow you to place small blocks of Java code right into your formerly static HTML pages. This makes JSP very similar to environments like PHP or ASP -- with the primary advantages being the portability of Java and the robust API described in the JSP specification. In fact, JSP has all the capabilities of Java servlets, but can be more convenient to write for pages with small amounts of program code and large amounts of static HTML. JSP scripts are actually converted automatically into servlets by Tomcat when they are first referenced, and compiled and instantiated in the servlet container.

Listing 2 shows an example of a JSP "script", or page, that duplicates the functionality of the servlet in Listing 1. Even if you know nothing about Java, you can see that the "meat" of the code is unchanged.

Installing and Configuring Tomcat

Before you install Tomcat, you will need an appropriate Java run-time environment (JRE). A full development environment is necessary if you'll be writing and compiling your own Java servlets. The Java Development Kit, available for free from Sun Microsystems (http://java.sun.com), is sufficient for both needs. As of this writing, there is a JDK version 1.3, which is in a Beta Refresh release for Linux. I have used this with Tomcat with no problems. I won't discuss the installation of the JDK here, but the Linux version is available in a Red Hat "RPM" package, so installation is straightforward.

The Tomcat software can be found at: http://jakarta.apache.org. The current release version is 3.1, with a 3.2 in beta. Version 3.2 fixes a number of problems from 3.1, and I have been using it for my development. Because Tomcat itself is written in Java, it is platform-independent, so there are no platform-specific distributions. The available Tomcat version will likely have changed by the time you read this, so just go to:

http://jakarta.apache.org/downloads/index.html

There you can download an archive of the current Tomcat release. Select the link for downloading binaries. After downloading the archive, change your directory to the parent directory you'd like to contain Tomcat, and extract the archived distribution. For example, with the gzipped tar file:

% cd /usr/local
% gunzip -c jakarta-tomcat.tar.gz | tar -xvf -

Then, you simply need to set a pair of environment variables, start the Tomcat services, and you're ready to start serving up Java servlets and Java Server Pages. You will need to set an environment variable TOMCAT_HOME to the root of the Tomcat installation:

% setenv TOMCAT_HOME /usr/local/jakarta-tomcat

and JAVA_HOME to the root of your JDK:

% setenv JAVA_HOME /usr/java/jdk1.3

To start Tomcat:

% cd /usr/local/jakarta-tomcat
% ./bin/startup.sh

Tomcat's default configuration starts up a standalone HTTP server listening on TCP port 8080, along with the listener for the servlet container on port 8007. To test your installation, start up your browser and enter the URL:

http://localhost:8080

You should see the default Tomcat home page, which has links to some servlet and JSP examples, as well as some useful documentation. Test some of these to make sure everything is working well. The vast array of configuration details for Tomcat is beyond the introductory scope of this article. Please refer to:

http://jakarta.apache.org/tomcat/jakarta-tomcat/src/doc/uguide/tomcat-ug.html

for more complete information. This guide is also provided in the Tomcat distribution under $TOMCAT_HOME/doc/uguide.
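If the home page does not appear, one quick check is whether anything is listening on the two default ports mentioned earlier, 8080 and 8007. The following is a minimal sketch using only the standard java.net classes; the class name PortCheck and the two-second timeout are assumptions, and a successful connection shows only that some process accepted it, not that the process is Tomcat.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch: test whether a TCP port accepts connections. */
class PortCheck {
    /** Returns true if something accepts a TCP connection on host:port within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        // Default ports from the article: 8080 (HTTP) and 8007 (servlet container listener).
        for (int port : new int[] {8080, 8007}) {
            String state = isListening("localhost", port, 2000) ? "listening" : "not reachable";
            System.out.println("port " + port + ": " + state);
        }
    }
}
```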

I will cover some of the basics of Tomcat configuration, including configuring a new Tomcat Web application. I will also cover integrating Tomcat with Apache, so that Apache can be used to serve up your static HTML, while the Tomcat servlet container handles servlet and JSP requests and responses. The Tomcat HTTP server is there to facilitate development and testing of your installation, and is not suitable as a large-scale standalone Web server. Apache is!

I'll start by setting up Apache and Tomcat to work together. Java servlets and JSPs run in what is known as a container, which is implemented by Tomcat. The goal is to configure Apache to pass requests for servlets or JSP scripts along to the Tomcat servlet container, and serve up requests for static HTML pages and image files itself. Achieving this goal is fairly simple. Tomcat includes a configuration file that you can include in your Apache configuration file, httpd.conf. This can be found at: $TOMCAT_HOME/conf/tomcat-apache.conf.

I simply added an include directive to the end of my Apache configuration file, /etc/httpd/conf/httpd.conf:

include /usr/local/jakarta-tomcat/conf/tomcat-apache.conf

I'll discuss what happens in this included file, line by line. The tomcat-apache.conf file appears in Listing 3.

The first line loads a dynamically loadable Apache module, mod_jserv.so, which provides the capability to map servlet and JSP requests to the Tomcat servlet container. You'll need to download mod_jserv.so from the Jakarta/Tomcat Web site. In the same directory from which you downloaded your Tomcat distribution (i.e., jakarta-tomcat.tar.gz), there should be a subdirectory called "linux", in which there is another subdirectory, "i386". In there, you should find the mod_jserv.so file.

After the LoadModule directive is a series of configurations for the Jserv module. You should not need to change these. Next, the ApJServDefaultPort sets the default TCP port that Jserv uses to communicate with the Tomcat container. The AddType directive associates the .jsp file extension with the MIME type text/jsp, and the AddHandler directive tells Apache to use the Tomcat servlet container for handling JSP pages. Following this are three similar sections, but I'll cover the first one, which sets up the Tomcat /examples Web application containing the servlet and JSP examples we saw earlier. This Apache Alias directive associates URLs that reference paths beginning with "/examples" with the physical directory at /usr/local/jakarta-tomcat/webapps/examples.

The <Directory> section is another Apache directive, telling Apache to allow directory indexing and following of symbolic links. The ApJServMount directive is a Jserv-specific directive, instructing Apache that URLs beginning with the path /examples/servlet are references to Java servlets in the /examples Web application context. Finally, the <Location> section tells Apache to deny access to a directory called WEB-INF in the examples Web application. This is a special directory that Tomcat uses to store configurations and the Java class files implementing each servlet. There's no need for this to be directly browsable by the end user. The remainder of the tomcat-apache.conf file is very similar to the section for the examples Web application. Next, I'll show how to add your own Web application to Tomcat.
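Taken together, the directives described above amount to a small routing decision for each request. The sketch below models that decision in plain Java purely as a reading aid. It is a deliberate simplification and not how Apache actually matches requests; RequestRouter, Target, and the sample paths are hypothetical names.

```java
/** Simplified model of how requests under one Web application are divided up. */
class RequestRouter {
    enum Target { DENIED, TOMCAT, APACHE_STATIC }

    /** Classifies a URL path for a Web application mounted at contextPath. */
    static Target route(String contextPath, String urlPath) {
        if (urlPath.startsWith(contextPath + "/WEB-INF/")) {
            return Target.DENIED;        // <Location> ... deny from all
        }
        if (urlPath.startsWith(contextPath + "/servlet/")) {
            return Target.TOMCAT;        // ApJServMount: servlets go to the container
        }
        if (urlPath.endsWith(".jsp")) {
            return Target.TOMCAT;        // AddHandler: JSP pages go to the container
        }
        return Target.APACHE_STATIC;     // Alias: everything else is served from disk
    }

    public static void main(String[] args) {
        String[] samples = {
            "/examples/WEB-INF/web.xml",
            "/examples/servlet/SomeServlet",
            "/examples/jsp/page.jsp",
            "/examples/images/logo.gif"
        };
        for (String path : samples) {
            System.out.println(path + " -> " + route("/examples", path));
        }
    }
}
```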

When you develop Web sites using a Java application server like Tomcat, you create Web applications. A Web application is a collection of HTML files, images, sound files, or other media, along with Java servlets and Java Server Pages, that are deployed together. When you are using Apache, you also set up a virtual directory in your Apache configuration corresponding with the location of your Web application. With Tomcat, this is pretty simple, and some of the configuration is automated.

You'll need to make a simple change to Tomcat's main configuration file, $TOMCAT_HOME/conf/server.xml. This file is explained in the user's guide, and I won't cover all of the configuration settings here. Instead, I'll touch on one key piece, the Context configuration. A Tomcat context specifies a path where a Web application will exist. This makes it similar to the Apache Alias directive. Looking at the server.xml file, you will see things like:

<Context path="/examples"
               docBase="webapps/examples"
               debug="0"
               reloadable="true" >
</Context>

The path attribute specifies the path in the URL that will refer to this Web application. The docBase attribute specifies a path in the file system where this application is found. This can either be an absolute path or a path relative to the home directory of the Tomcat Context Manager (which by default is $TOMCAT_HOME, so the webapps/examples value above resolves to $TOMCAT_HOME/webapps/examples). The debug attribute specifies the level of debug logging messages, and the reloadable attribute specifies whether Tomcat will reload a servlet automatically when it is changed.

To create the sample Web application, add the following Context to the server.xml file:

<Context path="/windmill"
               docBase="/usr/local/webapps/windmill"
               debug="0"
               reloadable="true" >
</Context>

My sample Web application, /windmill, exists in the absolute path /usr/local/webapps/windmill. I will put all of my HTML, images, servlets, and JSPs here. After adding this section to server.xml, I restarted the Tomcat service. When Tomcat starts, it generates the tomcat-apache.conf file automatically, in part from what's found in server.xml. Specifically, Apache and Jserv directives similar to those for the /examples Web application are created for my new /windmill application. Looking at this newly generated tomcat-apache.conf file, I see the following additional lines:

Alias /windmill "/usr/local/webapps/windmill"
<Directory "/usr/local/webapps/windmill">
    Options Indexes FollowSymLinks
</Directory>
ApJServMount /windmill/servlet /windmill
<Location "/windmill/WEB-INF/">
    AllowOverride None
    deny from all
</Location>

To complete creation of this Web application, I created the document root directory, /usr/local/webapps/windmill, as specified in the Alias directive. Under that directory, I created the special Tomcat directory, WEB-INF, where the configuration file for my application, web.xml, will be, along with another subdirectory, WEB-INF/classes, which I created to hold my servlet class files. HTML pages, image files, and JSPs (.jsp files) can go anywhere under /usr/local/webapps/windmill. After creating this directory structure, you should restart both Tomcat and Apache, so that they both recognize and can access the new application.
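The directory layout just described is simple enough to create by hand, but for completeness here is a minimal Java sketch that builds it. WebAppSkeleton is a hypothetical name, and the sketch creates only the directories; the web.xml file itself still has to be written separately.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: create the document root, WEB-INF, and WEB-INF/classes for a Web application. */
class WebAppSkeleton {
    /** Creates docBase/WEB-INF/classes, including any missing parents, and returns it. */
    static Path create(Path docBase) {
        try {
            return Files.createDirectories(docBase.resolve("WEB-INF").resolve("classes"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the article's example location; pass another path as the first argument.
        Path docBase = Paths.get(args.length > 0 ? args[0] : "/usr/local/webapps/windmill");
        System.out.println("created " + create(docBase));
    }
}
```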

After setting up the new application, I wanted to compile and install the example servlet from Listing 1. I typed the code into a file called myDF.java, and compiled with the command:

% javac myDF.java

When compiled successfully, this creates a file called myDF.class. I copy this file to /usr/local/webapps/windmill/WEB-INF/classes, and I can call this servlet with the URL:

http://rocinante/windmill/servlet/myDF

I also typed the code for the JSP in Listing 2 into the file /usr/local/webapps/windmill/myDF.jsp, which I can then access with the URL:

http://rocinante/windmill/myDF.jsp

That's a very simple example of setting up a new Web application with Tomcat. There are many other configuration options, especially with the application's configuration file, WEB-INF/web.xml. Among the options are the ability to set initialization parameters for servlets, configure security, set MIME types, and more. You can also add index.jsp to the DirectoryIndex directive in your Apache configuration. This directive may be found in /etc/httpd/conf/srm.conf, depending upon your individual Apache setup. After doing this, my DirectoryIndex configuration looks like:

DirectoryIndex index.html index.shtml index.cgi index.jsp

Restart Apache after making this change, and you can now use a JSP called index.jsp as your default page in directories.

Deploying an application server in your environment, whether logically or on a physically dedicated server, offers challenges beyond whether it is Open Source or commercial software. You must be sure that you deploy and configure the software in accordance with all the policies and procedures you already have in place. This is true even if you haven't formalized those policies and procedures, and they exist simply as best practices and your own expertise in your field.

As with any application that runs some form of network service, one important step is to make sure that no unnecessary services are used. By default, Tomcat runs both the servlet container connector (listening on TCP port 8007) and an HTTP server (on port 8080). If you install the server adapter for Apache, you don't need Tomcat's HTTP server. Turn it off by removing or commenting out the HTTP connection handler from $TOMCAT_HOME/conf/server.xml. It should look something like this:

<!-- Normal HTTP -->
<Connector className="org.apache.tomcat.service.PoolTcpConnector">
    <Parameter name="handler"
        value="org.apache.tomcat.service.http.HttpConnectionHandler"/>
    <Parameter name="port"
        value="8080"/>
</Connector>

This is an XML file, so it uses the same comment delimiters as HTML. Simply move the trailing comment delimiter (-->) from the end of the first line above, to a line after the ending </Connector> tag.

Another critical issue with any server daemon is to make sure it runs with no more privileges than are necessary. Your Apache HTTP daemons probably run as an unprivileged user, like "nobody". This can help prevent users from accessing sensitive files if they should happen to exploit some bug in the server (such as a buffer overflow bug).

If you ran Tomcat's startup script as root, both the Tomcat servlet connector and the HTTP daemon are running with root privileges. Further, if you leave that HTTP service running and never use it, you now have an unused service running as root. This could be exploited, and you may not notice for a long time.

Since you'll probably want to start the Tomcat application server at boot time, rather than have to start it manually every time, I have included a simple script you can add to your run control environment, typically in the /etc/rc.d/init.d directory. (See Listing 4.) This script will start Tomcat as an unprivileged user (e.g., user name "nobody"). By putting this script in /etc/rc.d/init.d and creating a symbolic link such as "S97tomcat" to it in /etc/rc.d/rc5.d, you can start Tomcat services at boot time, without their assuming root privileges. This configuration is for Red Hat 6.1. You should adjust as necessary to suit your flavor of UNIX.

Because Tomcat needs to know some things about its environment to run properly, you should also add the following environment variable settings to the startup.sh and shutdown.sh files called by the init script in Listing 4.

JAVA_HOME=/usr/java/jdk1.3
TOMCAT_HOME=/usr/local/jakarta-tomcat
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export JAVA_HOME TOMCAT_HOME CLASSPATH

Change the values of JAVA_HOME and TOMCAT_HOME to reflect your installation and place these lines in startup.sh and shutdown.sh before the lines that read:

BASEDIR=`dirname $0`
$BASEDIR/tomcat.sh start "$@"

Another task of the systems administrator is monitoring and managing log files. It should come as no surprise that this raises security implications as well. Monitoring log files allows you to separate suspicious activity from normal activity and to distinguish system or application failure-related activity. Log file rotation is also critical. A rogue log file that fills up a file system can cripple your mission-critical applications. Tomcat's log files are located in $TOMCAT_HOME/logs. Make them a part of your routine, automated or otherwise, of monitoring and rotation/archival of your systems logs.
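As one possible starting point for that routine, the sketch below lists the files in a log directory that have grown past a size limit. It is only an illustration: LogSizeCheck is a hypothetical name, the 10 MB limit is an arbitrary assumption, and actual rotation or archival would still be handled by whatever tools you already use.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: report log files larger than a given size. */
class LogSizeCheck {
    /** Returns the sorted names of regular files in logDir larger than maxBytes. */
    static List<String> oversized(File logDir, long maxBytes) {
        List<String> result = new ArrayList<>();
        File[] files = logDir.listFiles(); // null if logDir is missing or unreadable
        if (files == null) {
            return result;
        }
        for (File f : files) {
            if (f.isFile() && f.length() > maxBytes) {
                result.add(f.getName());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Uses TOMCAT_HOME from the environment, falling back to the article's install path.
        String home = System.getenv("TOMCAT_HOME");
        if (home == null) {
            home = "/usr/local/jakarta-tomcat";
        }
        long limit = 10L * 1024 * 1024; // assumed limit of 10 MB
        for (String name : oversized(new File(home, "logs"), limit)) {
            System.out.println("over limit: " + name);
        }
    }
}
```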

Conclusion

Application servers come in many flavors, and regardless of how you look at it, the options for Linux environments are growing rapidly. Most of this growth appears to be in the so-called Enterprise Application Server market, which Java seems well positioned to dominate. This article only scratches the surface of application server technology, as it covers just the reference implementation of a Java servlet and JSP container.

Chris Bush is a Senior Consultant with marchFIRST, Inc., specializing in middle-tier development on an endless variety of e-commerce projects.