Data-Sharing Architectures in the Linux Data Center
Matthew O'Keefe
Few would dispute that Linux plays a crucial role at the center
of the Internet computing universe. In the past few years, Linux
has also been moving up the value chain, steadily gaining the respect
of corporations worldwide. Now, Linux is in the data center, running
high-performance clusters that offer mainframe-like performance
and reliability. This article examines the trends driving the
adoption of Linux clusters in the data center and explores why
clustering has been gaining ground. These trends include:
- Increased demand for higher performance and more processing
power
- Undeniable need for high availability in the data center
- Emergence of less expensive alternatives to supercomputers
(e.g., blade servers)
Many systems administrators are faced with the responsibility
of choosing, and ultimately implementing, a data-sharing architecture
on Linux, so this article also serves as a guide for those
considering deploying Linux in the data center. Users often
choose a data-sharing architecture for their environments without
full awareness of alternatives that could make their application
environments faster, less costly, and easier to manage. This article
reviews the advantages of data-sharing Linux clusters, including:
- Ease and cost of administration
- Interoperability with other systems/applications
- Performance and scale
- Ease and speed of data sharing
- Suitability for incremental computing
In the Unix world, the term cluster has traditionally referred to
a group of machines that (1) share various resources, including
network hardware, (2) are managed as a single system, and (3) provide
mechanisms to transfer applications from a failed machine to a machine
that is still up and running. This third mechanism, termed "failover",
is a popular way to increase availability. It relies
on software watchdogs on one server to detect failure of another
server, typically through the use of timeouts. Application agents
provide information necessary for application restart on another
server. Multi-ported shared storage devices are used so that all
machines have access to the same files.
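To make the failover mechanism concrete, the following is a minimal
sketch, in Python, of a heartbeat watchdog. The peer address, port,
timeout values, and the start_failover callback are hypothetical;
a production cluster package would also fence the failed node and
transfer ownership of the shared storage before restarting the
application.

import socket
import time

PEER = ("peer-node.example.com", 9000)   # hypothetical address of the other server
HEARTBEAT_INTERVAL = 2.0                 # seconds between heartbeat probes
FAILOVER_TIMEOUT = 10.0                  # declare the peer dead after this much silence

def peer_is_alive():
    """Send a small UDP probe and wait briefly for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(1.0)
        try:
            s.sendto(b"ping", PEER)
            s.recvfrom(16)
            return True
        except socket.timeout:
            return False

def watchdog(start_failover):
    """Poll the peer and trigger failover once the timeout has elapsed."""
    last_seen = time.monotonic()
    while True:
        if peer_is_alive():
            last_seen = time.monotonic()
        elif time.monotonic() - last_seen > FAILOVER_TIMEOUT:
            start_failover()   # e.g., take over shared storage and restart the application
            return
        time.sleep(HEARTBEAT_INTERVAL)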
In traditional clusters using failover, physically shared storage
is not accessed simultaneously by two or more machines. A complicated
set of steps is necessary to transfer ownership of shared storage
devices from one machine to another. Standard file systems and volume
managers are not structured to allow simultaneous access to file
systems mapped onto physically shared volumes. However, this situation
is changing; cluster file systems and cluster volume managers have
been developed that allow simultaneous access to shared storage
devices. Cluster file systems that currently offer this functionality
include IBM's GPFS, Sistina GFS, and Hewlett-Packard's Tru64 cluster
file system.
Though the difference between traditional file systems and cluster
file systems is deceptively simple, the impact of data-sharing across
tightly coupled servers in a cluster is profound and provides far-reaching
benefits for IT infrastructures designed to exploit them. These
benefits include:
- Performance equal to local file systems for files shared across
the cluster
- More transparent application failover and the ability to load-balance
applications that share files across the cluster
- Incremental scalability instead of forklift upgrades --
servers and storage can be incrementally added to the cluster
to provide better performance and capacity
- Simplified management, obtained by removing the need for
a separate file system per server in the cluster
- Increased storage efficiency, resulting from the ability to share
storage blocks and re-allocate them as needed among a small
number of shared file systems in the cluster
The growth of storage infrastructure deployments is also supported
by findings from industry sources in the financial sector. In a
September 2002 Piper Jaffray report, "State of the Network Storage
Industry," analyst Ashok Kumar identifies the types of storage
software that will make a difference in building reliable storage:
integrated management software (file systems, virtualization layers),
integrated planning/management software, data redundancy management,
and file system software management.
Furthermore, cluster file system software is strategically aligned
with three converging technologies in the marketplace: Linux,
storage area networks, and high-density Intel servers. The computer
industry is currently in the midst of a transition from Unix server
technologies based on proprietary RISC hardware to Linux systems
based on commodity Intel PC server hardware. A January 2003 report
from Goldman Sachs further supports the growth of these infrastructure
deployments, stating, "Linux's similarity to the Unix systems
currently in use in the data center, both in terms of its APIs and
management skills and tools needed to administer it, provides for
a relatively pain-free migration path from current Unix/RISC platforms
to Linux/Intel."
IT leaders are discovering that replacing expensive, high-processor-count
SMP (symmetric multiprocessor) architectures with data-sharing Linux
clusters provides dramatically improved price-to-performance ratios,
simplified management, and incremental scalability from small server
clusters with fewer than four machines to very large clusters with
hundreds of machines.
The fundamental idea behind data-sharing architectures is the ability
to add more processing capacity, I/O bandwidth, or storage incrementally
and independently, as needed, without moving or changing the deployed
applications or replacing the original hardware with more powerful
components in a forklift upgrade (see Figure 1 vs. Figure 2).
Applications
System designers are exploiting data-sharing Linux clusters for
many applications. For example, applications that consist of many
independent compute jobs reading files written by previous jobs
and writing new files for later jobs to process are good candidates
for data-sharing clusters. Such clusters are already in production
in application segments such as seismic analysis, bioinformatics/genomics
workloads, finite element analysis, and hedge fund computations.
Clusters running these types of applications
provide large compute, I/O, and storage capacity that can be incrementally
scaled. Additionally, good performance and significantly simplified
management (which manifests itself in greatly simplified data flow
through the cluster) are achieved with this approach.
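As an illustration, the following is a minimal Python sketch of one
stage in such a pipeline. It assumes a hypothetical cluster file
system mounted at /mnt/cluster on every node; each node processes a
static share of the input files and leaves its results in place for
the next stage to pick up.

from pathlib import Path

# Hypothetical mount point of the cluster file system, visible on every node.
SHARED = Path("/mnt/cluster")

def run_stage(stage_in, stage_out, node_id, node_count):
    """Process this node's share of the input files and write results
    to the shared file system for the next stage."""
    in_dir = SHARED / stage_in
    out_dir = SHARED / stage_out
    out_dir.mkdir(parents=True, exist_ok=True)

    # Simple static partitioning: node k handles every node_count-th file.
    for f in sorted(in_dir.glob("*.dat"))[node_id::node_count]:
        data = f.read_bytes()
        result = data.upper()                  # placeholder for the real computation
        (out_dir / f.name).write_bytes(result)

# On node 2 of an 8-node cluster:
# run_stage("seismic-raw", "seismic-filtered", node_id=2, node_count=8)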
Another common application involves parallel, cluster-aware
databases such as Oracle 9i RAC. Data-sharing clusters significantly
simplify management for parallel databases by allowing database
logs and tables to be mapped to files instead of hard-to-manage
raw devices. Configuration files can also be shared instead of copied
across all the nodes in the cluster.
Parallel applications that run in a tightly coupled way across
the cluster can also benefit from the performance and scalability
found in data-sharing clusters. These clusters can run I/O in parallel
across all servers, achieving close to the maximum bandwidth
and I/O operations per second offered by the raw storage hardware.
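For example, a parallel job can have each server write a disjoint
byte range of a single shared file, so no two servers touch the same
blocks and the aggregate bandwidth scales with the number of nodes.
A minimal sketch, with a hypothetical file path and region size:

import os

REGION = 64 * 1024 * 1024   # hypothetical 64-MB region owned by each node

def write_region(path, node_id, payload):
    """Write this node's byte range of the shared file; other nodes
    write their own, non-overlapping ranges in parallel."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, payload, node_id * REGION)
        os.fsync(fd)
    finally:
        os.close(fd)

# Node 3 filling its slice of a result file on the cluster file system:
# write_region("/mnt/cluster/results/run42.bin", 3, b"\0" * REGION)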
Other key applications that benefit from the deployment of Linux
clusters in the data center include enterprise-level Web-based application
frameworks, such as BEA WebLogic or IBM WebSphere.
NFS is a common file-sharing protocol in the Unix environment,
but NFS servers often suffer from performance bottlenecks due to
the limitations on the number of network ports and storage capacity
in a single NFS server. Because NFS is a stateless protocol, it
can be run in parallel across multiple servers in a data-sharing
cluster to achieve scalable performance. Each NFS server can mount
the same cluster file system and export it as a local file system
to the NFS clients on the network. This approach also helps solve
the backup difficulties found in typical NFS servers by allowing
one of several machines to act as a backup server while the other
machines in the cluster still provide application cycles.
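One simple way to exploit this, sketched below in Python with
hypothetical server names: because every NFS server in the cluster
exports the same cluster file system, clients can be spread
deterministically across the servers and still see identical data.

import hashlib
import socket

# Hypothetical NFS server nodes, all exporting the same cluster file system.
NFS_SERVERS = ["nfs1.example.com", "nfs2.example.com",
               "nfs3.example.com", "nfs4.example.com"]

def pick_server(client_name=None):
    """Map a client to one of the NFS servers so the aggregate load is
    spread evenly; any server returns the same files."""
    name = client_name or socket.gethostname()
    digest = hashlib.md5(name.encode()).hexdigest()
    return NFS_SERVERS[int(digest, 16) % len(NFS_SERVERS)]

# The chosen server is then used in the client's mount, for example:
#   mount -t nfs nfs2.example.com:/mnt/cluster /mnt/data
print(pick_server("workstation-17"))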
Fundamental Building Blocks for Clustered Configurations
Certain cluster file system and volume manager features are important
to achieving the best results with data-sharing clusters. A cluster
file system should provide the following features:
- Scalability to 100s of nodes
- Ability to tolerate server, SAN, and storage device failures
to provide high availability in the cluster
- POSIX compliance to support all applications, whether cluster-enabled
or not
- Performance comparable to local file systems
- Large-scale field deployments over a long period of time to
provide proven resilience and reliability
- Ability to work with different storage area network technologies,
including Fibre Channel and iSCSI
A cluster volume manager should provide the following features:
- Scalability to 100s of nodes and 1000s of storage devices
- Support for dynamic multi-pathing to route around SAN, HBA,
or storage device failures
- Ability to modify logical volumes on the fly (resize, move,
copy) in the cluster from any server
- Support for snapshots, including large numbers of snapshots
without performance losses
- Support for software mirroring
- Integration with cluster file systems to allow consistent snapshots
of stable file system state on a shared volume (see the sketch
following this list)
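The last point deserves a concrete illustration. Below is a minimal
sketch, assuming a Linux LVM-style volume manager and the standard
fsfreeze utility (the exact commands depend on the cluster volume
manager in use), of coordinating a file system freeze with a volume
snapshot so that the snapshot captures a stable on-disk state.

import subprocess

def consistent_snapshot(mountpoint, vg_lv, snap_name):
    """Freeze the file system so in-flight writes are flushed and the
    on-disk state is stable, take the volume snapshot, then thaw."""
    subprocess.run(["fsfreeze", "--freeze", mountpoint], check=True)
    try:
        # Hypothetical snapshot size; it only needs room for blocks
        # that change while the snapshot exists.
        subprocess.run(["lvcreate", "--snapshot", "--size", "10G",
                        "--name", snap_name, vg_lv], check=True)
    finally:
        subprocess.run(["fsfreeze", "--unfreeze", mountpoint], check=True)

# Example with hypothetical names:
# consistent_snapshot("/mnt/cluster", "vg0/shared", "shared-nightly")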
Summary
Data-sharing clusters provide dramatically improved price-to-performance
ratios, simplified systems administration and management, and incremental
scalability from small server clusters with fewer than four machines
to very large clusters with hundreds of machines. Data-sharing Linux
clusters are becoming increasingly common as the management benefits
of incremental computing, combined with the economies of scale of
Intel servers, the Linux operating system, and commodity storage
networking technologies, deliver order-of-magnitude reductions in
cost and complexity.
Matthew O'Keefe is CTO and founder of Sistina Software.
He is a senior member of IEEE as well as a member of its Technical
Committee on Mass Storage Systems, and has served on a National
Academy of Sciences panel making recommendations on computational
issues and simulations. Matt received his M.S. and Ph.D. degrees
in Electrical Engineering from Purdue University in 1986 and 1990,
respectively.