Data-Sharing Architectures in the Linux Data Center
Matthew O'Keefe
Few would dispute that Linux plays a crucial role at the center
of the Internet computing universe. In the past few years, Linux
has also been moving up the value chain, steadily gaining the respect
of corporations worldwide. Now, Linux is in the data center, running
high-performance clusters that offer mainframe-like performance
and reliability. This article examines the trends driving the
adoption of Linux clusters in the data center and explores why
clustering has been gaining ground. These trends include:
- Increased demand for higher performance and more processing
power
- Undeniable need for high availability in the data center
- Emergence of less expensive alternatives to supercomputers
(e.g., blade servers)
Many systems administrators are faced with the responsibility
of choosing, and ultimately implementing, a data-sharing architecture
on Linux, so this article also serves as a guide for those
considering deploying Linux in the data center. Users often
choose a data-sharing architecture for their environments without
full awareness of alternatives that could make their application
environments faster, less costly, and easier to manage. This article
reviews the advantages of data-sharing Linux clusters, including:
- Ease and cost of administration
- Interoperability with other systems/applications
- Performance and scale
- Ease and speed of data sharing
- Suitability for incremental computing
In the Unix world, the term cluster has traditionally referred to
a group of machines that (1) share various resources, including
network hardware, (2) are managed as a single system, and (3) provide
mechanisms to transfer applications from a failed machine to a machine
that is still up and running. This third mechanism, termed "failover",
is a popular way to increase availability. It relies
on software watchdogs on one server to detect failure of another
server, typically through the use of timeouts. Application agents
provide information necessary for application restart on another
server. Multi-ported shared storage devices are used so that all
machines have access to the same files.
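To make the failover mechanism concrete, the following is a minimal
sketch, in Python, of a heartbeat watchdog. The peer address, port,
timeout values, and the start_failover callback are hypothetical;
a production cluster package would also fence the failed node and
transfer ownership of the shared storage before restarting the
application.

import socket
import time

PEER = ("peer-node.example.com", 9000)   # hypothetical address of the other server
HEARTBEAT_INTERVAL = 2.0                 # seconds between heartbeat probes
FAILOVER_TIMEOUT = 10.0                  # declare the peer dead after this much silence

def peer_is_alive():
    """Send a small UDP probe and wait briefly for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(1.0)
        try:
            s.sendto(b"ping", PEER)
            s.recvfrom(16)
            return True
        except socket.timeout:
            return False

def watchdog(start_failover):
    """Poll the peer and trigger failover once the timeout has elapsed."""
    last_seen = time.monotonic()
    while True:
        if peer_is_alive():
            last_seen = time.monotonic()
        elif time.monotonic() - last_seen > FAILOVER_TIMEOUT:
            start_failover()   # e.g., take over shared storage and restart the application
            return
        time.sleep(HEARTBEAT_INTERVAL)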
In traditional clusters using failover, physically shared storage
is not accessed simultaneously by two or more machines. A complicated
set of steps is necessary to transfer ownership of shared storage
devices from one machine to another. Standard file systems and volume
managers are not structured to allow simultaneous access to file
systems mapped onto physically shared volumes. However, this situation
is changing; cluster file systems and cluster volume managers have
been developed that allow simultaneous access to shared storage
devices. Cluster file systems that currently offer this functionality
include IBM's GPFS, Sistina GFS, and Hewlett-Packard's Tru64 cluster
file system.
Though the difference between traditional file systems and cluster
file systems is deceptively simple, the impact of data-sharing across
tightly coupled servers in a cluster is profound and provides far-reaching
benefits for IT infrastructures designed to exploit them. These
benefits include:
- Performance equal to local file systems for files shared across
the cluster
- More transparent application failover and the ability to load-balance
applications that share files across the cluster
- Incremental scalability instead of forklift upgrades --
servers and storage can be incrementally added to the cluster
to provide better performance and capacity
- Simplified management, obtained by removing the need for
a separate file system per server in the cluster
- Increased storage efficiency, resulting from the ability to share
storage blocks and re-allocate them as needed among a small
number of shared file systems in the cluster
The growth of storage infrastructure deployments is also supported
by findings from industry sources in the financial sector. In a
September 2002 Piper Jaffray report, "State of the Network Storage
Industry," analyst Ashok Kumar identifies the types of storage
software that will make a difference in building reliable storage:
integrated management software (file systems, virtualization layers),
integrated planning/management software, data redundancy management,
and file system software management.
Furthermore, cluster file system software is strategically aligned
with three converging technologies in the marketplace: Linux,
storage area networks, and high-density Intel servers. The computer
industry is currently in the midst of a transition from Unix server
technologies based on proprietary RISC hardware to Linux systems
based on commodity Intel PC server hardware. A January 2003 report
from Goldman Sachs further supports the growth of these infrastructure
deployments, stating, "Linux's similarity to the Unix systems
currently in use in the data center, both in terms of its APIs and
management skills and tools needed to administer it, provides for
a relatively pain-free migration path from current Unix/RISC platforms
to Linux/Intel."
IT leaders are discovering that replacing expensive, high-processor-count
SMP (symmetric multiprocessor) architectures with data-sharing Linux
clusters provides dramatically improved price-to-performance ratios,
simplified management, and incremental scalability from small server
clusters with fewer than four machines to very large clusters with
hundreds of machines.
The fundamental idea behind data-sharing architectures is the ability
to add more processing capacity, I/O bandwidth, or storage incrementally
and independently, as needed, without moving or changing the deployed
applications or replacing the original hardware with more powerful
components in a forklift upgrade (see Figure 1 vs. Figure 2).
Applications
System designers are exploiting data-sharing Linux clusters for
many applications. For example, applications that consist of many
independent compute jobs reading files written by previous jobs
and writing new files for later jobs to process are good candidates
for data-sharing clusters. Such clusters are already in production
in application segments such as seismic analysis, bioinformatics/genomics
workloads, finite element analysis, and hedge fund computations.
Clusters running these types of applications
provide large compute, I/O, and storage capacity that can be incrementally
scaled. Additionally, good performance and significantly simplified
management (which manifests itself in greatly simplified data flow
through the cluster) are achieved with this approach.
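As an illustration, the following is a minimal Python sketch of one
stage in such a pipeline. It assumes a hypothetical cluster file
system mounted at /mnt/cluster on every node; each node processes a
static share of the input files and leaves its results in place for
the next stage to pick up.

from pathlib import Path

# Hypothetical mount point of the cluster file system, visible on every node.
SHARED = Path("/mnt/cluster")

def run_stage(stage_in, stage_out, node_id, node_count):
    """Process this node's share of the input files and write results
    to the shared file system for the next stage."""
    in_dir = SHARED / stage_in
    out_dir = SHARED / stage_out
    out_dir.mkdir(parents=True, exist_ok=True)

    # Simple static partitioning: node k handles every node_count-th file.
    for f in sorted(in_dir.glob("*.dat"))[node_id::node_count]:
        data = f.read_bytes()
        result = data.upper()                  # placeholder for the real computation
        (out_dir / f.name).write_bytes(result)

# On node 2 of an 8-node cluster:
# run_stage("seismic-raw", "seismic-filtered", node_id=2, node_count=8)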
Another common application involves parallel, cluster-aware
databases such as Oracle 9i RAC. Data-sharing clusters significantly
simplify management for parallel databases by allowing database
logs and tables to be mapped to files instead of hard-to-manage
raw devices. Configuration files can also be shared instead of copied
across all the nodes in the cluster.
Parallel applications that run in a tightly coupled way across
the cluster can also benefit from the performance and scalability
found in data-sharing clusters. These clusters can run I/O in parallel
across all servers, achieving close to the maximum bandwidth
and I/O operations per second offered by the raw storage hardware.
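For example, a parallel job can have each server write a disjoint
byte range of a single shared file, so no two servers touch the same
blocks and the aggregate bandwidth scales with the number of nodes.
A minimal sketch, with a hypothetical file path and region size:

import os

REGION = 64 * 1024 * 1024   # hypothetical 64-MB region owned by each node

def write_region(path, node_id, payload):
    """Write this node's byte range of the shared file; other nodes
    write their own, non-overlapping ranges in parallel."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, payload, node_id * REGION)
        os.fsync(fd)
    finally:
        os.close(fd)

# Node 3 filling its slice of a result file on the cluster file system:
# write_region("/mnt/cluster/results/run42.bin", 3, b"\0" * REGION)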
Other key applications that benefit from the deployment of Linux
clusters in the data center include enterprise-level Web-based application
frameworks, such as BEA WebLogic or IBM WebSphere.
NFS is a common file-sharing protocol in the Unix environment,
but NFS servers often suffer from performance bottlenecks due to
the limitations on the number of network ports and storage capacity
in a single NFS server. Because NFS is a stateless protocol, it
can be run in parallel across multiple servers in a data-sharing
cluster to achieve scalable performance. Each NFS server can mount
the same cluster file system and export it as a local file system
to the NFS clients on the network. This approach also helps solve
the backup difficulties found in typical NFS servers by allowing
one of several machines to act as a backup server while the other
machines in the cluster still provide application cycles.
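One simple way to exploit this, sketched below in Python with
hypothetical server names: because every NFS server in the cluster
exports the same cluster file system, clients can be spread
deterministically across the servers and still see identical data.

import hashlib
import socket

# Hypothetical NFS server nodes, all exporting the same cluster file system.
NFS_SERVERS = ["nfs1.example.com", "nfs2.example.com",
               "nfs3.example.com", "nfs4.example.com"]

def pick_server(client_name=None):
    """Map a client to one of the NFS servers so the aggregate load is
    spread evenly; any server returns the same files."""
    name = client_name or socket.gethostname()
    digest = hashlib.md5(name.encode()).hexdigest()
    return NFS_SERVERS[int(digest, 16) % len(NFS_SERVERS)]

# The chosen server is then used in the client's mount, for example:
#   mount -t nfs nfs2.example.com:/mnt/cluster /mnt/data
print(pick_server("workstation-17"))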
Fundamental Building Blocks for Clustered Configurations
Certain cluster file system and volume manager features are important
to achieving the best results with data-sharing clusters. A cluster
file system should provide the following features:
- Scalability to 100s of nodes
- Ability to tolerate server, SAN, and storage device failures
to provide high availability in the cluster
- POSIX compliance to support all applications, whether cluster-enabled
or not
- Performance comparable to local file systems
- Large-scale field deployments over a long period of time to
provide proven resilience and reliability
- Ability to work with different storage area network technologies,
including Fibre Channel and iSCSI
A cluster volume manager should provide the following features:
- Scalability to 100s of nodes and 1000s of storage devices
- Support for dynamic multi-pathing to route around SAN, HBA,
or storage device failures
- Ability to modify logical volumes on the fly (resize, move,
copy) in the cluster from any server
- Support for snapshots, including large numbers of snapshots
without performance losses
- Support for software mirroring
- Integration with cluster file systems to allow consistent snapshots
of stable file system state on a shared volume (see the sketch
following this list)
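The last point deserves a concrete illustration. Below is a minimal
sketch, assuming a Linux LVM-style volume manager and the standard
fsfreeze utility (the exact commands depend on the cluster volume
manager in use), of coordinating a file system freeze with a volume
snapshot so that the snapshot captures a stable on-disk state.

import subprocess

def consistent_snapshot(mountpoint, vg_lv, snap_name):
    """Freeze the file system so in-flight writes are flushed and the
    on-disk state is stable, take the volume snapshot, then thaw."""
    subprocess.run(["fsfreeze", "--freeze", mountpoint], check=True)
    try:
        # Hypothetical snapshot size; it only needs room for blocks
        # that change while the snapshot exists.
        subprocess.run(["lvcreate", "--snapshot", "--size", "10G",
                        "--name", snap_name, vg_lv], check=True)
    finally:
        subprocess.run(["fsfreeze", "--unfreeze", mountpoint], check=True)

# Example with hypothetical names:
# consistent_snapshot("/mnt/cluster", "vg0/shared", "shared-nightly")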
Summary
Data-sharing clusters provide dramatically improved price-to-performance
ratios, simplified systems administration and management, and incremental
scalability from small server clusters with fewer than four machines
to very large clusters with hundreds of machines. Data-sharing Linux
clusters are becoming increasingly common as the management benefits
of incremental computing, combined with the economies of scale of
Intel servers, the Linux operating system, and commodity storage
networking technologies, deliver order-of-magnitude reductions in
cost and complexity.
Matthew O'Keefe is CTO and founder of Sistina Software.
He is a senior member of IEEE as well as a member of its Technical
Committee on Mass Storage Systems, and has served on a National
Academy of Sciences panel making recommendations on computational
issues and simulations. Matt received his M.S. and Ph.D. degrees
in Electrical Engineering from Purdue University in 1986 and 1990,
respectively.