Data Replication in High Availability Clusters
Nelson Yount
The ultimate goal of high availability (HA) clustering is to improve
the level of availability of applications and other services to
their end-users. HA clusters generally consist of two or more computer
systems in close physical proximity. The HA clustering software
is designed to monitor the status of all of the systems in the cluster,
and upon the failure of any of those systems, to restart on one
or more of the remaining systems any applications that were running
on the failed system. The clustering software will typically also
move the network address associated with such an application to
the backup system, allowing clients to easily continue to access
the application.
Cluster Storage Models
For this application migration to work, it is necessary that the
application data be accessible both to the computer system on which
the application was running originally (the primary system) and
to the system to which the application is moved (the backup system).
This is usually accomplished with some form of shared storage, typically
either an external storage array (to which both systems are connected
by a common SCSI bus or a Fibre Channel network) or a network attached
storage (NAS) device. By placing the application data on a shared
external storage device, an application has access to its data regardless
of whether it is running on the primary or backup system.
Cluster Configurations
The simplest HA cluster configuration consists of two systems,
with one system actively running one or more applications, and the
second system acting purely as a backup for the first. This configuration
is known as a 2-node active/passive cluster. A more common and more
useful configuration is the 2-node active/active cluster (see Figure
1). Again, the cluster consists of two systems, but in this case
both systems are actively running applications and each is serving
as the backup for the other. If either system experiences a failure,
the other system in the cluster takes over the applications from
the failed system, while continuing to run its own applications.
There are a number of ways to extend a cluster beyond two systems.
One of the most common configurations is known as the N+1 cluster.
Such a cluster consists of N active systems, each running one or
more applications, and a single passive system that serves as the
backup for all of the N active systems. This configuration can be
thought of as N different 2-node active/passive clusters that all
have the same passive backup system. The storage for such a cluster
can be shared between all N+1 systems, or each of the N systems
can have its own external storage that can also be accessed by the
backup system.
All of the cluster configurations discussed thus far involve a
simple pairing of systems. Some HA clustering products cannot go
beyond this, and in many cases simple pairing is sufficient to meet
the HA needs of a given computing environment. But many clustering products
today have the added flexibility of going beyond paired failovers.
Such products are capable of supporting clusters of more than two
systems in which all of the systems have access to the same external
storage device. They typically provide features that offer much
greater flexibility in the way an application can be migrated in
the event of a failure. Products that support multi-directional
failover allow the cluster to be configured such that when a given
system fails, some of the applications from the failed system can
be migrated to one system in the cluster, some to a second, some
to a third, and so on. This allows the additional load from the
failed system to be better distributed across the remaining systems
in the cluster.
More advanced clustering products may also support a feature known
as cascading failover. This means that a given application can migrate
across more than two systems, allowing it to survive multiple system
failures. After an initial system failure in which a given application
is migrated to a backup system, if the backup system then also fails,
the application can be migrated to yet another backup system. These
application failovers can cascade across all the systems in the
cluster if necessary, in an attempt to keep the application running
and available.
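As a rough illustration of multi-directional and cascading failover,
the following Python sketch gives each application an ordered list of
systems that may host it. The application names, system names, and
helper functions are hypothetical and are not taken from any particular
clustering product.

```python
def pick_host(failover_order, failed):
    """Return the first system in failover_order that has not failed."""
    for system in failover_order:
        if system not in failed:
            return system
    return None  # every candidate system is down


def place_applications(orders, failed):
    """Map each application to the system that should currently host it."""
    return {app: pick_host(order, failed) for app, order in orders.items()}


# Hypothetical cluster: both applications normally run on sysA.
orders = {
    "database": ["sysA", "sysB", "sysC"],
    "web":      ["sysA", "sysC", "sysB"],
}

# Multi-directional failover: sysA's applications spread across sysB and sysC.
print(place_applications(orders, failed={"sysA"}))

# Cascading failover: sysB then fails too, so "database" moves on to sysC.
print(place_applications(orders, failed={"sysA", "sysB"}))
```

Giving each application its own ordering is what lets the load from a
failed system be spread out rather than landing on a single backup.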
The Data Replication Alternative
All of these HA clustering options depend on the ability of the
application to access its data from each of the systems on which
the application needs to run. As noted previously, this is almost
always accomplished via some form of shared storage. However, there
is an additional option for providing application data access on
multiple systems, especially in small 2-node clusters. Data replication
technology can be used to maintain separate identical local copies
of the application data on the two systems.
What Is Data Replication?
Data replication is simply a method of copying data from a source
system to a target system via a network, with the intent of maintaining
an identical copy of some portion of the source system's data
on the target system. This mechanism usually involves the use of
an additional driver layer on the source system to intercept writes
and propagate the modified data to a corresponding driver or daemon
on the target system.
There are several types of data replication available. The most
common methods operate at the block, filesystem, or file level.
A block-level replication mechanism operates by intercepting writes
at the disk block level and replicating those changes to the target
system. This mechanism works transparently to the filesystem or
application that is using the underlying disk volume. A filesystem-level
replication scheme intercepts writes just above the filesystem layer
and is therefore specific to the underlying filesystem. File-level
replication replicates changes to a specific file or group of files
on the source system.
Besides these forms of replication, some database engines include
built-in replication capabilities for maintaining an identical copy
of a database on a second system. Some high-end external storage
systems have the ability to perform their own replication to a second
storage system, avoiding any involvement by the systems that are
using the storage.
Data replication also has three primary modes of operation: synchronous,
asynchronous, and periodic. In the synchronous mode, an application's
write request does not complete until the data has been successfully
written on both the source and target systems. In the asynchronous
mode, the application's write request results in having the
data written on the source system and queued for transmission to
the target system. The queued data update is then transmitted
to the remote target system as system resources and network bandwidth
allow. The periodic mode of replication simply transmits data changes
on the source system at predetermined periodic intervals in a batch-like
manner. Data replication products may support any or all of these
modes of operation.
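The difference between the three modes can be sketched with a toy model
in Python. This is only an illustration under simplifying assumptions:
the class and method names are invented here, the "volumes" are plain
dictionaries, and a real product would intercept writes in a driver
layer and send them over a network.

```python
from collections import deque


class Replicator:
    """Toy model of one source volume replicated to one target volume."""

    MODES = ("synchronous", "asynchronous", "periodic")

    def __init__(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.source = {}        # block number -> data on the source system
        self.target = {}        # block number -> data on the target system
        self.pending = deque()  # updates written locally but not yet sent

    def write(self, block, data):
        """Handle one application write request."""
        self.source[block] = data
        if self.mode == "synchronous":
            # The write does not complete until the target has the data.
            self.target[block] = data
        else:
            # Asynchronous and periodic: complete locally, send later.
            self.pending.append((block, data))

    def drain(self):
        """Send queued updates to the target.

        In this model the caller decides when to drain: whenever bandwidth
        allows (asynchronous) or when an interval timer fires (periodic).
        """
        while self.pending:
            block, data = self.pending.popleft()
            self.target[block] = data


r = Replicator("asynchronous")
r.write(7, b"new data")
print(len(r.pending))        # updates not yet present on the target
r.drain()
print(r.target == r.source)
```

In this model the length of the pending queue at any moment is the
amount of data that exists only on the source system.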
Data Replication and HA Clusters
To better understand the use of data replication in a HA cluster,
consider the case of a 2-node active/passive cluster running a single
application (see Figure 2). When the application is active on the
primary system, all updates to the application data are automatically
replicated to the backup system by the data replication mechanism.
When a failure occurs and the application is moved to the backup
system, it continues its operations using the mirrored data residing
on the backup system. When the primary system is returned to service,
the direction of the data replication is reversed, such that all
data updates on the backup system are replicated to the primary
system, after an initial resynchronization process to bring the
primary system up to date with any data changes that may have occurred
while it was unavailable.
Data replication can generally be used with each of the other
cluster configurations discussed previously, although each of those
configuration types requires certain capabilities that not all data
replication products possess. Data replication can be used in 2-node
active/active clusters, but the replication mechanism must be capable
of supporting simultaneous replication in both directions between
the two systems. The N+1 cluster configuration requires that the
data replication mechanism be capable of supporting multiple (N)
source systems replicating data to a single target system. Multi-directional
failover can be configured if the replication product can support
a single source system replicating multiple data volumes, each with
a potentially different target system. Cascading failover can be
supported only if the data replication mechanism has the ability
to replicate a single data volume on the source system to multiple
target systems.
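These requirements can be summarized as a simple lookup. The Python
sketch below is only a restatement of the paragraph above; the
capability names are labels invented for this example and do not
correspond to the feature names of any real replication product.

```python
# Replication capability each cluster configuration depends on.
REQUIRED_CAPABILITY = {
    "active/active":     "bidirectional",            # both directions at once
    "N+1":               "many_sources_one_target",  # N sources, one target
    "multi-directional": "per_volume_targets",       # each volume, own target
    "cascading":         "one_volume_many_targets",  # one volume, many targets
}


def supported_configurations(product_capabilities):
    """List the configurations a product with these capabilities can back."""
    return sorted(
        config
        for config, needed in REQUIRED_CAPABILITY.items()
        if needed in product_capabilities
    )


# Hypothetical product that can only replicate in both directions at once.
print(supported_configurations({"bidirectional"}))
```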
Data Replication Drawbacks
The use of data replication in HA clusters is not without challenges.
A number of issues must be considered before deciding to use data
replication rather than some form of shared storage. Perhaps the
first issue to consider is performance. Because of the additional
overhead required to transmit data updates to a remote target system
for every write request by an application, the synchronous mode
of data replication can have severe performance impacts on write-intensive
applications. Both the asynchronous and periodic modes of replication
can alleviate much of this performance concern, but they both present
an increased risk of data loss. If the primary system in a cluster
fails at a point in time when there is application data that has
not yet been replicated to the backup system, that data will be
lost when the application is migrated to the backup system.
Data replication also presents a requirement for data resynchronization
that is not present with the shared storage form of HA clustering.
After a primary system has failed, a failover has been performed,
and the primary system has been restored to service, a data resynchronization
process must occur to bring the data on the primary system up to
date with the backup system. This process must include any data
changes that have occurred after the application began running on
the backup. Until this resynchronization process has been completed,
to avoid data loss, the application cannot be returned to the primary
system (either manually or automatically) in response to a failure
of the backup system. It is usually left to the HA clustering software
to enforce this restriction. During this time period, the application
has no HA protection at all.
The most straightforward method of performing data resynchronization
is to retransmit the entire volume of data to the out-of-date target
system. For large data volumes, this can be quite time consuming,
requiring hours or even days. Some mechanism is needed that will
allow resynchronization to be performed more quickly and efficiently.
This requirement may be met by either transaction logging or intent
logging. Transaction logging records every individual data update
in a log that can be replayed after a failure.
Intent logging uses a bitmap or other structure to record which
blocks of data have been modified, such that resynchronization can
be accomplished simply by replicating only those blocks that are
known to have changed.
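A minimal Python sketch of the intent-logging idea follows. It assumes
a volume modeled as a list of blocks and uses a bitmap with one bit per
block; the class and function names are hypothetical, and a real
implementation would keep the bitmap on stable storage so that it
survives a crash.

```python
class IntentLog:
    """Bitmap with one bit per block; a set bit means 'modified since last sync'."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def mark(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def dirty_blocks(self):
        return [b for b in range(self.num_blocks)
                if self.bits[b // 8] & (1 << (b % 8))]

    def clear(self):
        self.bits = bytearray(len(self.bits))


def resync(source, target, log):
    """Copy only the blocks marked dirty; return how many were sent."""
    dirty = log.dirty_blocks()
    for b in dirty:
        target[b] = source[b]
    log.clear()
    return len(dirty)


# Hypothetical 16-block volume whose two copies start out identical.
source = [b"old"] * 16
target = list(source)
log = IntentLog(16)

# Writes made while the target was unreachable.
for block in (3, 9):
    source[block] = b"new"
    log.mark(block)

print(resync(source, target, log), "of", len(source), "blocks sent")
print(target == source)
```

The bitmap records only which blocks changed, not their contents or
the order of the writes, which is what keeps it small compared with a
transaction log.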
HA clusters that are built using data replication are also more
susceptible to a clustering issue called the "split-brain"
problem. This problem arises when a failure occurs in the communication
infrastructure between the systems in the cluster. If two clustered
systems lose their ability to communicate with one another, each
is led to believe that the other system has failed,
and each will attempt to perform a failover operation. In a shared storage
cluster, the results can be disastrous, with the application running
on both systems and both performing writes to the shared data. This
will almost certainly result in serious data corruption.
Fortunately, the shared storage itself can be used to help avoid
this problem. HA clustering products differ in the means they use,
which include SCSI reservations, communication
paths through the shared storage device itself, and disk-based quorum
mechanisms. However, in clusters based purely on data replication,
none of these techniques are available. Thus, it is crucial to heed
the advice of nearly all clustering software vendors to establish
multiple independent paths of communication between the systems
in your cluster.
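The value of multiple independent paths can be shown with a small
Python sketch. It assumes a hypothetical heartbeat scheme in which the
peer is presumed failed only when every path has been silent for longer
than a timeout; the path names and timing values are illustrative.

```python
def peer_presumed_failed(last_heartbeat, now, timeout):
    """Return True only if every communication path has been silent too long.

    last_heartbeat maps a path name to the time (in seconds) at which the
    most recent heartbeat from the peer arrived on that path.
    """
    if not last_heartbeat:
        raise ValueError("at least one communication path is required")
    return all(now - seen > timeout for seen in last_heartbeat.values())


# Hypothetical paths: the LAN link has gone quiet, the serial link has not.
paths = {"lan": 100.0, "serial": 109.0}
print(peer_presumed_failed(paths, now=110.0, timeout=5.0))  # False: serial path is alive
```

With only one path configured, a broken link and a failed peer look
identical to this check, which is exactly the split-brain condition.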
When Is Data Replication Appropriate?
Despite the challenges described above, there are a number of
scenarios and factors that can justify the use of data replication
in a HA cluster. If the amount of shared application data is very
small, or if the shared application data is primarily read-only,
many of the described problems are minimized and data replication
may be quite acceptable.
Data replication may also be considered if cost is a major factor.
Shared storage devices can be quite expensive, especially in a large
deployment of HA clusters such as in a retail environment with numerous
identical sites. The administration of shared storage can also be
costly, requiring additional IT staff with the expertise necessary
to deal with the increased complexity of shared storage issues.
Data replication can be an effective means of avoiding these additional
costs when implementing a HA cluster.
If the requirements for a given application or environment already
call for multiple copies of the application data, perhaps for backup
or other purposes, data replication can accomplish both that goal
and the shared data access requirement for high availability.
Shared storage typically imposes certain distance limitations
between the systems and the external storage system. Data replication
may be the only available solution for clustering when the distance
between systems precludes the use of shared storage.
Disaster Recovery Solutions
Taking the distance issue above to the extreme, disaster recovery
solutions typically involve the movement of data across large distances.
Traditional disaster recovery solutions involve replicating the
most critical components of a company's data center at a remote
location. Systems and business-critical applications are available
in a standby mode at the remote site ready to come into service
if needed. Backups of company data are made each night and shipped
either to the remote site or to some secure off-site storage facility
where they can be retrieved if needed. Optionally, business-critical
data can be replicated in real-time to the remote site through the
use of data replication techniques like those discussed above.
Like disaster recovery solutions, HA clustering products aim to
provide continued access to applications and data in the face of
certain abnormal conditions. Traditional HA clustering is generally
focused on conditions such as the failures of individual systems,
applications, or application resources, not complete site failures.
But the concepts and mechanisms of HA clustering, if combined with
the right technology, can be an alternative solution to the disaster
recovery problem. By combining clustering technology for high availability
with data replication technology for disaster recovery, it is possible
to achieve an answer to both problems with a single well-integrated
solution.
Imagine a 2-node active/passive cluster consisting of systems
A and B, using shared storage. System A is the primary system for
a single application, and system B is configured as the backup.
The application data is stored on the shared storage array. But
suppose that the shared application data could also be replicated
to a remote system C, using data replication software running on
the primary system A. The result is a 3-node HA cluster (see Figure
3), consisting of two local systems with shared storage, and a third
remote system receiving data updates via data replication over a
wide area network (WAN).
In this example cluster, if system A fails and system B remains
healthy, the HA clustering software will migrate the protected application
to system B. This is normal behavior for a 2-node HA cluster. In
this case, however, the data replication will also be reconfigured
such that system B sends all application data updates to system
C.
The value in combining HA clustering with data replication to
a remote site becomes clear when you consider the question, "What
happens if system B fails before system A can be restored to service?"
Or even more significant, "What happens if both system A and
system B fail simultaneously due to a site disaster?"
Because system C is included in the cluster as defined by the
HA software, and because it has access to the application data,
the protected application can be migrated to system C with little
or no interruption in service. This automatically managed application
migration is significantly faster and easier than the process involved
in a traditional disaster recovery solution. The return to normal
service is also faster and easier, because the HA software will automatically
handle the reversal of the data replication from system C back to
either system A or B when they are returned to service. It will
also allow the administrator to perform an easy migration of the
application back to one of those local systems when desired.
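The replication behavior of this example cluster can be sketched in
Python as a function that picks the replication source and target from
the system currently running the application. This is one reading of
the description above, not the logic of any specific product; the
function name and the rule of preferring the first local system that
is back in service are assumptions made for this example.

```python
LOCAL = ("A", "B")   # systems that share the local storage array
REMOTE = "C"         # system reached by replication over the WAN


def replication_plan(active, up):
    """Return (source, target) for replication, or None if no target is up.

    active: the system currently running the application
    up:     the set of systems currently in service
    """
    if active not in up:
        raise ValueError("the active system must be in service")
    if active in LOCAL:
        # Whichever local system runs the application replicates to C.
        return (active, REMOTE) if REMOTE in up else None
    # The application is running at the remote site: replicate back to
    # the first local system that has returned to service, if any.
    for system in LOCAL:
        if system in up:
            return (REMOTE, system)
    return None


print(replication_plan("A", {"A", "B", "C"}))  # normal operation
print(replication_plan("B", {"B", "C"}))       # after system A fails
print(replication_plan("C", {"C"}))            # after a site disaster
print(replication_plan("C", {"A", "C"}))       # system A restored
```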
Disaster Recovery Technical Requirements
To support such a configuration, a number of technical requirements
must be met by the HA clustering software and data replication mechanisms
that make up the overall solution. These requirements are described
below.
Asynchronous Replication
If the synchronous mode of replication were used in a disaster
recovery solution, the additional network latency of the WAN usually
would make the impact on the write performance of the application
intolerable. So for these solutions, data replication should be
done asynchronously, where the data is written immediately on the
local system and queued for transmission to the remote backup system
as network bandwidth allows.
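A back-of-the-envelope calculation shows why. The Python sketch below
uses a deliberately simple model in which a synchronous write costs the
local write time plus one network round trip plus the remote write
time. The latency figures are illustrative assumptions, not
measurements.

```python
def sync_write_latency_ms(local_ms, rtt_ms, remote_ms):
    """Latency of one synchronous write under the simple model above."""
    return local_ms + rtt_ms + remote_ms


def max_serial_writes_per_sec(latency_ms):
    """Upper bound on back-to-back writes issued one at a time."""
    return 1000.0 / latency_ms


# Assumed figures: 1 ms disk writes, 0.5 ms LAN round trip, 50 ms WAN round trip.
lan = sync_write_latency_ms(local_ms=1.0, rtt_ms=0.5, remote_ms=1.0)
wan = sync_write_latency_ms(local_ms=1.0, rtt_ms=50.0, remote_ms=1.0)

print(lan, max_serial_writes_per_sec(lan))   # synchronous over the LAN
print(wan, max_serial_writes_per_sec(wan))   # synchronous over the WAN
print(1.0, max_serial_writes_per_sec(1.0))   # asynchronous: local write only
```

Under these assumptions the WAN round trip dominates every synchronous
write, while an asynchronous write costs the application only the local
write time.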
Fast Resynchronization
The time required to perform data resynchronization by copying
the complete volume of data to the target system would be completely
unacceptable in the WAN environment. So a fast resynchronization
mechanism, using either transaction logging or intent logging, is
a necessity in a disaster recovery solution. These techniques allow
data resynchronization to be accomplished by transmitting a much
smaller portion of the data over the network.
Network Address Migration Across the WAN
A significant component of the migration of an application from
one system to another in a HA cluster is the migration of the network
address associated with the application. This allows clients to
continue to connect to the application, often
seamlessly. The migration of a network address is accomplished
rather easily in a LAN, in which both the primary and backup systems
are on the same subnetwork. But this is typically not the case in
a disaster recovery solution, in which the primary and backup systems
are on physically disjoint networks, making it impossible to move
a network address to a subnetwork in which it does not belong.
Recent advances in network technology, however, have made this
problem solvable. Virtual LAN (VLAN) technology makes it possible
to create a logical subnetwork that spans multiple physical networks.
With the right networking hardware, VLANs can be established across
wide area connections between routers, even routers connected via
Virtual Private Networks (VPNs). As long as the mechanism used by
the HA clustering software to force updates of network address to
hardware address mapping tables works properly in the VLAN environment,
this problem can be solved without additional software features.
Integration of Shared Storage Cluster with Data Replication
Not all HA clustering software products can implement the combination
of a local shared storage cluster and a remote system updated via
data replication. Such a configuration requires two capabilities
that not all HA clustering products include. Most fundamentally,
the product must be able to support clusters of more than two systems.
The product must also be able to support the combination of shared
storage and data replication for a single protected application.
Both of these features have significant implications for the basic
architecture of the HA software.
Summary
Data replication is an attractive alternative for providing application
data access on multiple systems in HA clusters. It is particularly
useful in environments where its potential drawbacks are minimized
by the nature of the applications or data to be protected, or where
the cost savings over a shared storage solution outweigh other negatives.
When extended to the problem of disaster recovery, the combination
of HA clustering software and data replication techniques can be
used to build a solution that far surpasses traditional disaster
recovery solutions.
Nelson Yount is one of the original employees of SteelEye Technology,
Inc. (http://www.steeleye.com) and serves as the Chief Technology
Officer to provide vision and guidance to the company on product
technological advancements, as well as define the architectural
direction for SteelEye LifeKeeper business continuity solutions.
For more than 17 years, Nelson has been designing and implementing
clustering and systems management solutions for Windows, Linux,
and Unix environments.