Real-Time
Remote Data Mirroring
Primitivo Cervantes
When planning for a disaster, there are many issues to consider
and resolve. Some of these issues are resolved with the use of a
remote location as an emergency production site in case the primary
site is unavailable due to a disaster. In this situation, a remote
data center is set up with systems similar to the primary production
systems. This remote disaster recovery site is usually connected
to the primary site via a network.
Even with a remote disaster recovery site, there are still many
issues to consider. The remote site is usually not in use during
normal production hours and is a cost center until it is used as
a recovery center. You must determine, for example, whether you
want an exact duplicate of the primary site's systems or whether
you can make do with reduced performance during an emergency. You
must decide whether
you can afford to lose some data in an emergency or whether you
need all of the most current data when bringing the systems online
at the remote site. You must also consider how current the data
must be and how much you are willing to pay for it. In this article,
I will discuss various data-replication techniques and their associated
advantages and disadvantages, the costs involved, and cases where
implementing them may be justified. Finally, I will describe an
installation of one of these techniques, using IBM's HAGEO product
to replicate data in real time to a remote site.
Before I get into the data-mirroring techniques, I will explain
the types of data that clients usually mirror. There are three basic
types of data that are usually required to run a customer application:
operating system data, database data, and non-database application
data. Operating system data consists of the programs and files needed
to run the operating system, such as IBM's AIX. Database data
is the actual data in a client's application, typically managed
by a database manager such as Oracle, Sybase, or Informix. Non-database
data consists of the client application's executables, configuration
files, and other files needed to run the application (such as Oracle
Financials, SAP, or PeopleSoft).
A system backup is typically used to back up and replicate the
operating system data, so this will not be covered in this article.
I will discuss only techniques to mirror database data and non-database
application data.
Remote Data Mirroring Techniques
System Tools
Historically, data has been mirrored to a remote location using
operating system tools such as ftp, rcp, uucp,
etc. These tools have distinct advantages in that they are available
with the operating system, easy to use, and very reliable.
Typically, a system backup is used to replicate a system at a
remote location. An initial database backup is created and restored
at the remote location. From this point on, the only things that
need to be sent to the remote location are the database archive
logs and any changed application executable or configuration file.
By sending only the database logs to the remote location and then
using the database to "play the logs" forward, the database
at the remote location can be kept consistent but will always be
behind the primary site. In other words, we send changes to the
database to the remote location, but the most current changes are
not sent.
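As an illustration, here is a minimal sketch of this log-shipping approach using rcp. The directory names, the remote host name "drhost", and the "sent" subdirectory are assumptions for the sketch, not part of any particular product:
# Primary site: ship any new database archive logs to the remote site,
# then move them into a "sent" directory so they are not shipped twice.
# /db/archive, the host "drhost", and the .log suffix are assumptions.
cd /db/archive
for log in *.log
do
    [ -f "$log" ] || continue
    rcp "$log" drhost:/db/archive/"$log" && mv "$log" sent/
done
A job like this is typically run from cron; at the remote site, the database then rolls the shipped logs forward on its own schedule.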
The main disadvantage of these tools is that the remote site is
not always as current as the primary site (behind in database transactions).
This is usually reason enough to prevent many clients, such as banks
and hospitals, from using these tools. These clients demand a real-time
mirroring solution.
Built-In Database Tools
Database vendors have been adding and enhancing remote mirroring
capabilities to their products, and this is another way of mirroring
data to a remote location. Using the database vendor products, the
database data can be mirrored and kept consistent across a geography.
The data at the remote location is just as current as the primary
site.
The main advantages of database vendor tools are that they are
available with the database at a low cost and are guaranteed to
work by the vendor. If there is a problem, the vendor has a support
structure that can help the client determine the cause and also
provide a fix, if necessary.
The main disadvantages of database vendor tools are that they
replicate only the database data and not application data. They
are also sometimes difficult to use and may require that the application's
source code be modified and recompiled.
Hardware-Based Tools
While database vendor tools are specific to the database vendor,
hardware-based solutions aim at providing data mirroring regardless
of the application or database vendor. One way to use hardware to
mirror data is with a storage system from a storage vendor, such
as EMC or IBM. EMC has long had remote mirroring capabilities, and
IBM recently announced (Dec. 2000) mirroring capabilities in some
of their storage products. These storage products allow you to mirror
data from one storage system to another storage system at a remote
site regardless of the data being mirrored. This mirroring is done
in real-time and works very well. Because these hardware storage
systems can use large amounts of cache, system performance can be
very good.
In cases where there are long distances (greater than 60 miles)
between sites, EMC uses three storage systems in series to mirror
data. A primary storage system is mirrored in real-time to a secondary
system at a nearby remote location (usually less than 10 miles).
This secondary system then mirrors the data to the third storage
system at best speed. This means the third storage system is not
as current as the first storage system; but no data is lost, because
it will eventually be fully synchronized from the second storage
system.
IBM's mirroring solution is relatively new, so I have not
had a chance to implement it at long distances. IBM says the mirroring
capabilities will work at longer distances than EMC's (about 100 miles).
Another consideration with the hardware solution is that performance
degrades significantly as the distance between sites increases.
I have seen write times increase by 4000% with as little as 10 miles
between sites. This is probably a worst case, but it is nonetheless
significant and should be considered.
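If you are evaluating such a configuration, a crude way to compare write times is to time a burst of writes to the mirrored storage before and after remote mirroring is enabled. This is only a rough sketch; /mirrored_fs is an assumed mount point on the mirrored storage:
# Time writing 1000 4-KB blocks to the mirrored filesystem and flushing
# them to disk; run this with mirroring disabled and again with it
# enabled, then compare the elapsed times.
time ( dd if=/dev/zero of=/mirrored_fs/write_test bs=4k count=1000 && sync )
rm /mirrored_fs/write_test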
The main disadvantage of these hardware solutions is the cost.
A single EMC or IBM storage system can easily cost $750K and the
long-distance mirroring solutions will certainly be in the millions.
Even so, for clients that need this additional safety net, I think
the hardware solution will eventually provide the best combination
of performance and data integrity. A hardware solution will probably
be the most expensive but provide the most functionality.
IBM's GEOrm or HAGEO
If the client application happens to be running on an IBM AIX
system, there are a couple of software products from IBM, called
HAGEO and GEOrm, that could be used for real-time data mirroring.
IBM's HAGEO and GEOrm actually mirror logical volumes from
one site to a remote site. They provide device drivers that intercept
disk writes to the AIX logical volumes and mirror this data in real-time
to the remote location.
The difference between the HAGEO product and the GEOrm product
is that the HAGEO product also integrates with IBM's high-availability
product HACMP. The HACMP product detects system failures and allows
a secondary system to take over an application should the primary
system fail. With the HACMP/HAGEO product combination, if your primary
system fails, the secondary system will automatically detect this
failure and take over the application, usually within half an hour
or so.
IBM typically charges anywhere from $100K to $400K in software
and services to implement a two-system HACMP/HAGEO cluster. Implementing
GEOrm does not involve HACMP, so it usually costs less (anywhere
from $40K to $150K). These are all ballpark figures, and you should
contact an IBM representative for an actual quote.
The main advantage of the HAGEO and GEOrm solutions is that they
are software based and work regardless of the database vendor or
application. The main disadvantages are that they usually require
a highly skilled systems administrator to maintain and that they
have some system performance implications. Since the data mirroring
is done in real time and without a large cache, it can have a high
impact on system performance. Also, these solutions are only available
on IBM AIX systems.
An HAGEO Implementation
I have implemented and managed several HACMP/HAGEO projects, so
I will relate my experience with this product at a utility company
in California. In this example, the company had two systems, the
primary one located in Los Angeles County and the secondary one
in Orange County. They were using the IBM SP2 systems (the same
type that defeated Kasparov at chess). The systems chosen used multiple
processors and were several times faster than the single-processor
systems they were replacing. Everyone was initially optimistic that
performance was not going to be an issue.
The disk drives were IBM SSA drives (only 14 4-GB disks in all).
The data was locally mirrored, so the usable capacity was that of
seven 4-GB disks, a relatively small application. Note that we configured
HAGEO in its most secure mode, mwc (mirror-write consistency). This
required us to create logical volumes (called state maps) to use
with HAGEO, so we had to allocate two of these drives exclusively
for HAGEO use.
The systems were connected to the network with two 10-Mb Ethernet
adapters for the data mirroring (between the sites) and one 10-Mb
Ethernet adapter for the client traffic (where users were going
to be logging in).
For ease of debugging problems, we chose the following schedule:
1. Set up systems (local and remote sites) and get the operating system
working.
2. Install application and Sybase database on one site.
3. Verify that application works correctly.
4. Install HACMP/HAGEO.
5. Initiate HACMP/HAGEO mirroring capabilities.
6. Verify that application still works correctly and that the data
is mirrored to remote location.
Initial Application Installation (Before HACMP/HAGEO)
During the initial application installation (Steps 1-3 above),
the application AIX logical volumes and filesystems were created
and the application software installed on the primary system. This
was done without HACMP/HAGEO, so the application installation processes
were typical of any IBM AIX system with that application. The application
was a custom labor management system using a Sybase database.
Up until this point, the overall cluster looked like Figure 1.
In this diagram, the two systems are installed and have all network
connections, but the application is only installed and running on
one system. The application and data will eventually be mirrored
to the remote system, so it is not necessary to install it on both
systems.
Note that the AIX logical volumes (LVs) have been defined to contain
the application and data. Sybase knows about them and reads and
writes to the LVs directly. These LVs are defined as:
/dev/appslv00 -- Where the Sybase DBMS and application
files are located
/dev/sybaselv01 -- Sybase database data
/dev/sybaselv02 -- Sybase database data
/dev/sybaselv03 -- Sybase database data
/dev/sybaselv04 -- Sybase database data
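For reference, raw logical volumes like these are created with the AIX mklv command. The sizes (in logical partitions) below are illustrative assumptions; the volume group name appsvg is the one referenced later in the HACMP resource group:
# Create the application LV and the raw Sybase LVs in volume group
# appsvg (logical-partition counts are examples only).
mklv -y appslv00 -t jfs appsvg 50
mklv -y sybaselv01 -t raw appsvg 100
mklv -y sybaselv02 -t raw appsvg 100
mklv -y sybaselv03 -t raw appsvg 100
mklv -y sybaselv04 -t raw appsvg 100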
There are two Ethernet adapters for the user or client network.
One of these adapters is the HACMP "service" adapter to
which the clients connect. The other adapter is the HACMP "standby"
adapter used by HACMP as a backup adapter if the primary adapter
fails.
There are two Ethernet adapters for the HAGEO mirroring function.
HAGEO will load balance between the two adapters, so there is no
need for a standby adapter in this network.
In the following discussion, I will give the actual commands that
we used to configure HACMP and, lastly, show how we configured and
started the HAGEO mirroring.
Installing HACMP and HAGEO
Installing HACMP and HAGEO (Step 4 above) is a simple process
of loading the CDs on the system and running SMIT to install the
base HACMP and HAGEO packages. To do this on the command line, run
the following commands:
To install HACMP (all on one line):
/usr/lib/instl/sm_inst installp_cmd -a -Q -d '/dev/cd0' -f 'cluster.base
ALL @@cluster.base _all_filesets' '-c' '-N' '-g' '-X' '-G'
To install HAGEO (all on one line):
/usr/lib/instl/sm_inst installp_cmd -a -Q -d '.' -f 'hageo.man.en_US
ALL @@hageo.man.en_US _all_filesets,hageo.manage ALL @@hageo.manage
_all_filesets,hageo.message ALL @@hageo.message _all_filesets,hageo.mirror ALL
@@hageo.mirror _all_filesets' '-c' '-N' '-g' '-X' '-G'
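Before moving on, it is worth verifying that the filesets installed cleanly; a quick check with lslpp might look like this:
# Confirm that the HACMP and HAGEO filesets are installed.
lslpp -l "cluster.*"
lslpp -l "hageo.*"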
Configuring HACMP
Configuring HACMP and HAGEO (Step 5 above) requires that you enter
all of the cluster information into HACMP and HAGEO. Most of this
information is shown in Figures 1 and 2. In this particular case,
the steps needed to configure HACMP and HAGEO are:
- Create the HACMP cluster.
- Add the node names and IP addresses.
- Create the HACMP resource groups.
- Configure the HAGEO geo-mirror devices.
- Activate the mirroring capabilities.
To configure the HACMP cluster:
/usr/sbin/cluster/utilities/claddclstr -i'1' -n'hageo1'
To configure the HACMP nodes and IP addresses:
/usr/sbin/cluster/utilities/clnodename -a 'labor1'
/usr/sbin/cluster/utilities/clnodename -a 'labor2'
When configuring the IP addresses, note that there are two addresses
associated with the primary adapter on each system (en0). One of these
addresses is the "service" address or the address to which
the clients connect. The second address is a "boot" address
used only when the system boots up. This is because the "service"
address will move from the primary system (labor1) to the standby
system (labor2) during a primary system failure. To bring up "labor1"
when this happens, we must have an address for the system, so it does
not conflict with the "service" address that just moved
over to "labor2". This is the "boot" address.
Figures 1 and 2 do not show the interface names that resolve to
the IP addresses. These are as follows:
10.25.172.32 labor1svc
10.25.172.31 labor1stby
10.25.182.32 labor1boot
10.25.172.42 labor2svc
10.25.172.41 labor2stby
10.25.182.42 labor2boot
To configure these IP addresses in HACMP:
/usr/sbin/cluster/utilities/claddnode -a'labor1svc' :'ether' :'ether1' \
:'public' :'service' :'10.25.172.32' :'0004ac5711aa' -n'labor1'
/usr/sbin/cluster/utilities/claddnode -a'labor1stby' :'ether' :'ether1' \
:'public' :'standby' :'10.25.172.31' : -n'labor1'
/usr/sbin/cluster/utilities/claddnode -a'labor1boot' :'ether' :'ether1' \
:'public' :'boot' :'10.25.182.32' : -n'labor1'
/usr/sbin/cluster/utilities/claddnode -a'labor2svc' :'ether' :'ether1' \
:'public' :'service' :'10.25.172.42' :'0004ac5711bb' -n'labor2'
/usr/sbin/cluster/utilities/claddnode -a'labor2stby' :'ether' :'ether1' \
:'public' :'standby' :'10.25.172.41' : -n'labor2'
/usr/sbin/cluster/utilities/claddnode -a'labor2boot' :'ether' :'ether1' \
:'public' :'boot' :'10.25.182.42' : -n'labor2'
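At this point, you can confirm that HACMP sees all six adapter labels as intended by listing the configured interfaces with the cllsif utility:
# List the HACMP adapter configuration for both nodes and verify the
# service, standby, and boot addresses are as expected.
/usr/sbin/cluster/utilities/cllsif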
Creating the HACMP Resource Groups
The application resource information comprises all the system
resources that HACMP uses to manage an application. In other words,
this is everything needed to start or stop the application. In our
case, we will need the AIX logical volumes (and AIX volume group)
associated with the application, the IP address the users log into,
and a script to start and stop the application.
We created three resource groups, one for "labor1",
one for "labor2", and one for moving the application from
site "losangeles" to site "orange".
Here are the commands we used to configure HACMP. To create the
HACMP "resource groups":
/usr/sbin/cluster/utilities/claddgrp -g 'resourcegroup1' -r 'cascading' \
-n 'losangeles orange'
/usr/sbin/cluster/utilities/claddgrp -g 'resourcegroup2' -r 'cascading' \
-n 'labor1'
/usr/sbin/cluster/utilities/claddgrp -g 'resourcegroup3' -r 'cascading' \
-n 'labor2'
To create the HACMP "application server" that contains the
start and stop script information:
/usr/sbin/cluster/utilities/claddserv -s'appserver1' \
-b'/apps/hacmp/start_script' -e'/apps/hacmp/stop_script'
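The start and stop scripts themselves are ordinary shell scripts supplied by the administrator. As a rough sketch only (the Sybase directory, the RUN_LABORSRV server name, and the application command are assumptions, not the customer's actual scripts), /apps/hacmp/start_script might look something like this:
#!/bin/ksh
# Sketch of an HACMP application start script: bring up the Sybase
# server, then the labor management application. All paths and the
# RUN_LABORSRV server name are examples only.
SYBASE=/apps/sybase; export SYBASE
su - sybase -c "$SYBASE/install/startserver -f $SYBASE/install/RUN_LABORSRV"
su - appadmin -c "/apps/labor/bin/start_labor_app"
exit 0
The stop script would shut down the application and the Sybase server in the reverse order.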
To put all of the system resources in the HACMP resource group:
/usr/sbin/cluster/utilities/claddres -g'resourcegroup1' SERVICE_LABEL= FILESYSTEM= \
FSCHECK_TOOL='fsck' RECOVERY_METHOD='sequential' EXPORT_FILESYSTEM= MOUNT_FILESYSTEM= \
VOLUME_GROUP='appsvg' CONCURRENT_VOLUME_GROUP= DISK= AIX_CONNECTIONS_SERVICES= \
AIX_FAST_CONNECT_SERVICES= APPLICATIONS='appserver1' SNA_CONNECTIONS= MISC_DATA= \
INACTIVE_TAKEOVER='false' DISK_FENCING='false' SSA_DISK_FENCING='false' FS_BEFORE_IPADDR='false'
/usr/sbin/cluster/utilities/claddres -g'resourcegroup2' SERVICE_LABEL='labor1svc' FILESYSTEM= \
FSCHECK_TOOL='fsck' RECOVERY_METHOD='sequential' EXPORT_FILESYSTEM= MOUNT_FILESYSTEM= \
VOLUME_GROUP= CONCURRENT_VOLUME_GROUP= DISK= AIX_CONNECTIONS_SERVICES= \
AIX_FAST_CONNECT_SERVICES= APPLICATIONS= SNA_CONNECTIONS= MISC_DATA= INACTIVE_TAKEOVER='false' \
DISK_FENCING='false' SSA_DISK_FENCING='false' FS_BEFORE_IPADDR='false'
/usr/sbin/cluster/utilities/claddres -g'resourcegroup3' SERVICE_LABEL= FILESYSTEM= \
FSCHECK_TOOL='fsck' RECOVERY_METHOD='sequential' EXPORT_FILESYSTEM= MOUNT_FILESYSTEM= \
VOLUME_GROUP='appsvg' CONCURRENT_VOLUME_GROUP= DISK= AIX_CONNECTIONS_SERVICES= \
AIX_FAST_CONNECT_SERVICES= APPLICATIONS= SNA_CONNECTIONS= MISC_DATA= INACTIVE_TAKEOVER='false' \
DISK_FENCING='false' SSA_DISK_FENCING='false' FS_BEFORE_IPADDR='false'
After configuring HACMP, it's time to configure HAGEO.
Configuring HAGEO
The HAGEO configuration involves some basic steps:
- Import the HACMP configuration.
- Start GEOmessage.
- Configure the actual mirroring device drivers.
- Start the mirroring process.
To import the HACMP configuration:
/usr/sbin/krpc/krpc_migrate_hacmp
To start GEOmessage:
/usr/sbin/krpc/cfgkrpc -ci
Configuring HAGEO Mirroring Device Drivers
I mentioned previously that Sybase knows about the AIX logical
volumes and reads and writes directly to them. To configure the
HAGEO device drivers and have the application work without reconfiguration,
we renamed the AIX logical volumes and created the HAGEO devices
using the original AIX logical volume names. Figure 2 shows that
the AIX logical volumes were renamed as follows:
- /dev/appslv00 rename to /dev/appslv00_lv
- /dev/sybaselv01 rename to /dev/sybaselv01_lv
- /dev/sybaselv02 rename to /dev/sybaselv02_lv
- /dev/sybaselv03 rename to /dev/sybaselv03_lv
- /dev/sybaselv04 rename to /dev/sybaselv04_lv
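One way to perform these renames is with the AIX chlv command (the logical volumes must be closed, so Sybase and the application are stopped first). This is a sketch of that approach, not necessarily the exact procedure used at this site:
# Rename the existing LVs so their original names can be reused for
# the HAGEO GeoMirror devices (run while the LVs are closed).
chlv -n appslv00_lv appslv00
chlv -n sybaselv01_lv sybaselv01
chlv -n sybaselv02_lv sybaselv02
chlv -n sybaselv03_lv sybaselv03
chlv -n sybaselv04_lv sybaselv04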
We then created the GEOmirror devices (GMDs) with the original
AIX LV names (the ones Sybase is configured to use):
/dev/appslv00
/dev/sybaselv01
/dev/sybaselv02
/dev/sybaselv03
/dev/sybaselv04
Here's the command to create the GMD "/dev/appslv00"
on "labor1":
mkdev -c geo_mirror -s gmd -t lgmd -l'appslv00' '-S' -a minor_num='1' \
-a state_map_dev='/dev/appslv00_sm' -a local_device='/dev/rappslv00_lv' \
-a device_mode='mwc' -a device_role='none' \
-a remote_device='labor2@/dev/rappslv00_lv'
where "appslv00" is the GMD name, "/dev/appslv00_sm"
is an AIX logical volume used as a log (called a "state map"),
"/dev/rappslv00_lv" is the local AIX logical volume
being mirrored, and "labor2@/dev/rappslv00_lv" is the remote AIX
logical volume mirror target.
The other GMDs were created pointing to the appropriate logical
volumes:
- /dev/sybaselv01 (mirrored the local and remote /dev/sybaselv01_lv)
- /dev/sybaselv02 (mirrored the local and remote /dev/sybaselv02_lv)
- /dev/sybaselv03 (mirrored the local and remote /dev/sybaselv03_lv)
- /dev/sybaselv04 (mirrored the local and remote /dev/sybaselv04_lv)
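Because the remaining GMDs follow the same pattern as the first one, they can be created in a small loop. This is a sketch; it assumes each GMD has its own state-map logical volume named after the LV with an _sm suffix and uses sequential minor numbers:
# Create the remaining GMDs on labor1 (sketch): each mirrors the local
# renamed LV to the matching LV on labor2 and uses its own state map.
minor=2
for lv in sybaselv01 sybaselv02 sybaselv03 sybaselv04
do
    mkdev -c geo_mirror -s gmd -t lgmd -l"$lv" '-S' -a minor_num="$minor" \
        -a state_map_dev="/dev/${lv}_sm" -a local_device="/dev/r${lv}_lv" \
        -a device_mode='mwc' -a device_role='none' \
        -a remote_device="labor2@/dev/r${lv}_lv"
    minor=$((minor + 1))
done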
Almost Ready to Start HAGEO Mirroring
After configuring HACMP and the HAGEO devices, we are almost ready
to start mirroring across the WAN. Before doing this, however, we
need to tell HAGEO which side has the good copy, using the /usr/sbin/gmd/gmddirty
and /usr/sbin/gmd/gmdclean commands. By marking one side as "dirty"
and the other side as "clean", we are telling HAGEO that
all of the data needs to be copied from one site to the other. HAGEO
moves data from the "dirty" site to the "clean"
site. To mark a site as "dirty", or the site to copy from
(in this case "labor1"):
/usr/sbin/gmd/gmddirty -l appslv00 (and do this for all of the other GMDs)
To mark a site as "clean", or the site to copy to (in this
case "labor2"):
/usr/sbin/gmd/gmdclean -l appslv00 (and do this for all of the other GMDs)
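Since every GMD must be marked on both sides, a short loop keeps this consistent. A sketch, using the GMD names from this configuration:
# On labor1 (the copy source), mark every GMD dirty; on labor2 (the
# copy target), run the same loop with gmdclean in place of gmddirty.
for gmd in appslv00 sybaselv01 sybaselv02 sybaselv03 sybaselv04
do
    /usr/sbin/gmd/gmddirty -l "$gmd"
done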
Starting the HAGEO Mirroring
Now that we have configured HACMP/HAGEO, all we have to do is
start HACMP. HACMP will then start the HAGEO GMDs and commence the
mirroring.
Here's the command to start HACMP:
/usr/sbin/cluster/etc/rc.cluster -boot '-N' '-b' '-i'
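Once the cluster has started, you can confirm that the cluster subsystems are running and watch the event log; for example:
# Verify that the HACMP cluster subsystems are active on this node
# and follow the cluster event log.
lssrc -g cluster
tail -f /tmp/hacmp.out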
Final HAGEO Configuration
The outcome of all this is shown in Figure 2. If you look closely
at the Sybase DBMS, it is writing to what it thinks are the AIX
logical volumes. Sybase is actually sending the data to the HAGEO
GMDs. The HAGEO GMDs are sending the data to the AIX logical volumes
and to the remote site. This is remote mirroring using HAGEO.
Performance Issues
As far as the configuration and mirroring functionality was concerned,
everything went smoothly. However, we ran into severe problems with
performance. Again, the systems that we installed were several times
faster (using multiple processors) than the previous system, so we
did not expect performance problems from the systems themselves. We
did not, however, know the application's write characteristics, so
we expected some performance issues, but not severe ones.
We were surprised that many aspects of the application were single-threaded
and did not utilize the multiple processors of the system. A good
deal of work had to be done by the customer and application vendor
to streamline the application, and in some cases change it, so it
could use multiple processors. This work was unexpected and caused
several months of delay in the implementation of the solution.
Another performance issue was the use of the state maps by the
HAGEO product. The state maps are AIX logical volumes that are used
to log changes to the application logical volumes. This allows HAGEO
to keep track of changes to the logical volumes and synchronize
only the changes. We had initially thought that scattering the state
maps across drives would give us good performance. After some testing,
it was discovered that better performance was achieved by placing
those state maps together on a single drive (and without any application
logical volumes). This change alone gave us a 20% improvement in
the performance tests that we were using (which were specific to
this application).
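The AIX LVM commands make it easy to see, and change, where the state maps physically reside. This sketch assumes state maps named appslv00_sm and so on, with hdisk3 and hdisk10 as example source and dedicated target drives:
# Show which physical disks the state-map LV currently occupies ...
lslv -l appslv00_sm
# ... and move it onto the drive dedicated to state maps.
migratepv -l appslv00_sm hdisk3 hdisk10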
After the performance testing and changes, we were able to mirror
data across the sites with performance levels satisfactory to the
customer. The performance was actually only slightly better than that
of their old systems, but with more data integrity. Also, with the integration
of HAGEO with IBM's HACMP product, when the primary system
failed, the secondary system could automatically detect the error
and be up and running with the customer application within 20 minutes.
This cluster has been in production for a couple of years now,
and has worked well. Again, it has required good systems administration
skills to maintain, but in the several cases where the primary system
has failed (for hardware reasons mostly), the secondary system successfully
took over the workload within the expected amount of time. This
particular configuration cost the customer approximately $500K (not
including application changes).
Summary
I have presented several methods of mirroring data between sites
focusing on the real-time mirroring techniques. This is by no means
a detailed or all-inclusive summary, but it describes the most common
techniques for achieving this function.
If the database vendors could make the data replication capabilities
of their products easier to implement, there would certainly be
a lot more usage of their products. Hardware storage vendor solutions
currently provide very good performance at short distances and probably
have the best potential of all of the solutions in terms of functionality.
They also are the most expensive.
I also discussed IBM's HAGEO software solution as implemented
in a real-life client situation. This provided the functionality
the customer was looking for, along with performance that was
acceptable to its clients.
Primitivo Cervantes is an IT Specialist who has worked as a
consultant for the last nine years. He has been in the computer/systems
industry for fifteen years and has specialized in high-availability
and disaster-recovery systems for the last seven years.