More Truth about Tapes, Backups, and Restores
Peter Baer Galvin
In the July issue (http://www.samag.com/documents/s=8284/sam0307h/0307h.htm),
I discussed backup solutions and the reasons that sites stay with
antiquated tape and software technologies. This month, I'll
cover a variety of standard and unorthodox backup and restore solutions,
and reveal more truths.
The truth about tape: SCSI bus is still a better choice
than Fibre channel bus, in some situations.
In spite of the vendor hype, new technology is not always the
best solution. A case in point is Fibre vs. SCSI tape and library
attachment. While Fibre has some benefits, especially the increased
distance allowed between the connection points, SCSI is still the
best solution in some scenarios. SCSI solutions tend to be more
complex, but sometimes increased complexity is paid for by increased
function or reliability. Consider a Fibre HBA in a host, a SCSI-based
tape library, and a bridge between the host and the library. Experienced
systems administrators might frown on this solution due to the need
for multiple cables, connections, and devices. And indeed it could
be more prone to failure, more overhead to manage, and more complex
to debug, but consider that the bridge includes buffering that can
help stream data to the tape drives (making them more efficient
and decreasing backup time). The Fibre to the host allows the library
to be distant, and the SCSI drives and robot can be more reliable
than their Fibre equivalents.
I've seen instances when a Fibre-based robot control had
intermittent errors, while SCSI-based robot controls were rock-solid.
Given that Fibre is relatively new on the backup scene, and that
SCSI-based drives and robots have been in use for years, this should
not be surprising. Also consider that some drives are not yet available
with Fibre-attach, so this solution could allow for the best combination
of drive and robot technology while maximizing throughput, reliability,
and distances between the devices.
The truth about tape: SANs are not just for disks.
If your site has several data-centric hosts, a backup SAN could
be a good next-generation backup solution. Many sites have moved
to this architecture simply because their backup networks could
not move data fast enough to meet backup and restore window requirements.
A sample backup SAN is shown in Figure 1. Of course there are many
options and variations on this architecture. In this example, database
servers are directly attached to a storage array, and other hosts
with storage are connected on a network. The backup server is connected
to the robot and to the network. Backup data can stream from the
database servers, through Fibre, through the Fibre-SCSI bridges,
to the SCSI tape drives in the tape library. Likewise, data can
flow across the network, through the backup server, to the library
via Fibre to Fibre-attached tape drives. This type of facility has
several advantages (and disadvantages) compared to direct-attach
solutions:
- It allows for a central library to be used by many hosts.
- Throughput between hosts and tape can be scaled as required,
simply by increasing connections between the host, the SAN switches,
and the tape library.
- Backup management can be simplified by backup software that
can coordinate all hosts, storage, and library use.
- Adding hosts (and accompanying storage) is less likely to require
new infrastructure. In the usual case, the existing library can
accommodate the new data, or might require additional tape drives
within the library. With network-based backups, adding hosts might
cause a backup network to be inadequate. There the solution might
be to change the entire network infrastructure between hosts and
the backup server.
- Depending on the storage technology in use, backups can bypass
the application server and require only the backup server to drive
backup throughput. In Figure 1, the backup server also is connected
to the main storage. If the storage array supports snapshots or
other splitting/reconnecting technology, a copy of the data to
be backed up can be created and exported to the backup server.
(See also near-line storage below.)
- Restores can be much faster than in direct-attach tape scenarios,
as all tape drives in the central library could be used for a
restore, moving more megabytes per second than if a smaller library
or single tape drive were attached to the target host.
A downside to a centralized backup solution is that backup software
needs to be more advanced to manage the backup and recovery (B&R).
My favorite solution is Veritas NetBackup (NBU). From one central
management console, NBU can orchestrate the allocation of tape drives,
use of those drives for backup or restore, and release of the drives
for use in some other backup or restore. Another downside is the
cost of implementing a backup SAN compared with a standard network-based
solution.
The truth about tape: unlike almost all other technology,
the larger the tape robot the lower the price per capacity.
Several small libraries are usually more costly than one large
library with the same capacity. Add to the equation the maintenance
cost and management overhead involved when more devices are in the
mix, and a central tape library makes sense, in most circumstances.
Sun has a nice library comparison available (although pricing information
will have to be obtained elsewhere): http://www.sun.com/storage/tape/tape_lib_comparison.html.
The truth about tape: like operating systems are best at
backing up and restoring their own kind.
A few years ago, I was consulting at a large site containing a
mix of Sun and Windows machines. The backup software they were using
was running on Windows. The backups appeared to work fine. Unfortunately,
the restores of the Solaris machines, especially their root disks, did
not. Generally, the "media server" -- the machine
driving the data to the tape drives -- should be of the same
operating system type as the machines the data are coming from.
Sometimes mixing does work, but as it's more prone to failure,
why take unnecessary risk?
The truth about tape: changing backup software is difficult.
Although I've talked with many sys admins who were unhappy
with their current backup software, only rarely have they felt enough
pain to switch to another solution. Even some folks who lost data
when a disk problem was followed by a restore failure continued with
that infrastructure rather than replacing it. The reasons vary according
to the site, but mostly they involve the cost of buying the new
software. Also consider that the old software usually must be kept
to enable restores from the old backup tapes. This difficulty of
changing means that the backup software choice is likely a long-term
one, so choose wisely.
The truth about tape: the B&R technologies are rapidly
evolving.
There are several new technologies affecting the B&R problem.
Backup SANs have already been described. Another is new tape drive
technologies that increase speed and density. Some tape drives provide
write-once-read-many (WORM) functionality. The most interesting
action is in the appliance space. For example, Decru has a box that
encrypts data as it flows from the backup server to the tape drives
(and decrypts it during restore). This solution can assure that
all the hard work of securing your environment is not undone by
clear text tapes being taken off site.
Another area of appliance solutions is instant restore. These
new devices, from companies such as Revivio, record all writes performed
to the main storage. They can then provide a view of the storage
as it was at any point in time, such as just before a database corruption
or a data loss. This type of solution augments standard backups
rather than replacing them. Its key benefit is that it makes it unlikely
that tape will ever be needed for restores -- the restores can
be done directly from the appliance.
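To make the point-in-time idea concrete, here is a toy model in Python. It is only a sketch of the general concept (log every write, then replay the log up to a chosen moment), not a description of how Revivio or any other vendor implements it. The class name, the block-number addressing, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WriteJournal:
    """Toy model of an appliance that records every write to main storage."""

    # Each entry is (timestamp, block number, data written). Hypothetical layout.
    entries: list = field(default_factory=list)

    def record(self, timestamp: float, block: int, data: bytes) -> None:
        """Log one write as it passes through to the main storage."""
        self.entries.append((timestamp, block, data))

    def view_at(self, timestamp: float) -> dict:
        """Rebuild the block -> data image as it stood at `timestamp`."""
        image = {}
        # Replay writes in time order and stop at the requested moment.
        for ts, block, data in sorted(self.entries, key=lambda e: e[0]):
            if ts > timestamp:
                break
            image[block] = data
        return image


journal = WriteJournal()
journal.record(1.0, 0, b"good-a")
journal.record(2.0, 1, b"good-b")
journal.record(3.0, 0, b"corrupt")  # the bad write we want to roll back past

# View of the storage just before the corruption at time 3.0.
before = journal.view_at(2.5)
```

A real appliance would have to bound the size of the journal and handle write ordering across many hosts. The sketch ignores both.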
The truth about tape: applications add another layer of
complexity.
If it weren't for those dang applications, B&R would
be easy! Consider a database server, and the complexities of assuring
that the data to be backed up is consistent. Usually this involves
the backup software communicating with the database software to
coordinate access to the data. A simpler solution that is appropriate
with smaller databases is for the database administrator to automate
extraction of data (while it is quiesced) to a locally accessible
disk, and for the backup to copy that extract to tape.
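The simpler approach can be sketched as a small Python wrapper that quiesces the database, writes the extract to local disk, and always resumes the database, even if the export fails. The three hooks (quiesce, resume, export) are hypothetical placeholders. In practice they would call whatever commands your database provides, and the file name and staging directory here are assumptions as well.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def quiesced(quiesce, resume):
    """Hold the database quiet for the duration of the block."""
    quiesce()
    try:
        yield
    finally:
        # Resume even if the export raised, so the database is never left idle.
        resume()


def extract_for_backup(quiesce, resume, export, staging_dir: Path) -> Path:
    """Write a consistent extract to local disk for the backup to pick up."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    extract = staging_dir / "db_extract.dump"  # hypothetical file name
    with quiesced(quiesce, resume):
        export(extract)
    return extract


def demo():
    """Run the sequence with stand-in hooks and report what happened."""
    events = []

    def export(dest: Path) -> None:
        events.append("export")
        dest.write_text("table data\n")  # stand-in for a real database dump

    with tempfile.TemporaryDirectory() as tmp:
        extract = extract_for_backup(
            quiesce=lambda: events.append("quiesce"),
            resume=lambda: events.append("resume"),
            export=export,
            staging_dir=Path(tmp) / "staging",
        )
        existed = extract.exists()
    return events, existed
```

The backup software then only needs to copy the staging directory to tape, with no knowledge of the database at all.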
The truth about tape: near-line is a new name for an old
idea (which doesn't make it bad).
One big B&R industry buzz is about "near-line storage".
This is not a new idea. Amanda, the free backup software package,
uses disk as the first stop for data being backed up. It then moves
the data from that buffer to tape. Old ideas can be good ones, and
near-line storage does have a place in many environments. Typically
it should be used when a tape solution cannot meet the required
backup or restore window. The data are copied to lower cost disk
(near-line storage) during the backup window, and then to tape during
a larger window. Restores can be done from the near-line storage
as well, with the cost of near-line storage being the only limiting
factor in determining the number of backups kept before deletion.
Note that no one is making the argument that near-line storage
is as cost effective as tape. Rather, it is lower cost than the
main storage, and faster than tape, and therefore can be of use
in some environments. Finally, note that the data could be moved
to near-line via backup software, in which case it is in backup-tape
format and would need to be restored to be used. Alternatively, the
data could be copied to near-line, and would then be in its native
file system format and be available for testing or disaster recovery
use.
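The relationship between near-line cost and retention is simple arithmetic, sketched below in Python. The budget, the price per gigabyte, and the backup size are made-up figures chosen only to show the calculation, and the function name is an assumption.

```python
def backups_retained(budget: float, cost_per_gb: float, backup_size_gb: float) -> int:
    """How many full backups fit on the near-line disk a budget can buy."""
    if cost_per_gb <= 0 or backup_size_gb <= 0:
        raise ValueError("cost and backup size must be positive")
    capacity_gb = budget / cost_per_gb
    # Only whole backups count, so round down.
    return int(capacity_gb // backup_size_gb)


# Hypothetical figures: 10,000 to spend, 2.00 per GB, 800 GB per backup.
kept = backups_retained(budget=10_000, cost_per_gb=2.0, backup_size_gb=800)
```

With these assumed numbers the budget buys 5,000 GB, which holds six full backups before the oldest must be deleted.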
The truth about tape: network backups work.
With all of the above said about more complex backup environments,
consider that network-based backup infrastructure has been around
for a long time, and works for a lot of sites. The more complex
architectures certainly have their place, and are required in some
circumstances, but they should not be used when a simpler solution
would do. If the performance (or lack thereof) of an existing network
facility is the only impetus to move forward, consider using parallel
paths to stretch throughput. For example, IP Multipathing could
allow easy use of multiple backup networks. A 1-Gb network, when
driven by a Sun server, can move approximately 450 Mb per second.
Putting in a second or third gigabit interface and using IP Multipathing
can double and triple that performance. Do not forget that sufficient
CPU would be needed to drive that throughput. A good rule of thumb
is that one fast CPU can drive one high-throughput interface (be
it a Fibre Channel HBA or a gigabit network interface).
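As a rough check on whether parallel paths would fix a backup window, the arithmetic can be sketched in Python. This assumes the 450 Mb figure above means megabits per second per interface, that throughput scales linearly with the number of interfaces (the best case, with enough CPU behind each one), and that the 1-TB data set is purely an example.

```python
def backup_hours(data_gb: float, interfaces: int,
                 mbit_per_interface: float = 450.0) -> float:
    """Hours needed to move `data_gb` gigabytes over parallel network paths."""
    if interfaces < 1:
        raise ValueError("at least one interface is required")
    # Assumption: aggregate throughput scales linearly with interface count.
    total_mbit_per_s = interfaces * mbit_per_interface
    data_mbit = data_gb * 1000 * 8  # decimal gigabytes -> megabits
    return data_mbit / total_mbit_per_s / 3600


for n in (1, 2, 3):
    print(f"{n} interface(s): {backup_hours(1000, n):.1f} hours for 1 TB")
```

Under these assumptions a single interface needs almost five hours for a terabyte, and each added interface cuts that time proportionally. Real results will be lower if disks, tape drives, or CPU become the bottleneck first.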
The Truth
It is important that any B&R architecture focus on the primary
problem. That problem is not backups and their performance; rather,
it is restores and their performance and reliability.
Conclusions
There are quite a few obvious and subtle complexities involved
in designing or redesigning a backup and restore solution. Mistakes
can come from overlooking or underestimating these complexities.
Specific areas to consider are server to tape attachment, throughput
of backup and restore, robot control, and application interactions.
Hopefully the truths revealed here can help assure that your B&R
solution meets your requirements immediately, and for the distant
future.
Peter Baer Galvin (http://www.petergalvin.info) is the
Chief Technologist for Corporate Technologies (www.cptech.com),
a premier systems integrator and VAR. Before that, Peter was the
systems manager for Brown University's Computer Science Department.
He has written articles for Byte and other magazines, and
previously wrote Pete's Wicked World, the security column,
and Pete's Super Systems, the systems management column for
Unix Insider (http://www.unixinsider.com).
Peter is coauthor of the Operating Systems Concepts and Applied
Operating Systems Concepts textbooks. As a consultant and trainer,
Peter has taught tutorials and given talks on security and systems
administration worldwide.