Tapes:
A Modern History, Trends
Henry Newman
During the past 30 years or so, tape technology has not changed
nearly as much as rotating storage (disks). But in most environments,
especially enterprise environments, tape is still a requirement
for continuity of operation. In most cases, even if you have an
off-site remote mirror, tapes are still required as a third copy.
Given that, most sites are still going to need tape in the future.
So this month I will talk about many of the issues surrounding tape
hardware.
As shown in Table 1, changes in this area have been incremental
at best. If you compare tape density and speed increases to CPU
system performance increases from the same period, you would have
a tape that would write at 629 GB/sec and hold more than 73 TB of
uncompressed data. Of course, that's impossible, just as impossible as
having a disk drive spin at over 600 million times per second, which
is the same ratio.
A more linear improvement in enterprise tape performance has only
been available with the release of the StorageTek T9940B and most
recently the Ultrium-II drive. Given that most of the world is running
RAID for critical environments, it might be fairer to compare RAID
capabilities than those of a single disk. So, over the same period,
we see:
- Tape density increase: 1,333 times
- Tape transfer rate increase: 24 times
- Disk density increase: 2,250 times
- Disk RAID-5 8+1 LUN density increase: 18,000 times
- Disk transfer rate increase (single drive, average): 21 times
- Disk RAID LUN transfer rate increase (average): 133 times
From whatever angle you look at it, the performance and density of
tape are out of whack compared with those of rotating storage. Add a
comparison with CPU performance to that, and you can see a huge imbalance.
Tape Types
Most enterprise environments use tapes that write in a linear
format, but another tape type exists that generally has a higher
density -- known as helical tape. Helical tape has more contact
between the tape and the head on the tape drive. With linear tape,
the data is written lengthwise down the tape. With helical tape,
the data is written in tracks that run diagonally across the tape,
hence the greater contact with the tape drive heads.
Here are some general comparisons between these two tape types:
- A very small defect on a helical tape can corrupt the data
if the error correction buffer is full. Error correction space
is often left on the tape and if that space fills up, the tape
becomes unreadable.
- Helical tape heads wear out long before linear tape heads because
the tape heads make more intimate contact with the tape.
- Reliability is generally higher for linear tape than for helical,
for both the media and the drive's head life, because more contact
means more wear.
- Because of media wear, high-end linear tapes generally have
a longer storage life than high-end helical tapes.
Linear tape drives include the IBM 3590B/E, STK 9840/9940, Quantum
SuperDLT, the older DLT 7000/8000, and LTO, which many vendors sell.
Helical tape drives include Sony's AIT-1, AIT-2, and AIT-3, as well
as its DTF line of tapes. Other helical types include 8mm Mammoth
and Mammoth-2 and 4mm DAT.
Even for a single tape type, different tape providers often claim
better technology built into the construction. You will have to
determine which claims are valid and make sense for the different
tape types.
Compression
Almost all tape drives (unlike disks and RAID) automatically compress
the input data stream. This is an important consideration when determining
drive types because different drives have different compression
algorithms. Not surprisingly, enterprise tape drives from IBM and
StorageTek have higher compression rates than lower-end drives such
as DLT and Mammoth. Drive vendors often provide estimated compression
rates, but these are averages and your mileage may vary. Compression
is important given the cost of the media as a function of the drive
cost. Consider the following example:
Drive 1
  Drive cost: $35,000
  Media cost: $75
  Compression: 5 to 1
  Native cartridge capacity: 250 GB

Drive 2
  Drive cost: $5,000
  Media cost: $75
  Compression: 2 to 1
  Native cartridge capacity: 250 GB
Let's say you have 400 TB of raw data to back up over some period
of time. Drive 1 will require 327 pieces of media at a cost of
$24,525 and, including the drive itself, will cost a total of
$59,525. Drive 2 will require 820 pieces of media at a cost of
$61,500 and, again including the drive itself, will cost a total
of $66,500. The cost of a larger tape library and the cost of
software licensing must also be considered, as some vendors
license by the number of tapes.
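The arithmetic behind that comparison is simple enough to script. The
following is a minimal sketch in Python, assuming 1 TB = 1,024 GB and
rounding up to whole cartridges; with those assumptions it lands within
a cartridge of the Drive 1 figures above and matches Drive 2 exactly:

import math

def tape_cost(raw_tb, native_gb, compression, drive_cost, media_cost):
    # Rough cartridge count and total cost for one drive type.
    raw_gb = raw_tb * 1024                      # assume 1 TB = 1,024 GB
    per_cartridge_gb = native_gb * compression  # capacity per cartridge after compression
    cartridges = math.ceil(raw_gb / per_cartridge_gb)
    return cartridges, drive_cost + cartridges * media_cost

# Drive 1: $35,000 drive, 5:1 compression, 250 GB native cartridges
print(tape_cost(400, 250, 5, 35000, 75))   # (328, 59600)
# Drive 2: $5,000 drive, 2:1 compression, 250 GB native cartridges
print(tape_cost(400, 250, 2, 5000, 75))    # (820, 66500)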
Clearly, compression must be a consideration in the total cost
of ownership for tape systems but, as I said, your mileage for
compression on each drive type with your data will vary. One quick
way to see whether your data is compressible is to use the gzip
program with the -9 option. In my experience, running gzip -9 on a
file gives you the maximum compression achievable for the data. The
hardware compression in a tape drive usually has two parts: a data
dictionary to compress against, and the compression buffer. You
might want to ask the vendor for the size of each and the hardware
implementation (LZRW1, LZO, etc.)
You will need to test each of the tape drives that are under consideration
with a statistically significant sample of your data to determine
how your data behaves with the drive and its compression algorithm.
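If you want to gauge compressibility across a representative sample of
files before you ever touch a drive, a short script can run the gzip
test in bulk. This is only a rough proxy (the hardware codecs mentioned
above behave differently), but it gives a quick first read on your data.
A sketch:

import gzip
import os
import sys

def compression_ratio(path, level=9):
    # Compress the file in memory with gzip -9 and report the original:compressed ratio.
    with open(path, "rb") as f:
        data = f.read()
    compressed = gzip.compress(data, compresslevel=level)
    return len(data) / len(compressed)

if __name__ == "__main__":
    # Pass a representative sample of files on the command line.
    for path in sys.argv[1:]:
        print(f"{path}: {os.path.getsize(path)} bytes, "
              f"about {compression_ratio(path):.2f}:1 with gzip -9")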
How It Will Be Used
Tape drives and the associated libraries have different characteristics
for tape load, tape ready position, and rewind time. In some cases,
this is not important, such as in applications like backup where
generally all you are doing is loading the tape and writing large
amounts of data sequentially, then rewinding the tape and moving
it back into a position in the library. On the other hand, with
hierarchical storage management (HSM) applications, tape load, position,
and rewind time become critical issues, especially for reading data
back, but this will also depend on the requirements for retrieving
the data. For a good definition of HSM, see:
http://www.snia.org/education/dictionary/h/
HSM applications are becoming more popular given the length of time
required for backup with increasing storage densities. In fact, StorageTek
developed the T9840A and B drives specifically for HSM applications
with small files. These drives have a 4-second load time and average
8 seconds to first data byte; typical competing products take 6 to 15
times longer to first byte. Note, however, that
if the files are large, load and position time become insignificant
compared with transfer time. If you have a 20-GB file and, with compression,
the transfer rate is 30 MB/sec, the transfer time equals 682 seconds.
With a 50-second load and position time, that's only about 7.5%
of the total time. My rule of thumb when architecting a system is
to keep load and position time to less than 10% of the time to write
the data.
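That rule of thumb is easy to check for any combination of file size,
transfer rate, and load/position time. A minimal sketch, taking 1 GB as
1,024 MB as in the transfer-time figure above:

def load_overhead(file_gb, rate_mb_s, load_position_s):
    # Fraction of the write time spent loading and positioning rather than transferring.
    transfer_s = file_gb * 1024 / rate_mb_s    # 1 GB taken as 1,024 MB
    return load_position_s / transfer_s

# The example above: a 20-GB file at 30 MB/sec with 50 seconds of load and position time
overhead = load_overhead(20, 30, 50)
print(f"transfer ~{20 * 1024 / 30:.0f} sec, overhead {overhead:.1%}")  # ~683 sec, ~7.3%
print("fine" if overhead < 0.10 else "load/position time is too large a share")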
For HSM applications, reading is a different matter. Most applications
can consolidate the files to ensure large amounts of data are written;
reading, however, requires an understanding of the recall rate of
the files, the size of the files recalled, and, most importantly,
the speed requirement for recall. A credit card company that stores
information to provide approval codes is far different from a
genetics research site recalling a gene sequence for comparison
between two people. Understanding your application environment is critical
to developing a good architecture.
Trends
Given all of these issues, one might ask, "Is tape dead?"
A number of the large storage vendors pronounced tape dead four
years, three years, two years ago, then again last year, and will
likely do so this year and next year. Tape has some significant
advantages over disk (rotating storage), however, that indicate
to me it's not dead yet. For example:
1. Tape does not require power -- Most modern disk drives need to
remain powered on for reliability. The Seagate 120-GB ATA drive, for
example, uses 13 watts. That can get really expensive if you have
400 TB of secondary storage (see the rough estimate after this list).
2. Error Rates -- Bit error rates for ATA drives are on the order
of 1 in 10^14 (FC and SCSI drives are an order of magnitude better),
while bit error rates for enterprise tape are about 1 in 10^18 and
other tapes (AIT and DLT) are around 1 in 10^17. That means tapes
are between two and four orders of magnitude more reliable than
ATA and SCSI disk drives.
3. Tapes can handle higher shock than disks and still survive.
Note that all of the above information is available from the Web
pages of the companies mentioned.
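To put a number on the power point, the back-of-the-envelope
calculation below uses the 13-watt and 120-GB figures above; the
$0.10/kWh electricity rate is purely an assumption for illustration.
Tape cartridges sitting in library slots, by contrast, draw nothing.

# Rough power draw for keeping 400 TB of ATA disk spinning, using the
# 13 W and 120 GB figures above; the $0.10/kWh rate is an assumed value.
capacity_tb = 400
drive_gb = 120
watts_per_drive = 13
price_per_kwh = 0.10                         # assumption, not from the article

drives = capacity_tb * 1024 / drive_gb       # about 3,413 drives
kilowatts = drives * watts_per_drive / 1000  # about 44 kW of continuous draw
per_year = kilowatts * 24 * 365 * price_per_kwh
print(f"{drives:.0f} drives, {kilowatts:.1f} kW, roughly ${per_year:,.0f}/year in electricity")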
What to Do
I believe that for at least the next few years, tapes and tape
drives will continue to be a critical part of the storage infrastructure.
This will continue because tape is far cheaper than rotating storage
in total cost of ownership given the issues with power requirements
for rotating storage and compression support with tape drives. Almost
all of the tapes on the market claim 30 years of shelf life --
even the lower-end tapes. Keeping a tape for 30 years might be possible,
but even if you manage to do that, how are you going to be able
to read it? Tapes, like any storage medium, are dependent on outside
influences like:
1. What is the interface and driver? (Try finding a SCSI-1 interface
from the early 1990s, much less an IPI-3 interface 20 years from
now.)
2. Will the tape drive be available to read the tape? (A little
over 30 years ago, 7-track tapes were state of the art, but finding
one to read a tape today will be next to impossible.)
3. What is the data format of the tape? (Some vendors write in
tar format, for example, and will tar or an application like Veritas
Netbackup be available in 2033?)
4. What is the data and will any program be able to read it? (PDF
is a popular format, and applications can read it today, but what
about reading a MS Word 2.0 document from 10 years ago with MS Word
2002?)
All in all, I advocate a migration strategy. Whatever you do and
whatever tape type you decide upon, it is critical to plan a migration
strategy as part of the initial decision process. Nothing lasts
forever -- especially the way you read your data.
Henry Newman has worked in the IT industry for more than 20
years. Originally at Cray Research and now with a consulting organization,
he has provided expertise in systems architecture and performance
analysis to customers in government, scientific research, and industry
around the world. His focus is high-performance computing, storage
and networking for UNIX systems, and he previously authored a monthly
column about storage for Server/Workstation Expert magazine.
He may be reached at: hsn@hsnewman.com.