Introduction to RAID
Henry Newman
I am sure that over the years, many of you have seen a great deal
written about RAID hardware, software, and a myriad of related topics.
In this column, I will look at the whole topic of RAID from a slightly
different perspective.
I divide RAID devices into two categories: cache-centric and storage-centric.
You may see different terminology used to describe the same thing.
Some people call these RAID types "enterprise" and "mid-range",
for example. Whatever you call them, there are major architectural
differences between these two device types.
Cache-Centric RAID
I use the term cache-centric because RAID devices in this category
depend significantly on data residing in cache to ensure good performance.
Cache-centric devices generally have feature sets such as:
- Very high reliability (dual everything, virtually no downtime)
- Large caches (e.g., 64 GB or greater)
- Design emphasis on RAID-1
- Software that allows snapshots, hot upgrades, and many other
features
- If RAID-5 is supported, generally narrower stripes (e.g., 4+1, that
is, 4 data disks plus 1 parity disk, compared with the 8+1
configurations available on mid-range products)
- Cache is always mirrored
- Large number of front-end connections
- Support for many types of remote mirroring (e.g., dark fibre,
IP)
- Smaller block sizes
- Huge amounts of storage managed in a single box (e.g., 100 TB)
- Per-component reliability testing
- Error monitoring for all hardware, including disk monitoring
- Designed for I/O operations per second (IOPS), not streaming I/O
- Far more bandwidth from cache to servers than from cache to
disk (I call this front-end bandwidth and back-end bandwidth)
- Very high cost per MB of storage compared with storage-centric
RAID
Cache-centric RAID vendors include:
- EMC Symmetrix
- Hitachi Data Systems 99xx series
- IBM Shark
Most of these products can run both on UNIX servers and on IBM
mainframes. They are designed for environments where reliability is
the most critical issue -- customers may trade performance for
reliability because reliability is more important. For these boxes
to perform well, they need a high cache-hit rate.
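To see why cache hits matter so much, here is a minimal Python sketch of the average service time when some fraction of requests is served from cache. The latency constants and the helper name `effective_latency_ms` are hypothetical, chosen only to show the shape of the curve; they are not measurements of any product listed above.

```python
# Simplified model: each request is served either from cache or from disk.
# Both latency figures are hypothetical, for illustration only.
CACHE_LATENCY_MS = 0.1  # assumed time to serve a request from RAID cache
DISK_LATENCY_MS = 8.0   # assumed time for a random read from disk


def effective_latency_ms(hit_ratio: float,
                         cache_ms: float = CACHE_LATENCY_MS,
                         disk_ms: float = DISK_LATENCY_MS) -> float:
    """Average service time when hit_ratio of requests are cache hits."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    for h in (0.0, 0.5, 0.9, 0.99):
        print(f"hit ratio {h:4.2f}: {effective_latency_ms(h):5.2f} ms")
```

With these assumed figures, falling from a 99% to a 90% hit ratio makes the average request roughly five times slower (about 0.18 ms versus 0.89 ms), which is why workloads that miss cache do poorly on these boxes.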
Storage-Centric RAID
I use the term storage-centric because these devices depend on the
underlying storage, not on the cache, for their performance. These
devices have the following features:
- Good reliability but not nearly as high as cache-centric devices
- Smaller caches (usually 2 GB to 8 GB)
- RAID-5 support with up to 20 disks in a RAID-5 LUN
- Cache mirroring can often be turned off to improve performance
- Far less software for management and maintenance
- Far less storage in a single box compared with cache-centric
RAID
- Excellent support for streaming I/O with large blocks
- Support for IOPS workloads, but with small caches most of those
operations have to go to disk
- More back-end bandwidth than front-end bandwidth (more channel
bandwidth from cache to disk than from cache to servers)
- Much lower cost per MB for storage than cache-centric RAID
Some examples of storage-centric RAID vendors include:
- Ciprico
- DataDirect Networks
- DotHill
- EMC CLARiiON CX line (also resold by Dell)
- Hitachi Data Systems 9500
- LSI 5600 (OEM'ed by many other vendors)
- Sun T3 and S1
There are many vendors in this market because these products are
much easier to develop than cache-centric devices, for several
reasons. For example, the reliability requirements are less
stringent, and there is no need to build mainframe interfaces.
Many vendors in this market are now offering IDE drives as an option
in place of SCSI drives to lower cost and increase data density.
Additionally, a few vendors are developing IDE-only solutions for
the low end of this market. RAID devices of this type are far more
prevalent than cache-centric devices and usually have at least twice
as much bandwidth to disk as to the front-end servers.
So Which Is Better?
As with many things in computing, "it depends on many factors".
Not long ago, I was asked to work on a project where the customer
wanted to use cache-centric or "enterprise" storage to capture
high-speed data streams. I knew (as did the other technical people
involved) that the customer could not use a cache-centric storage
box for high-speed full-duplex I/O (writing one file while reading
other files at the same time). The enterprise storage vendor's
benchmark team said it could be done, but we thought this was absurd:
streaming data gains little from cache, and a cache-centric device
has less bandwidth from cache to disk than from cache to servers.
We were correct, and the customer is now using a storage-centric
device. On the other hand, the same cache-centric storage box will
far exceed the performance of the storage-centric device for a
database whose index files are used often and fit in cache. A few
new vendors are entering the market trying to combine the best of
both.
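A simplified throughput model shows why both outcomes in this story are plausible. It assumes that every request crosses the front-end channels, that only cache misses cross the back-end channels, and that sustained streaming I/O behaves as all misses (written data must eventually be destaged to disk). The bandwidth figures and the helper `sustained_throughput` are hypothetical, invented for illustration rather than taken from any product.

```python
# Hypothetical bandwidth figures (MB/s) for illustration only; they do not
# describe any real product.
CACHE_CENTRIC = {"front_end": 1600.0, "back_end": 400.0}
STORAGE_CENTRIC = {"front_end": 800.0, "back_end": 1600.0}


def sustained_throughput(front_end: float, back_end: float,
                         hit_ratio: float) -> float:
    """Upper bound on sustained throughput (MB/s) in a simple model.

    Every request crosses the front end; only cache misses cross the
    back end. Treat sustained streaming (reads or writes) as hit_ratio=0.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    if miss_ratio == 0.0:
        return front_end  # everything is served from cache
    # The back end must carry miss_ratio * T, so T <= back_end / miss_ratio.
    return min(front_end, back_end / miss_ratio)


if __name__ == "__main__":
    workloads = {"streaming capture": 0.0, "hot database index": 0.9}
    for name, h in workloads.items():
        cc = sustained_throughput(**CACHE_CENTRIC, hit_ratio=h)
        sc = sustained_throughput(**STORAGE_CENTRIC, hit_ratio=h)
        print(f"{name:18s} cache-centric {cc:6.0f} MB/s  "
              f"storage-centric {sc:6.0f} MB/s")
```

Under these made-up figures, streaming runs at 400 MB/s on the cache-centric box versus 800 MB/s on the storage-centric one, while a 90% hit rate reverses the result (1,600 versus 800 MB/s).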
Before you can make a recommendation of what to buy, you need
to review your requirements and your operational usage. Here are
just a few issues to consider:
1. What are the uptime and reliability requirements?
2. What are your backup requirements?
3. What RAID level will you be using?
4. How big are your files and how are they accessed?
Reliability
It comes down to how much downtime you can afford for the box.
If you have many hosts connected, you cannot afford any downtime,
because an outage affects every one of them. Availability is often
described by its 9 count, the number of 9s in the availability
percentage (for example, 99.999% is five 9s). Table 1 shows the 9s
and how they equate to downtime. Knowing the reliability requirements
and the number of hosts attached will help determine the type of
equipment needed.
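The arithmetic behind a table like Table 1 is straightforward: each additional 9 cuts the allowed downtime by a factor of ten. A minimal Python sketch, assuming a 365-day year (leap years ignored) and a hypothetical helper named `downtime_minutes_per_year`:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` 9s.

    For example, 3 nines = 99.9% available = 0.1% unavailable.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 7):
        availability = 100.0 * (1.0 - 10.0 ** -n)
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines ({availability:.4f}%): {minutes:10.2f} min/year "
              f"({minutes / 60:.2f} h)")
```

For example, three 9s (99.9%) allows about 525.6 minutes (8.76 hours) of downtime per year, while five 9s (99.999%) allows only about 5.3 minutes.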
Backup
Cache-centric or enterprise boxes have the ability to create a
mirror, break the mirror for a backup, and then re-attach and
resynchronize the mirror. Most storage-centric or mid-range boxes do
not have this feature, although it is starting to become more common.
RAID Level
There are many different RAID levels, but I see RAID-1 and RAID-5
used most often. You will have to determine which RAID level you
need to use. The RAID level depends on:
1. Cost -- RAID-1 requires far more disks than RAID-5 for
the same amount of data storage.
2. How the data is used -- If you are making small random
requests, RAID-1 is faster than RAID-5. If you are making large
sequential requests and, especially, doing a great deal of writes,
RAID-5 will be much faster.
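The cost point can be made concrete by counting disks. This sketch assumes a hypothetical requirement of 32 disks' worth of data and RAID-5 groups of a fixed width (4+1 or 8+1); the helper names are illustrative only, and hot spares are ignored.

```python
import math


def disks_raid1(data_disks: int) -> int:
    """RAID-1 mirrors every data disk, so the disk count doubles."""
    return 2 * data_disks


def disks_raid5(data_disks: int, stripe_data: int) -> int:
    """RAID-5 built from (stripe_data + 1)-disk groups, e.g. 4+1 or 8+1.

    Each group holds stripe_data disks' worth of data plus one disk's
    worth of parity (distributed across the group's disks).
    """
    groups = math.ceil(data_disks / stripe_data)
    return groups * (stripe_data + 1)


if __name__ == "__main__":
    data = 32  # hypothetical: 32 disks' worth of user data
    print("RAID-1:      ", disks_raid1(data))
    print("RAID-5 (4+1):", disks_raid5(data, 4))
    print("RAID-5 (8+1):", disks_raid5(data, 8))
```

Under these assumptions, 32 disks of data need 64 disks with RAID-1, 40 with 4+1 RAID-5, and 36 with 8+1 RAID-5.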
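So, if you are making small requests (especially random ones),
RAID-1 will be much faster than RAID-5 when both are tuned. With
RAID-1, each device is mirrored; with RAID-5, you create a LUN with
one disk's worth of parity so that the LUN can be rebuilt if a
device fails. A small RAID-5 write forces the controller to read the
old data and old parity before writing the new data and new parity
(four disk I/Os), whereas a small RAID-1 write is just one write to
each mirror (two I/Os). For large sequential writes, however, RAID-1
writes far more data than RAID-5, because every byte is written
twice, while RAID-5 adds only one parity block per stripe.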
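RAID-5 parity is a byte-wise XOR across the data blocks in a stripe. The sketch below, with hypothetical helper names, shows how a lost block is rebuilt from the survivors plus parity, and how a small write updates parity by read-modify-write (old parity XOR old data XOR new data) instead of reading the whole stripe. Real controllers also rotate parity across disks and handle many details omitted here.

```python
from functools import reduce
from typing import List

# Disk I/Os needed for one small (sub-stripe) write.
RAID1_SMALL_WRITE_IOS = 2  # one write to each mirror
RAID5_SMALL_WRITE_IOS = 4  # read old data + old parity, write new data + new parity


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks: List[bytes]) -> bytes:
    """Parity block for a stripe: XOR of all its data blocks."""
    return xor_blocks(*data_blocks)


def rebuild(surviving: List[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing data block from the survivors and parity."""
    return xor_blocks(parity_block, *surviving)


def small_write_parity(old_data: bytes, new_data: bytes,
                       old_parity: bytes) -> bytes:
    """Read-modify-write parity update that touches only one data disk."""
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # data half of a 4+1 stripe
    p = parity(stripe)

    # Lose disk 2, then rebuild it from disks 0, 1, 3 plus parity.
    assert rebuild(stripe[:2] + stripe[3:], p) == stripe[2]

    # Small write to disk 1: new parity uses only old data and old parity.
    new_p = small_write_parity(stripe[1], b"bbbb", p)
    stripe[1] = b"bbbb"
    assert new_p == parity(stripe)
    print("rebuild and small-write parity update OK")
```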
File Sizes and Accesses
In today's world, the likelihood of everything fitting in
the RAID cache is very low. Heck, why would you even buy a RAID
if that were the case -- you could just buy an SSD (solid state
storage). The real questions to answer are: How are these files
accessed? Can the data be reused? Would a large cache help or will
it not make any difference?
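One way to answer the reuse question is to replay an access trace through a cache model. This minimal sketch assumes a least-recently-used (LRU) replacement policy, which a real RAID controller may not use, and hypothetical block-number traces: a file read sequentially twice versus a small index that is probed over and over. The helper name `lru_hit_ratio` is illustrative.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def lru_hit_ratio(requests: Iterable[Hashable], cache_blocks: int) -> float:
    """Fraction of requests served from an LRU cache holding cache_blocks."""
    if cache_blocks < 1:
        raise ValueError("cache_blocks must be at least 1")
    cache = OrderedDict()
    hits = total = 0
    for block in requests:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


if __name__ == "__main__":
    streaming = list(range(10_000)) * 2             # read a 10,000-block file twice
    index_heavy = [i % 100 for i in range(20_000)]  # probe a 100-block index repeatedly
    print("streaming, 1,000-block cache: ", lru_hit_ratio(streaming, 1_000))
    print("index, 1,000-block cache:     ", lru_hit_ratio(index_heavy, 1_000))
    print("streaming, 10,000-block cache:", lru_hit_ratio(streaming, 10_000))
```

With a 1,000-block cache, the sequential trace gets no hits at all (each block is evicted before it is reused), while the index trace gets 99.5%. The second pass over the file hits only when the cache holds the entire 10,000-block file, which is exactly the case described above as unlikely.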
Conclusions
The choice between cache-centric RAID devices and storage-centric
RAID devices likely will be made based on budgetary constraints
and not the performance of the devices. The operational environments
of these two types of devices are often vastly different. Other
issues to consider are:
- How many LUNs and how much storage do you want under the control
of one device?
- Which RAID levels do you want to run, given the cost per MB?
- What RAID levels are going to provide the best performance
given the applications?
- What are the application types and request sizes?
It really comes down to "the requirements". Purchases
should be made in terms of the requirements and budget so you get
the best value for the available funds.
Next time, we will dig deeper into RAID and discuss how RAID and
file system layout should be architected.
Henry Newman has worked in the IT industry for more than 20
years. Originally at Cray Research and now with a consulting organization,
he has provided expertise in systems architecture and performance
analysis to customers in government, scientific research, and industry
around the world. His focus is high-performance computing, storage
and networking for UNIX systems, and he previously authored a monthly
column about storage for Server/Workstation Expert magazine.
He may be reached at: hsn@hsnewman.com.