Cover V12, I07

The Truth about Tapes, Backups, and Restores

Peter Baer Galvin

Backups and restores are crucial to the integrity of data at a site. And yet once an implementation is in place, these functions tend to be ignored, except for babysitting. Why? Well, backups are certainly not glamorous, and no one really likes to muck with them. The risk/reward ratio of the backup and restore process is a deterrent as well. If, for example, someone tweaks the backups to make them run faster or more smoothly, or makes the restores easier or faster, the rewards tend to be small. On the other hand, that same someone could be seeking alternate employment if the tweaking caused restores to fail.

Even with this understood, I've decided to take the risk and discuss the state of backups and restores (B&R), and some current technologies that may make the reward better than the risk for updating the B&R at your site.

Here are a few data points to have handy when making a determination at your site:

  • What tape technology do you use?
  • What is the capacity of the tape, both raw and compressed (if you use compression)? What compression rate do you achieve?
  • Do you use disk as a staging storage for tape (i.e., "nearline storage")?
  • How many tapes do you buy a year, and at what cost?
  • If you use a library, how many full and incremental tape sets do you keep loaded?
  • How many tape sets do you send off-site?

The next step in evaluating your B&R methodology is to use a backup calculator. Some vendors provide one if you are using or evaluating their technology. Sun has one freely available at its Web site:

http://www.sun.com/storage/tape/tape_lib_calculator.html

This calculator accepts a good deal of input data and allows "what-if" scenarios to be evaluated. Like most backup calculators, though, it considers only solutions that the vendor sells. Regardless, it makes a good starting place.

An example of using the calculator is shown in Figure 1. It depicts a site with 2TB of usable disk, growing at 20% per year, with 70% utilized and 10% changing per day. The backup window is eight hours for full backups and six hours for incremental ones. Each data set consists of one full backup and six incremental backups, and the site wants to keep two full data sets in the robot (two weeks' worth of backups). One limitation of this calculator is that it does not show how many tapes would be used in total (including tapes taken off-site), but that number can be worked out from the calculator's inputs and outputs. Also, some other calculators show how many Mb/sec of throughput are needed to accomplish the backups in the given windows, which can be useful for determining whether the networking between the backup server and the backup clients is sufficient.
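The throughput figure mentioned above can also be estimated by hand. The following Python sketch uses the Figure 1 inputs. It assumes decimal units (1TB = 1,000,000MB), that a full backup copies all utilized data, and that an incremental backup copies only the 10% of utilized data that changes per day. The function and variable names are illustrative and are not taken from any vendor's calculator.

```python
def required_throughput_mb_per_sec(data_tb, window_hours):
    """Sustained megabytes per second needed to move data_tb within window_hours."""
    megabytes = data_tb * 1_000_000   # assumption: decimal units
    seconds = window_hours * 3600
    return megabytes / seconds

usable_tb = 2.0               # usable disk from Figure 1
utilized_fraction = 0.70      # 70% utilized
daily_change_fraction = 0.10  # 10% changing per day (assumed to apply to utilized data)

full_tb = usable_tb * utilized_fraction            # data in one full backup
incremental_tb = full_tb * daily_change_fraction   # data in one incremental backup

full_rate = required_throughput_mb_per_sec(full_tb, 8)         # eight-hour window
incr_rate = required_throughput_mb_per_sec(incremental_tb, 6)  # six-hour window

print(f"Full backup:        {full_tb:.2f}TB needs about {full_rate:.1f} MB/sec")
print(f"Incremental backup: {incremental_tb:.2f}TB needs about {incr_rate:.1f} MB/sec")
```

Under these assumptions, the full backup is the demanding case, at roughly 49 MB/sec sustained across all drives and network paths combined.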

Where the tool excels is in working through what-if scenarios. For example, you can select the tape drive technology and see the effect on the number of drives, tapes, and slots needed in the library. You can also change backup windows, data sets in rotation, and assumptions such as the compressibility of the data. By trying these scenarios, you can not only determine the best drives and library to use, but also avoid being boxed in by the solution (should you choose to implement it).

In Figure 1, the recommended library holds 180 tapes. This capacity is fine for the first three years, but is exceeded in year four, requiring an additional library. (Of course, tape drive technology will likely have evolved by then, and if the library supports new drives, the data capacity of the robot could be upgraded.) By selecting another tape robot, such as the L700, you can see how much longer a larger tape robot would last. The "Display Library Requirements Matrix" button saves time by showing all of the possibilities on one screen. Each combination of tape drive and library that Sun supports is shown, reflecting the choices available given the data you input to the calculator.
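The same kind of "how long will this library last" question can be roughed out independently of the calculator. The Python sketch below assumes that the number of tapes needed grows at the same 20% per year as the data. The starting figure of 110 tapes in year one is invented purely for illustration, since the article does not give the calculator's tape count, and the function name is hypothetical.

```python
import math

def first_year_exceeding(slots, tapes_year_one, annual_growth, max_years=10):
    """Return the first year in which the tapes needed exceed the library's
    slots, or None if that does not happen within max_years."""
    for year in range(1, max_years + 1):
        # Compound growth: year one uses the starting count unchanged.
        tapes = math.ceil(tapes_year_one * (1 + annual_growth) ** (year - 1))
        if tapes > slots:
            return year
    return None

# Hypothetical starting point: 110 tapes in year one, 20% growth, 180 slots.
print(first_year_exceeding(slots=180, tapes_year_one=110, annual_growth=0.20))
```

Rerunning the function with a larger slot count shows how many extra years a bigger robot would buy under the same growth assumption.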

The truth about tape: upgrading your tape drives can save money.

For instance, if you have DLT8000 drives, enter the appropriate values in the calculator to see how many LTO tapes and drives you would need. For the most complete picture, determine how many tapes you are using in a time period (three years, for example), and how many you buy in that time period. Then determine how many tapes you would need using a newer tape technology. Multiply by the cost of the tapes, and you may find that over that time period, even including the cost of the new drives and possibly even a new robot, your costs would be lower than sticking with the older, lower density tapes.

Consider a scenario in which your site has 100 DLT8000 tapes in a library and buys 200 more per year. The ballpark price for the tapes purchased over three years (600 tapes) would be $42,000, and together with the 100 already in the library they would store a total of 42TB (at 1.5X compression). Storing the same amount of data on LTO tapes over those three years (again counting the tapes kept in the library) would take 280 tapes at a cost of $19,600. The savings could be used to pay for the LTO drive and a robot. The backups are also likely to complete faster because of increased drive throughput and to be more reliable because the drives are new, and there would be fewer tapes to take and manage off-site, so there are non-financial benefits to consider as well.
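The arithmetic behind this scenario can be checked with a few lines of Python. This is a sketch under stated assumptions: native capacities of 40GB for DLT8000 and 100GB for first-generation LTO, and a price of $70 per tape for both formats, which is the figure implied by $42,000 for 600 tapes. None of these values is stated outright in the text, and all of the names are illustrative.

```python
import math

TAPE_PRICE_USD = 70       # assumption: implied by $42,000 / 600 tapes
COMPRESSION = 1.5         # compression rate used in the scenario
DLT8000_NATIVE_GB = 40    # assumption: DLT8000 native capacity
LTO_NATIVE_GB = 100       # assumption: first-generation LTO native capacity

# DLT8000: 200 tapes bought per year for three years, plus 100 in the library.
dlt_tapes_bought = 200 * 3
dlt_tapes_total = dlt_tapes_bought + 100
dlt_cost = dlt_tapes_bought * TAPE_PRICE_USD
data_gb = dlt_tapes_total * DLT8000_NATIVE_GB * COMPRESSION

# LTO: tapes needed to hold the same amount of data.
lto_tapes = math.ceil(data_gb / (LTO_NATIVE_GB * COMPRESSION))
lto_cost = lto_tapes * TAPE_PRICE_USD

print(f"Data stored: {data_gb / 1000:.0f}TB")
print(f"DLT8000:     {dlt_tapes_bought} tapes bought, ${dlt_cost:,}")
print(f"LTO:         {lto_tapes} tapes, ${lto_cost:,}")
print(f"Difference:  ${dlt_cost - lto_cost:,}")
```

Substituting your own tape counts, prices, and capacities gives a first estimate of whether the media savings would cover new drives or a robot.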

The truth about tape: compression rates vary wildly depending on the data being moved to tape.

In the above example, we considered a 1.5X compression rate. Unfortunately, compression rates are hazardous to rely on in capacity and performance calculations because they depend so heavily on the data being compressed. Some data is incompressible (consider audio recordings that are already compressed), while some is highly compressible (such as clear text). The safest bet, if you have mixed data or do not know how compressible your data is, is to assume no compression at all. This is the worst case, so any compression of the data would be good news, and would add extra capacity and performance to your solution.
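To see how much the compression assumption matters, the short Python sketch below recomputes the number of tapes needed for the 42TB from the earlier scenario at several compression rates. It assumes a native tape capacity of 100GB (first-generation LTO), which is not stated in the text, and treats "no compression" as a rate of 1.0. The function name is illustrative.

```python
import math

def tapes_needed(data_gb, native_gb, compression_rate):
    """Tapes required to hold data_gb, given native capacity and compression rate."""
    return math.ceil(data_gb / (native_gb * compression_rate))

DATA_GB = 42_000   # the 42TB from the earlier scenario
NATIVE_GB = 100    # assumption: first-generation LTO native capacity

for rate in (1.0, 1.5, 2.0):
    print(f"{rate:.1f}X compression: {tapes_needed(DATA_GB, NATIVE_GB, rate)} tapes")
```

Planning for the 1.0 case means buying for the largest of these counts, and any compression actually achieved becomes spare capacity.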

The truth about tape: robot and tape reliability also varies wildly.

This truth is difficult to support statistically, because vendors (as far as I know) do not provide reliability information. Rather, the data tends to be anecdotal or site- and scenario-specific. For example, my experience with 8mm tape drives was that they failed after a year of heavy use, while LTO drives seem to be quite hardy. Given the lack of concrete data, how is this truth useful? At a minimum, you should raise reliability in any discussion with solution vendors, and bear in mind that reliability is a consideration, along with other parameters, in any tape solution you may choose.

Conclusions

Backups and restores are complicated, fraught with risk, and difficult to get excited about. However, there are rewards for evaluating the current solution at your site and considering upgrades of various kinds. Backup calculators can help guide your selection, but there are truths about B&R technology that should also be considered. This month's column exposed some of those truths, and next month I will finish this topic by exploring SCSI vs. Fibre tapes and robots, near-line storage and its uses and abuses, backup SANs vs. backup networks, and backup software choices.

Good Book Alert

It is off topic, but I think it is worth mentioning that the third edition of Practical Unix & Internet Security by Garfinkel, Spafford, and Schwartz is now available from O'Reilly & Associates. At first blush, this is a solid update to an already very solid book. New topics include LDAP, Samba, wireless, intrusion detection, Mac OS X, and much more. Previous editions have been well thumbed, and I expect this one will be, too.

Peter Baer Galvin (http://www.petergalvin.info) is the Chief Technologist for Corporate Technologies (www.cptech.com), a premier systems integrator and VAR. Before that, Peter was the systems manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines, and previously wrote Pete's Wicked World, the security column, and Pete's Super Systems, the systems management column, for Unix Insider (http://www.unixinsider.com). Peter is coauthor of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Peter has taught tutorials and given talks on security and systems administration worldwide.