Chris also pointed out a number of things that I should have put in, but didn't. So here goes:
- When you are writing to a virtual tape library (VTL) that does any kind of de-duplication, set the concurrency to 1. The reason you want to do this is so that you get very close to the same stream of data from one backup to the next: this will maximise your de-duplication. But there are some exceptions:
- If you are using mhvtl on a Linux machine to emulate a tape library, then it doesn't make any difference what you do with concurrency. mhvtl does compression rather than de-duplication, and that compression only helps with very short streams of symbols that have already been seen earlier in the same session.
- If you are using a StoreOnce box, then don't use the virtual tape library option. Use Catalyst stores (StoreOnce devices) instead, as they use space more efficiently, keep track of what has expired, and free up that space. If you are worried about performance and want to do this over Fibre Channel, this is possible if you are on Data Protector 9.02 and a suitable firmware version (3.12, for example).
- I should have written more on whether servers should each have their own backup specification or share one backup specification containing multiple servers. I think I'll do a blog post on this. When I've written it, it will be at http://blog.ifost.org.au/2015/03/when-should-i-put-all-my-servers-into.html
- It's worth reminding everyone of Stewart McLeod's StoreOnce best practices (http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/DPTIPS-Data-Protector-StoreOnce-Software-Information-and-Best/ba-p/180146#.VOX1SXlyYdV). I had a customer just last week who lost their system.db file -- but they could easily have lost a store.db as well -- because of some disk corruption. I will try to expand on these and a few other suggestions in the next edition.
- Another question that deserves an answer is "what's the best way to back up and restore files on a volume that has many thousands or millions of small files?". In version 7.x and before, this was a major bugbear because the database was so slow that it could become the bottleneck. Often the only option was to turn off logging altogether on that backup object. Even today it's worth splitting the volume up into smaller objects (which I mention in chapter 6 -- look for Performance - Multiple readers in the index). I've also since realised that I never quite described exactly how the incremental algorithm works with multiple readers either.
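
On the small-files point, one way to decide how to split a volume into smaller objects is to count the files under each top-level directory and spread those directories across a fixed number of groups, so that each reader gets a similar amount of work. What follows is only a rough sketch of that idea, not anything Data Protector provides: the function names `count_files` and `split_into_objects` and the example paths are my own inventions, and it assumes that file count is a reasonable stand-in for how long each directory takes to back up.

```python
import heapq
import os


def count_files(root):
    """Map each top-level subdirectory of root to the number of files beneath it.

    Files sitting directly in root are not counted (hypothetical helper).
    """
    counts = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            counts[entry.path] = sum(
                len(files) for _, _, files in os.walk(entry.path)
            )
    return counts


def split_into_objects(counts, readers):
    """Assign directories to `readers` groups, balancing total file count.

    Greedy approach: take directories from largest to smallest and always
    add the next one to whichever group currently holds the fewest files.
    Returns a list of (total_files, [directories]) tuples, one per group.
    """
    # Each heap entry is (running total, group index, member directories).
    # The group index is unique, so the lists are never compared.
    heap = [(0, i, []) for i in range(readers)]
    heapq.heapify(heap)
    by_size = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for path, n in by_size:
        total, i, members = heapq.heappop(heap)
        members.append(path)
        heapq.heappush(heap, (total + n, i, members))
    groups = sorted(heap, key=lambda g: g[1])
    return [(total, sorted(members)) for total, _, members in groups]


if __name__ == "__main__":
    # Made-up counts; in practice you might use count_files("/data").
    example = {"/data/a": 500, "/data/b": 300, "/data/c": 200, "/data/d": 100}
    for total, dirs in split_into_objects(example, readers=2):
        print(total, dirs)
```

Each resulting group would then become its own backup object covering just those directories. The greedy split won't always be perfectly balanced, but it is usually close enough, and it is cheap to re-run as the volume grows.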
What else should I add? What have I missed? What would have helped you when you started out?
Put any comments below (as blog comments, or on Google+) or email me (firstname.lastname@example.org) with your suggestions.