IFOST Blog: March 2015

Monday 9 March 2015

Data Protector Q&A: SmartCache and non-staged restores

SmartCache is used for non-staged restores of files from VMDK files. As such, VMs should first be copied to SmartCache and then to StoreOnce. What if I want to restore files from VMs that are stored only on StoreOnce? Can I copy them to SmartCache and then use a non-staged restore?

- Andrej K

SmartCache devices are the fastest way of performing a Granular Recovery restore.

With a SmartCache device the mount operation completes immediately.
With non-SmartCache devices, the virtual machine disk is restored into a staging directory first.

Since it can take some time to restore a few hundred gigabytes of virtual disk (compared to the length of time it takes to restore one file out of that virtual disk), there can be significant time savings from using a SmartCache device.
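To put rough numbers on that, here is a back-of-the-envelope sketch. The 150 MB/s restore throughput, the 300 GB virtual disk and the 20 MB file are illustrative assumptions, not measurements from any particular Data Protector or StoreOnce setup; the point is only the ratio between staging a whole virtual disk and reading one file out of it.

```python
def seconds_to_copy(megabytes: float, mb_per_second: float) -> float:
    """Time to move a given amount of data at a steady (assumed) throughput."""
    return megabytes / mb_per_second

# Illustrative assumptions only: adjust to your own environment.
THROUGHPUT_MB_S = 150.0
VMDK_MB = 300 * 1024  # a 300 GB virtual disk that must be staged first
FILE_MB = 20.0        # the single file the user actually wants back

staging = seconds_to_copy(VMDK_MB, THROUGHPUT_MB_S)
single_file = seconds_to_copy(FILE_MB, THROUGHPUT_MB_S)

print(f"Staging the whole disk: {staging / 60:.1f} minutes")  # 34.1 minutes
print(f"Reading one file:       {single_file:.2f} seconds")   # 0.13 seconds
```

Under these assumptions the staging step alone takes over half an hour before the user sees anything, while the file itself takes a fraction of a second; that staging step is what a SmartCache device skips.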

You can copy from a non-SmartCache device into a SmartCache device, but you haven't really saved any time -- one way or another, you have just copied the data into an uncompressed folder.

Here's a relevant quote from Planning, Deploying and Installing Data Protector 9:

Backing up to a SmartCache device is very fast, as you would expect for a disk-based backup device. Unfortunately it does no compression or de-duplication, so if you are backing up a 40 GB virtual machine, you will need 40 GB for the backup. So it is not really a practical storage format for long-term archiving!

A typical VMware backup for an important virtual machine will use a SmartCache device as its destination. One to two days later there will be a copy job to put it into StoreOnce storage (while recycling the protection on the SmartCache storage). Perhaps at the weekend, or perhaps after a week or two, there will be a copy job to take it from StoreOnce and put it on tape.
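Because SmartCache stores every copy at full size, sizing the SmartCache volume for a schedule like this is simple multiplication. A minimal sketch, assuming one backup per VM per day and that up to three copies can coexist (two days of backups waiting for the StoreOnce copy job, plus the one currently being written); the VM sizes and the `copies_on_disk` figure are hypothetical.

```python
def smartcache_capacity_gb(vm_sizes_gb: list[int], copies_on_disk: int) -> int:
    """Disk space needed on a SmartCache device.

    SmartCache does no compression or de-duplication, so every retained
    copy of a VM costs its full size.
    """
    return sum(vm_sizes_gb) * copies_on_disk

# Hypothetical important VMs (GB) that are backed up to SmartCache first.
important_vms = [40, 60, 100]

# Assumption: two days of backups waiting for the StoreOnce copy job,
# plus the backup currently being written.
print(smartcache_capacity_gb(important_vms, copies_on_disk=3))  # prints 600
```

Shortening the time before the copy job runs (and the SmartCache protection is recycled) reduces `copies_on_disk`, which is the main lever on how much fast disk the important VMs need.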

Less important virtual machines - where it is okay to recover files in a few hours instead of instantly - would probably be backed up to StoreOnce initially, and then copied off to tape without ever using a SmartCache device.

Greg Baker is one of the world's leading experts on HP Data Protector. His consulting services are at http://www.ifost.org.au/dataprotector. He has written numerous books (see http://www.ifost.org.au/books) on it, and on other topics. His other interests are startup management, applications of automated image and text analysis and niche software development.

Saturday 7 March 2015

#AlchemyAPI bought by IBM for #Watson -- it's obvious whom IBM will buy next

Every time I start getting enthusiastic about an artificial intelligence company's products, IBM buys them for Watson.

Last year I blogged about the code I had written to make a Google App Engine mail handler which displayed summaries of emails by keywords and sentiment. Today I've discovered that AlchemyAPI (who provided a lot of the smarts) have just been bought by IBM.

Earlier last year I was working with Cognea (and a few other chatbot companies) trying to build a service desk agent that could outperform human beings at a mega-sized organisation, one where even handling helpdesk text chats and email took a large team. The part I was looking after (routing incident tickets and predicting failed changes) turned into Queckt (AEED).

Yes, well, anyway, Cognea got bought last year too.

So here's my next prediction: IBM buys x.ai.

Two reasons:

I'm a big fan of x.ai's work, so based on past experience they will be bought out by someone sooner or later. If IBM doesn't, someone else will.
Lotus has an integrated calendar and mail system. It's used in large organisations with busy executives who could be served very well by an AI agent that could organise meetings on their behalf. x.ai would only need to plug in a new back-end calendar service and a new fetch-from-a-Domino-server mail client. Then Notes would actually have a useful feature that might stop customers slipping away to Google Apps and Office 365.

On the other hand, not every company with artificial intelligence products gets bought by IBM. For example, IFOST hasn't. ;-)

Here are some of the things we've done:

Smart systems monitoring: automatic anomaly detection for events on CPU, memory and I/O; detection of log files without human intervention and then determining which messages are important or trivial; prediction of future capacity exhaustion.
Intra-oral dental image analysis
Improving ITIL ticket handling by 30% using vocabulary detection
Identifying student learning disabilities by timing analysis

Let me know if you are interested in any of these. I'll make sure we're not bought out by IBM!

Friday 6 March 2015

A thank you to all my readers - Amazon's 4th best-selling book on disaster recovery

I checked Amazon's best sellers list for Disaster Recovery books tonight and discovered, mostly, that there are quite a few misfiled books there (e.g. Chef is not about disaster recovery!).

But the top four that actually are about disaster recovery are:

Michael Lucas' book Tarsnap - an encrypted storage system for Unix users who want extreme levels of paranoia about the security of their backups. Michael is a very famous sysadmin-writer who deserves first place for anything he writes.
Steven Nelson's book from 2011, which covers enterprise backup in general but glaringly omits HP Data Protector.
Joe Kissell's book on CrashPlan
My Planning, Deploying and Installing Data Protector 9.

Thank you to everyone who has bought it so far; I hope you have found it useful.

Tuesday 3 March 2015

Suggestions and updates for Data Protector books - a big thank-you to Chris Bhatt

Chris contacted me today, having found numerous typos and other errors. The corrections will propagate through the system over the next few days: Kindle users should see them update silently, I can send an updated PDF if you ping me, and future print copy orders through CreateSpace will be the corrected version. Contact me if you want an errata list.

Chris also pointed out a number of things that I should have put in, but didn't. So here goes:

When you are writing to a virtual tape library (VTL) that does any kind of de-duplication, set the concurrency to 1. The reason you want to do this is so that you get very close to the same stream of data from one backup to the next: this will maximise your de-duplication. But there are some exceptions:

If you are using mhvtl on a Linux machine to emulate a tape library, then it doesn't make any difference what you do with concurrency. mhvtl does compression, but only over very short runs of symbols that have already been seen earlier in that session.
If you are using a StoreOnce box, then don't use the virtual tape library option. Use Catalyst stores (StoreOnce devices) instead, as they use space more efficiently and keep track of what has expired (and free it up). If you are worried about performance and want to do this over Fibre Channel, this is possible if you are on Data Protector 9.02 with a suitable firmware version (3.12, for example).
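Why does concurrency 1 help? With concurrency above 1, several disk agents' data is interleaved onto the same device, and the exact interleaving depends on timing, so the byte stream differs from night to night even when the source data hasn't changed. The sketch below is a deliberately crude model of that effect, not of how StoreOnce or any VTL actually chunks data: it assumes fixed 4 KB chunks fingerprinted with SHA-1, uses random segment lengths to stand in for timing differences, and backs up two unchanged random "disks" on two nights. The function names are hypothetical.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; real appliances chunk more cleverly

def chunk_hashes(stream: bytes) -> list[str]:
    """Split a backup stream into fixed-size chunks and fingerprint each one."""
    return [hashlib.sha1(stream[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(stream), CHUNK_SIZE)]

def multiplex(sources: list[bytes], seed: int) -> bytes:
    """Crude model of concurrency > 1: round-robin segments of random length.

    The seed stands in for run-to-run timing differences between disk agents.
    """
    rng = random.Random(seed)
    positions = [0] * len(sources)
    out = bytearray()
    while any(pos < len(src) for pos, src in zip(positions, sources)):
        for i, src in enumerate(sources):
            if positions[i] < len(src):
                n = rng.randint(1000, 9000)
                out += src[positions[i]:positions[i] + n]
                positions[i] += n
    return bytes(out)

def dedup_ratio(previous: bytes, current: bytes) -> float:
    """Fraction of the current night's chunks already stored from the previous night."""
    stored = set(chunk_hashes(previous))
    current_chunks = chunk_hashes(current)
    return sum(h in stored for h in current_chunks) / len(current_chunks)

# Two unchanged "disks" backed up on two consecutive nights.
disk_a = random.Random(1).randbytes(1_000_000)
disk_b = random.Random(2).randbytes(1_000_000)

# Concurrency 1: each object is written in full, one after the other.
serial = dedup_ratio(disk_a + disk_b, disk_a + disk_b)

# Concurrency 2: the interleaving differs between the two nights.
interleaved = dedup_ratio(multiplex([disk_a, disk_b], seed=10),
                          multiplex([disk_a, disk_b], seed=11))

print(f"concurrency 1: {serial:.0%} of chunks de-duplicated")
print(f"concurrency 2: {interleaved:.0%} of chunks de-duplicated")
```

In this toy model the serial stream de-duplicates completely while the interleaved one barely de-duplicates at all. Variable-size chunking on real appliances may recover some of that loss, but the direction of the effect is the reason for keeping concurrency at 1.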

I should have written more on whether servers should each have their own backup specification or one backup specification containing multiple servers. I think I'll do a blog post on this. When I've written it, it will be http://blog.ifost.org.au/2015/03/when-should-i-put-all-my-servers-into.html
It's worth reminding everyone of Stewart McLeod's StoreOnce best practices (http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/DPTIPS-Data-Protector-StoreOnce-Software-Information-and-Best/ba-p/180146#.VOX1SXlyYdV). Just last week I had a customer who lost their system.db file (and they could easily have lost a store.db as well) because of some disk corruption. I will try to expand on these and a few other suggestions in the next edition.
Another question that deserves an answer is "what's the best way to back up and restore files on a volume that has many thousands or millions of small files?". In version 7.x and earlier this was a major bugbear, because the database was so slow that it could become the bottleneck. Often the only option was to turn off logging altogether on that backup object. Even today it's worth splitting it up into smaller objects (which I mention in chapter 6; look for "Performance - Multiple readers" in the index). I've also since realised that I never quite described exactly how the incremental algorithm works with multiple readers either.
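Splitting a volume with millions of small files into several backup objects raises the question of how to group the directories. One simple approach, sketched here as an assumption rather than anything Data Protector does for you, is a greedy balance by file count: sort directories largest first and always add the next one to the group that currently holds the fewest files. The directory names and counts are invented for illustration; each resulting group would then be set up as a separate object in the backup specification.

```python
import heapq

def split_into_objects(file_counts: dict[str, int], n_objects: int) -> list[list[str]]:
    """Greedily group directories into n_objects groups with similar file counts.

    Largest directories are placed first, each into the group that currently
    holds the fewest files (a min-heap keyed on file count).
    """
    heap = [(0, i) for i in range(n_objects)]  # (files so far, group index)
    groups: list[list[str]] = [[] for _ in range(n_objects)]
    for path, count in sorted(file_counts.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        groups[idx].append(path)
        heapq.heappush(heap, (load + count, idx))
    return groups

# Hypothetical top-level directories and their file counts.
counts = {
    "/data/archive": 400_000,
    "/data/home": 300_000,
    "/data/projects": 200_000,
    "/data/scratch": 100_000,
}
for n, group in enumerate(split_into_objects(counts, n_objects=2), start=1):
    print(f"object {n}: {group}")
```

Here the two objects end up with 500,000 files each. The per-directory counts could be gathered with `os.walk`; greedy largest-first placement is not guaranteed to be optimal, but it is usually close enough for balancing backup objects.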

What else should I add? What have I missed? What would have helped you when you started out?

Put any comments below (as blog comments, or on Google+) or email me (gregb@ifost.org.au) with your suggestions.


Other related sites

I lecture in natural language processing and research non-manifold machine learning. I consult to companies needing help with technology management and AI strategy.
I have built a prediction market platform for enterprises to help with employee engagement and strategic decision making.
I wrote this NPS survey analyzer
As a build-a-business-in-one-day exercise, I have a GPT-via-email bot.
Conference call system to help when you need simultaneous translation done, but the only equipment you have on hand is people's mobile phones (which aren't all smartphones): church-translation.com
My Amazon author page
In the past I used to answer a lot of questions on Quora
Early 21st century pre-singularity geek-nerd poetry also available in book form
