
Thursday, 29 January 2015

Using Data Protector to back up your AWS (Amazon EC2) instances

Your cloud-hosted infrastructure (whether it is on Azure, AWS, Google App Engine, IBM, Rackspace or HP Cloud) is going to consist of:
  • Cattle, which are machines that you have automatically created and which contain no state that can be lost. They might have a replica of some data, but there will be other copies. If these machines fail, you just restart them or create a new instance. Hopefully you have this process automated.
  • Pets, which are machines that you installed and administer manually. When one of these fails, you want to restore it from a backup.
If you are a completely green-field site, then you won't have any backup infrastructure. But if you already have some in-house servers, then you will probably have existing backup infrastructure that you would want to make use of.

For example, the cheapest storage in the cloud at the moment appears to be Amazon Glacier, which costs USD 10 per terabyte per month. But if you already have a tape library (or even a single standalone modern tape drive), you can easily have long-term cold storage at $0.50 per TB or less, and you probably already have some tapes.
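That tape figure is easy to sanity-check with some back-of-envelope arithmetic. The media price, capacity and reuse count below are my own illustrative assumptions, not figures from any vendor price list:

```python
# Rough cold-storage cost comparison.
# Assumptions: Glacier at ~$0.01/GB/month; an LTO-6 cartridge at
# ~$30 holding 2.5 TB native, reused ~25 times over its life.
glacier_per_tb_month = 0.01 * 1000   # $10 per TB per month

tape_media_cost = 30.0               # dollars per cartridge (assumed)
tape_capacity_tb = 2.5               # native LTO-6 capacity
tape_reuses = 25                     # conservative reuse count (assumed)
tape_per_tb = tape_media_cost / (tape_capacity_tb * tape_reuses)

print(f"Glacier: ${glacier_per_tb_month:.2f}/TB/month; "
      f"tape media: ${tape_per_tb:.2f}/TB per full write")
```

On those assumptions the media cost comes out just under $0.50/TB per write, which is where the figure above comes from; your numbers will move with tape prices and how hard you sweat the cartridges.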

Likewise, if you already have a Data Protector cell manager license, you might as well keep using it, because it will work out cheaper than any dedicated cloud backup service.

Broadly speaking, there are four options: a virtual tape library, external control of EBS volumes, StoreOnce in the cloud, or the HP cloud device.

Virtual tape library

This option is appropriate if you are very tightly constrained by budget and need to be very conservative about changes to your backup processes. If you are currently backing up to a tape library, this lets you keep the illusion of the same thing, but put it into the cloud.
  • Create a Linux instance in an availability zone that you are not otherwise using. 
  • Install mhvtl on to it, and configure a virtual tape library with it. 
  • Mount a very large EBS volume (persistent block storage) on /opt/mhvtl. 
  • You can now use this tape library just as if it were a real tape library.
If the Linux instance fails, start a new instance, install mhvtl again, and change the host controlling the library. Note that media agent licenses are concurrent, so if you make sure that you use this tape library only when you aren't using your in-house library, there is no additional licensing cost.


External control of Elastic Block Store (EBS) volumes

The problem with the virtual tape library solution is that you are somewhat constrained by the size of the block storage that you are using. But with an external control device, you can attach and detach Elastic Block Store (EBS) volumes on demand as required. You can add slots to the external device by adding additional block stores.

  • Create a Linux instance in an availability zone that you are not otherwise using.
  • Write an external control script which takes the Data Protector command arguments and attaches or detaches EBS volumes on the Linux box accordingly.
  • Create an External device, using that script.
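The external control script is the only moving part. As a sketch of the idea only: the "load"/"unload" verbs, the slot-to-volume table and the instance and device names below are all placeholder assumptions (check the Data Protector documentation for the real external-device argument protocol before writing yours):

```python
#!/usr/bin/env python3
"""Hypothetical external-control helper: map Data Protector 'slot'
numbers to EBS volumes, and attach/detach them with the aws CLI."""
import sys
import subprocess

# Placeholder mapping: one EBS volume per "slot" of the external device.
SLOT_TO_VOLUME = {
    1: "vol-0aaa111",
    2: "vol-0bbb222",
}
INSTANCE_ID = "i-0123456789abcdef0"   # the media-agent Linux instance
DEVICE_NODE = "/dev/xvdf"             # where the volume appears when attached

def build_aws_command(action, slot):
    """Return the aws-cli argv that would attach or detach the slot's volume."""
    volume = SLOT_TO_VOLUME[slot]
    if action == "load":
        return ["aws", "ec2", "attach-volume", "--volume-id", volume,
                "--instance-id", INSTANCE_ID, "--device", DEVICE_NODE]
    elif action == "unload":
        return ["aws", "ec2", "detach-volume", "--volume-id", volume]
    raise ValueError(f"unknown action: {action}")

if __name__ == "__main__" and len(sys.argv) > 2:
    action, slot = sys.argv[1], int(sys.argv[2])
    subprocess.run(build_aws_command(action, slot), check=True)
```

The nice property of this design is that "adding a slot" to the device is just adding a row to the table (and creating the volume), with no change to the Data Protector configuration beyond the slot count.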
 

StoreOnce low-bandwidth replication


The previous two options don't offer a way of using in-house tape drives.

If you have a way of breaking up your backups into chunks of less than 20TB, then you can use the software StoreOnce component on an EC2 instance. It works on Windows and Linux; just make sure that you have installed a 64-bit image. The only licensing you will need is some extra Advanced Backup to Disk capacity.

An alternative is to buy a virtual storage appliance (VSA) from HP and then create an Amazon Machine Image (AMI) out of it. This has the advantage that it can cope with larger volumes, and it also has better bandwidth management (e.g. shaping during the day, and full speed at night).

The steps here are:
  • Run up a machine in an availability zone different from the one containing whatever you are wanting to back up. Use a Windows, Linux or VSA image as appropriate. Call it InCloudStorage-server.
  • Create a StoreOnce device (e.g. "InCloudStorage").
  • Create backups writing to "InCloudStorage".
  • Create a StoreOnce device in-house. Call it "CloudStorageCopy".
  • If you are not using a VSA, you will need to create a gateway for CloudStorageCopy on InCloudStorage-server. Remember to check the "server-side deduplication" button.
  • Create a post-backup copy job which replicates the backups (which went to InCloudStorage) to CloudStorageCopy using that server-side deduplicated gateway.
  • Create a post-copy copy job to copy these out to tape.
The beauty of this scheme is that you can seed the CloudStorageCopy with any relevant data. As most of the virtual machines you are backing up will be very similar, you will achieve very good deduplication ratios: 20:1 is probably reasonable to expect, possibly higher. So instead of having to transfer 100GB of backup images from the cloud to your office each day, you might only be transferring 5GB, which is quite practical.
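The arithmetic behind that "quite practical" claim, as a quick sketch (the 100GB/day and 20:1 figures are from the paragraph above; the 10 Mbit/s link speed is my own assumption):

```python
# How much actually crosses the WAN once deduplication kicks in.
logical_gb_per_day = 100    # backup data written each day (from the post)
dedup_ratio = 20            # assumed deduplication ratio (from the post)

physical_gb_per_day = logical_gb_per_day / dedup_ratio   # unique data shipped
# Naive transfer time: GB -> Gbit -> Mbit, divided by link speed in Mbit/s.
hours_at_10_mbps = physical_gb_per_day * 8 * 1000 / 10 / 3600

print(f"{physical_gb_per_day:.0f} GB/day over the WAN, "
      f"about {hours_at_10_mbps:.1f} h on a 10 Mbit/s link")
```

So even a modest office link can absorb the replication overnight, which is what makes the post-backup copy job workable.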


HP cloud device

I discussed this in http://blog.ifost.org.au/2015/01/data-protector-cloud-backups-and-end-of.html . If you are using the HP cloud, then this is almost a no-brainer -- you don't even need to provision a server. For the other cloud providers, it depends on the bandwidth you get (and the cost of the bandwidth!) to the HP cloud whether this makes sense or not.

Greg Baker is an independent consultant who happens to do a lot of work on HP DataProtector. He is the author of the only published books on HP Data Protector (http://www.ifost.org.au/press/#dp). He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector

Saturday, 24 January 2015

@a16z Outside of the USA, are we also seeing the same 16 themes in the way software eats the world? Here are my Oz impressions

I was perusing Marc Andreessen's website. Marc is one of the better-known venture capitalists and famously wrote that software is eating the world.

On his website there are 16 themes that he is seeing. Obviously, he invests big money in big startups, whereas I tend to advise smaller and less splashy groups with little or no up-front cash. And north-western Sydney isn't Palo Alto. But there is quite a lot in common:

  • Sensorification of the enterprise. Yes, the insurer-broking-customer system I've been developing with the best and brightest in the industry is doing that, and that's one of the big takeaways. We haven't even begun to tap what can be done with mobile sensors. This one is universal.
  • Machine learning and big data. Obviously, I spend a lot of time helping customers with big data storage (as that's the other three-quarters of my work). Backup and recovery of big data is an issue that no-one knows how to do properly (myself included). But on the machine learning side, yes, I'm seeing this everywhere. There was the project where I was extracting keywords from emails ( http://blog.ifost.org.au/2014/05/keywords-from-emails-on-google-app.html ) and displaying a colour summary of what the author of the email thinks about those topics. The data we got from the big data analysis I did of COI group's staff attitudinal surveys was extraordinary -- did you know that there are five distinct ways to be a successful organisation, and another 14 to be unsuccessful? In another project, we found that a flower shop should stay open an hour later.
  • Containerisation is not a trend I'm seeing yet in customers with existing infrastructure. What I am seeing is customers simply not having their own data centre presence -- not even maintaining a server rack or a local server. So the containerisation I'm seeing is more around self-contained applications managed by different companies.
  • Digital health. From the dental camera system that I've been developing, to the start-up I was talking to last week, software is eating up the entire medical device industry. The medtech industry (which is big in Sydney -- bigger than most people realise) is turning into a developer of sophisticated software for small cheap sensors.
  • I'm not seeing the efficient online marketplaces trend as clearly. Perhaps that's because I'm based in Australia, with a generally smaller and less efficient market, but specialty and uniqueness are still common, and I don't see any Australian companies racing to the bottom on price and staying in business.
  • Bitcoin and blockchain. The retailers I'm speaking to don't see bitcoin as a priority, but they don't really have a problem with it. So it will come, and the volatility will reduce. On the other hand, if I put insurance contract binding information into the blockchain, I'm pretty sure my insurance customers would freak out and run hiding. So this trend will come from the retail side.
  • Funnily enough, whether to do cloud-client computing is a pressing question that I'm researching at the moment. That is, for two projects I'm doing at the moment I know I have a lot of in-mobile image and sound analysis computation to do. Bandwidth from the mobile is a problem, but so is CPU time on the mobile. Where to do it? It's not obvious which is better.
  • I'm not doing any work for Controlability any more (that stopped when I went to Google), but we were well ahead of the curve on that. What we were building in Darwin and Brisbane (hardly centres of technology) is what a whole bunch of "Internet of things" startups are trying to do now. It was secure, it was sensible, it was cost-effective and it made the residential development companies money. So I'm a bit biased on this: I think the rest of the world is playing catch-up.
  • I can't say much about online video, but it seems to me that the money previously being spent on face-to-face training and e-learning is there for the taking as these move to video. I still don't have any video content to sell for anything I'm doing (I'm still writing books) but I can see the market is ready for it. Australia was well ahead of this curve, as the business of doing face-to-face training went undead several years ago; from what I've seen India and China are well behind on this curve.
  • The changes sweeping the insurance industry: yes, real-time data extraction from company systems to give a day-by-day risk premium. We're working on it.
  • DevOps: the week before last I was talking to a very famous software company about their SaaS offering. They have a long, long way to go to get from "sysadmins of a box that runs an application" to "site reliability engineering". Maybe, just maybe, some of the other Australian SaaS players are doing it better, but I doubt it. So few companies here have enough scale that they can do any statistically meaningful analysis of their outages or incidents. So while lots of people will talk about it, it's going to be a long time before DevOps is going to make a measurable difference here, sadly.
  • Failure and the culture of fail fast. I see no evidence of this anywhere here. The vast majority of start-ups are self-funded, boot-strapped, and grabbed an opportunity that arose. The QUT CAUSEE study showed no correlation between the success of a start-up and the number of failed startups the principals had been involved in. The closest QUT found was a kind of bonus for pivoting: if you changed direction as a result of customer feedback directing the company to a new product or solution in that industry, then that was a very good predictor of success.
  • I don't have enough experience to comment on full-stack startups, virtual reality or crowdfunding.
  • While I used to be able to say with confidence that I was a security guru, I don't think I can comment on Andreessen's security theme; I'm probably losing it.

What I found interesting is what didn't appear on his list, but which are really clear themes that I'm seeing:
  • Image analysis is everywhere. Every company has dozens of problems which can be solved by moderately simple image analysis techniques. There simply aren't enough knowledgeable gurus out there to do the work.
  • Interfacing with the low-tech. I think there's a Silicon Valley bubble which assumes that everyone wants to be on-line with a smart-phone all the time. But there are many workers and customers that simply can't or won't do this. In the last few weeks: a startup getting the severely disabled on-line with assistive technology; the retail conversations were around simplifying warehousing procedures so that the tech un-savvy can cope; and the edutech project getting tech-averse sport coaches to post wet-weather announcements to their school's twitter, facebook and skoolbag systems.

Friday, 23 January 2015

What items the Granular Recovery Extension for SharePoint can recover

This is all documented in the GRE SharePoint manual, but mostly for my own benefit later, here is what you can restore with the Data Protector Granular Recovery Extension after you have mounted a recovery database:

Libraries:
  • Document library
  • Form library
  • Wiki page library
  • Report library
  • Asset library
  • Picture library
  • Translation Management Library
Communication:
  • Announcements
  • Contacts
  • Discussion board
  • Links
  • Calendar
  • Tasks
  • Project tasks
  • Issue tracking
  • Survey
CustomList
User Information List
Pages and Sites:
  • Page
  • Site
  • Publishing pages
  • Sites with a blog template: Posts, Comments, Categories
  • Sites with a meeting template: Meetings, Agenda, Attendees, Decision, Meeting Objective, Text Box, Things To Bring, Home Page Library


Thursday, 22 January 2015

Data Protector cloud backups and the end of tape

The end of tape storage is not quite here yet. There are times when you really, really want to keep data off-line so that it is definitely safe from sysadmin accidents and sophisticated hackers.

But there is a lot of data that doesn't quite meet those needs. I've used rsync.net and LiveVault, and while they are useful, what the world really needs is one backup solution that can do local disk, local tape and also replicate to the cloud very, very fast and very, very cheaply.

HP appears to have done this with Data Protector 9.02. There's a nice new option under "backup to disk" which appears to be a StoreOnce device running in the cloud.


I was already using HP's public cloud anyway for the Data Protector training classes I run, so it was just a matter of filling in my access keys and so on in the following screens.

The result is a device that can be used for backup, copying, restore, etc. You can back up locally and have an automatic copy job replicate the data to the cloud. Or you can back up your cloud-hosted servers directly to the cloud, and then drag the data back down to copy off to your tape drives later.

At 9c/GB per month it's nowhere near the cheapest on the market (Google was 2c/GB per month last time I checked, and Amazon have their tape-like Glacier service at 1c/GB per month). But that's the cost of the space that you use: deduplication should take care of a lot of this.

What would be nice next: some way of replicating this to a tape library hosted by HP in their public cloud (similar to what Amazon offers with Glacier).



Monday, 19 January 2015

SQL Backup Error

Today's challenge was understanding why I was getting the following error message when trying to back up MS-SQL with Data Protector.


[Normal] From: [email protected] "INST01"  Time: 19/01/2015 4:59:46 PM
SQL statement:
BACKUP DATABASE (Frobnob_Report) TO 
VIRTUAL_DEVICE = 'Data Protector_INST01_Frobnob_Report_055943030_0'
WITH NAME = 'Data Protector: 2015/01/19 0010', COPY_ONLY, BLOCKSIZE = 4096, MAXTRANSFERSIZE = 65536;

[Critical] From: [email protected] "INST01"  Time: 19/01/2015 4:59:46 PM
Virtual Device Interface reported error:
The object was not open.

See also Data Protector debug.log and SQL Server error log for details.

[Normal] From: [email protected] "INST01"  Time: 19/01/2015 4:59:47 PM
Backup Profile:

Run Time ........... 0:00:01
Backup Speed ....... 0.00 MB/s

[Normal] From: [email protected] "INST01"  Time: 19/01/2015 4:59:47 PM
Completed OB2BAR Backup: mssql.ifost.org.au:/INST01/Frobnob_Report/0 "MSSQL"

[Major] From: [email protected] "INST01"  Time: 19/01/2015 4:59:47 PM
Aborting connection to BSM. Abort code -2.

[Major] From: [email protected] "INST01"  Time: 19/01/2015 4:59:48 PM
Error has occurred while executing a SQL statement.
Error message: 'SQLSTATE:[42000] CODE:(3202) MESSAGE:[Microsoft][ODBC SQL Server Driver][SQL Server]Write on "Data Protector_INST01_Frobnob_Report_055943030_0" failed: 995(The I/O operation has been aborted because of either a thread exit or an application request.)
SQLSTATE:[42000] CODE:(3271) MESSAGE:[Microsoft][ODBC SQL Server Driver][SQL Server]A nonrecoverable I/O error occurred on file "Data Protector_INST01_Frobnob_Report_055943030_0:" 995(The I/O operation has been aborted because of either a thread exit or an application request.).
SQLSTATE:[42000] CODE:(3013) MESSAGE:[Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.
'

The debug.log and SQL server error logs were uninformative as well, giving exactly the same information as the last message above.

The Windows event log had numerous copies of the same error message, but the very first one said this:

SQLVDI: Loc=ClientBufferAreaManager::SyncWithGlobalTable. Desc=Open(hBufferMemory). ErrorCode=(5)Access is denied. Process=4380. Thread=3820. Client. Instance=INST01. VD=Global\Data Protector_INST01_Frobnob_Report_060932541_0_SQLVDIMemoryName_0.

A lot of investigation later, I discovered that this only happens when using Windows Authentication. I can work around it by changing the Data Protector Inet service to run as the user I wanted the SQL backups to run as, and then using Integrated Authentication.

I still don't quite know what's causing this. I'm guessing that the VDI process is running as a different user or in a different context.


DataProtector and TCP wrappers (libwrap) etc.

While it's rare to run into a system using TCP wrappers rather than a host-based firewall, I ran into one today in the form of the vSphere vCenter Server Appliance.

To cut a long story short, you can install the Data Protector agent quite happily (I pushed it from my Linux-based installation server, which has OB2_SSH_ENABLED=1, by adding my ssh key to the server appliance's .ssh/authorized_keys). But then the client can't be imported into the cell.

You will see lines like this appear in the appliance's /var/log/messages :

2015-01-19T01:25:06+00:00 app01 xinetd[19865]: libwrap refused connection to omni (libwrap=inet) from ::ffff:192.168.1.14

2015-01-19T01:25:06+00:00 app01 xinetd[19865]: FAIL: omni libwrap from=::ffff:192.168.1.14

It's been so long since I dealt with TCP wrappers that I spent ages remembering what to do. In the end, it's just a matter of putting the following into /etc/hosts.allow

inet: 192.168.1.0/255.255.255.0 : ALLOW

Adjust based on whatever IP ranges and subnets you need to allow. Or use "ALL" instead of 192.168.1.0/255.255.255.0 if you don't have any security concerns.



Thursday, 15 January 2015

@xdotai - Amy the virtual assistant

I suspect that X.AI have the shortest website domain name in the world: http://x.ai/

They make very clever software for organising meetings. Simply CC in their robot (Amy or Andrew, depending on your preference) and it will extract vague time specifications from your emails and then start an email exchange with everyone else the email was sent to. Eventually you get a meeting entry created in your calendar.

Customers with Google Apps should make this a mandatory sign-up for all their senior staff as it saves hours per month on tedious to-and-fro work that can be automated.



Permissions you need for Data Protector to be able to back up VMware

It's on pages 43-44 of the Integration Guide for Virtualization Environments, but it's over a page break and I can never find it when I need it. These permissions need to be granted at the top level. Granting them at a lower level (e.g. at the datacentre level) doesn't seem to work.
  • Datastore -> Allocate space
  • Datastore -> Browse datastore
  • Datastore -> Low level file operations
  • Datastore -> Remove file
  • Datastore -> Rename datastore
  • Folder -> Delete folder
  • Folder -> Rename folder
  • Global -> Disable methods
  • Global -> Enable methods
  • Global -> Licenses
  • Host -> Configuration -> Maintenance
  • Host -> Inventory -> Add standalone host
  • Network -> Assign network
  • Resource -> Assign virtual machine to resource pool
  • Resource -> Remove resource pool
  • Resource -> Rename resource pool
  • Sessions -> Validate session
  • vApp -> Delete
  • vApp -> Rename
  • vApp -> Add virtual machine
  • Virtual machine -> State -> Revert to snapshot
  • Everything under Virtual machine -> Configuration
  • Virtual machine -> Interaction -> Answer question
  • Virtual machine -> Interaction -> Power Off
  • Virtual machine -> Interaction -> Power On
  • Virtual machine -> Inventory -> Create new
  • Virtual machine -> Inventory -> Register
  • Virtual machine -> Inventory -> Remove
  • Virtual machine -> Inventory -> Unregister
  • Everything under Virtual machine -> Provisioning
  • Virtual machine -> State -> Create snapshot
  • Virtual machine -> State -> Remove snapshot

HP and Gartner doing a joint webinar on Data Protector

It's happening on Jan 30th, very early in the morning Sydney time, or Jan 29th for everyone else. Just for fun I ran the announcement through AlchemyAPI's language analysis tools to see what HP's key messages are...

Keyword                            Relevance   Sentiment
data center environment            0.900901    positive
analyst firm Gartner               0.684689    positive
continuously evolving landscape    0.675654    positive
analyst Dave Russell               0.663644    positive
elastic core data                  0.640681    positive
modern data center                 0.59824     positive
data protection                    0.596675    positive
Adaptive Backup                    0.551905    neutral
traditional architectures          0.496183    positive
recovery challenges                0.490281    positive
Key trends                         0.469901    positive
recovery requirements              0.462933    positive
HP                                 0.386948    positive
webinar                            0.359441    positive
datacenter                         0.358502    positive
solutions                          0.353497    positive
demands                            0.3495      positive
functions                          0.333073    neutral
legacy                             0.332964    positive
foundation                         0.332838    neutral
ability                            0.331847    positive
components                         0.330498    positive
Topics                             0.329724    neutral
environments                       0.329655    positive
organization                       0.329415    positive
business                           0.329328    positive


Constantly evolving data centre: yes, that sounds about right.


Friday, 2 January 2015

Celebrating Christmas alphabetically

My family has been celebrating "alphabetic" Christmas since 2002, working our way through the English alphabet. It's a pity that we didn't start in 2001, because it meant that we skipped the letter "A", which we will presumably do in 2027. Anyway, the 14th letter of the alphabet is "N", so for 2014 Christmas all the presents began with the letter "N".

The rules initially were "begins with the letter of the year, and under $10", but we found that very difficult and bumped it up to $20. This turns out to be the sweet spot where it is still possible, but only just.

Sometimes I cheat a little: if I'm buying from the USA I'll make it a limit of USD20 per present, and then I'll get it shipped to Stackry (because so many USA retailers will offer free shipping to a New Hampshire address). Then I get Stackry to consolidate it into one package and send it to me... which means I can't really calculate how much the shipping of each item actually was, and I can pretend to have stayed within budget.

Why do we do this?

Christmas gift giving is often extremely stressful and awkward. It's very hard to figure out what to buy for someone.

In the past, I don't think it would have been this difficult. Everyone was much poorer. Firstly, this meant that there was always something that you knew a recipient genuinely needed. Secondly, it meant that buying a present was a real sacrifice. It might have meant going without a meal, or going without something important in order to buy for someone else. But today, most of the time there is no sacrifice in buying a present. It doesn't cost anything other than money -- money that can be earned before or even afterwards.

What this leads to is a pseudo-sacrifice of time. The problem seems to be that it is too easy to buy presents, so no-one feels comfortable until they've spent a lot of time on it (spinning uselessly, second guessing, and so on).

So the solution to this is to agree to create arbitrary, hard constraints that make the problem more difficult. We agreed to alphabetic Christmas because it means that it is very, very hard to find anything suitable. Once you've found something, you don't ever wonder whether there is a better present for someone -- there almost certainly isn't. And if not everything is an absolute hit with the recipient, they understand that there really might not have been anything better.

It works quite well: success is when the recipient bursts out laughing from the absurdity of it, or discovers that actually, the present is something really useful or good.

In case anyone else is doing alphabetic Christmas, here's some suggestions of some sensible presents beginning with the letter 'N' (Christmas 2014 for us):

  • Nibble pan (a cake pan with a built-in cupcake receptacle so that you can eat a sample to make sure the recipe is working).
  • Nasturtium and nigella seeds
  • No Thanks - a very clever card game
  • Nanobots Arena - a really engaging tile game where you are battling your nanobots out in a petri dish
  • A Larry Niven novel (or if you like your sci-fi more gritty, perhaps the Nightwatch series).
  • A bit of a cheat: one of the most successful and interesting non-fiction books of recent years.


We're already dreading 2024 (when everything begins with the letter 'X') but I think we all still plan to give it a try. We don't know what we'll do once we've cycled back to "A" again: perhaps we'll go through the alphabet again, or do it in another language, or something else.