IFOST Blog: July 2014

Wednesday 30 July 2014

First look at Smart Cache

Data Protector 9.0 introduces a new kind of backup to disk device, imaginatively called "SmartCache". It is designed solely for VMware backups.

If you have backed up a virtual machine into a smart cache, and you happen to have the granular recovery extension installed, you can restore individual files from the VMDK very quickly. The end-to-end time to restore a file from a VMDK backup is not that different to restoring a file from a filesystem backup, actually. That is because instead of having to restore the whole VMDK file to a spooling location, the smart cache is essentially an already restored version, and it's just a matter of mounting it and browsing through it.

Starting in version 9, the granular recovery extension only works with VMware's web interface (the Flash version). I spent a painful morning chasing up why the VMware Granular Recovery Extension comes up only in the "Available Plug-ins" section of the vSphere Plug-in Manager with a status of "No client side download is needed for this plug-in". The answer is that it's like the VR Management plug-in: it only supports the web interface, not the vSphere client.

The smart cache device does no deduplication. VEAgent backups can't do in-line compression -- nor would it make sense here anyway -- so the amount of disk space required is exactly the same as the space required for the original VMDKs that you are backing up. In fact, if you want to keep a few days' worth, then you could well be looking at several times as much disk space for your backups as for the originals. The client I'm doing this for has no Linux servers, otherwise I would have suggested putting the smart cache onto a BTRFS filesystem!
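As a rough sizing sketch of the point above (no deduplication and no compression, so every retained copy costs the full VMDK size), the arithmetic looks like this. The function name and the figures in the usage note are made up for illustration.

```shell
#!/bin/sh
# Estimate smart cache disk usage: with no dedup and no compression,
# each retained backup occupies the full size of the original VMDKs.
#   $1 = total size of the VMDKs being backed up, in GB
#   $2 = number of backups kept in the smart cache
smartcache_space_gb() {
    echo $(( $1 * $2 ))
}
```

For example, `smartcache_space_gb 2000 3` estimates the space needed to keep three backups of 2 TB worth of VMDKs.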

What this means is that you will probably pair the smart cache backup with an object copy job a few days later. Backup to the smart cache now, keep 2-3 copies in smart cache and then copy the oldest backup to a deduplication store where you can store it for a month or longer on disk, or possibly longer on tape.

Note that the default block size for the smart cache device is 1024 KB, which is larger than the defaults for (say) StoreOnce or most tape drives. Unless you change this (either by making the smart cache block size smaller, or the other device's block size larger), you won't be able to do any kind of object copy. Instead you will get errors like this:

[Major] No write device with suitable block size found.

If you don't have licenses for the VMware granular recovery extensions, then don't bother: just use a StoreOnce software device and save yourself a few TB of disk space.

Also, I would recommend bench-marking your VEAgent backups against a filesystem backup. You can create bootable disaster recovery images from filesystem backups (providing most of the advantages of a VMDK-level backup) and quite often the performance is comparable to (or sometimes better than) a VMware snapshot backup. Since filesystem backups can go to a StoreOnce store, it's much more space efficient than the combination of VEAgent + smart cache, and you can (obviously) restore individual files.

Greg Baker is an independent consultant who happens to do a lot of work on HP DataProtector. He is the author of the only published book on HP Data Protector (http://x.ifost.org.au/dp-book). He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector

Monday 28 July 2014

Moment-in-time snapshot backups of Linux systems with Data Protector

You can get moment-in-time snapshot backups of Linux systems with Data Protector.

Everybody knows the "Use Shadow Copy" flag for Windows backups gets something very close to a moment-in-time snapshot, but there's a persistent rumour that you can only get a good moment-in-time snapshot of Linux systems if you take a VMware snapshot backup.

If you are backing up a btrfs filesystem (which is supported on the Linux agent: you can see it listed in /opt/omni/.util), then you can use its built-in snapshot capabilities.
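A minimal sketch of how btrfs snapshots could be wrapped for use in pre-exec and post-exec scripts. The function names, the DRY_RUN switch and the snapshot path in the usage note are my own assumptions, not anything Data Protector supplies. Note that the snapshot has to live on the same btrfs filesystem as the subvolume it is taken from.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is set (handy for testing).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Pre-exec step: take a read-only snapshot of a btrfs subvolume.
#   $1 = subvolume to snapshot, $2 = where to put the snapshot
take_snapshot() {
    run btrfs subvolume snapshot -r "$1" "$2"
}

# Post-exec step: remove the snapshot once the backup has finished.
drop_snapshot() {
    run btrfs subvolume delete "$1"
}
```

With `DRY_RUN=1`, `take_snapshot /home /home/.dp-snapshot` just prints the btrfs command it would run. The backup specification would then need to point at the snapshot path rather than the live filesystem.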

But even ext3 and xfs filesystems can do snapshots with the help of the LVM layer. This technique dates back to the early days of HP-UX, but it works on modern Linux boxes (and probably on modern HP-UX boxes as well). You will need to do four steps:

Make sure that you have some spare space in the volume group which contains the volumes you are wanting to back up. Use vgdisplay and look for the lines about Free PE (free physical extents).
Make a backup specification (through the Data Protector GUI if you want to) including all the filesystems you want. Don't tick the host, tick each of the filesystems, even if you want all of them. In the pre-exec and post-exec fields of the filesystem defaults (not the pre-exec and post-exec for the whole backup job) put snapshot-preexec.sh and snapshot-postexec.sh.
Put snapshot-preexec.sh and snapshot-postexec.sh into /opt/omni/lbin.
Edit the backup specification. Look for the lines that say FILESYSTEM "/xxx" ... and replace them with FILESYSTEM "/mnt/backup/xxx" ... The end result should look like Linux-Snapshot-Example
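The first and last of those steps can be sketched in shell. This is only an illustration: the function names are invented, the sample FILESYSTEM line in the usage note is a guess at the layout, and only the first quoted path on each line is rewritten (which is all the step above describes), so compare the result against the linked example before relying on it.

```shell
#!/bin/sh
# Step 1: read `vgdisplay` output on stdin and print the free extent count
# from the "Free  PE / Size" line.
free_extents() {
    awk '/Free +PE/ { print $5 }'
}

# Step 4: read a backup specification on stdin and put a prefix in front of
# the first quoted path on every FILESYSTEM line.
#   $1 = prefix (defaults to /mnt/backup)
# Not idempotent: running it twice adds the prefix twice.
prefix_filesystems() {
    prefix="${1:-/mnt/backup}"
    sed -E 's|^([[:space:]]*FILESYSTEM[[:space:]]+")/|\1'"$prefix"'/|'
}
```

Usage would be something like `vgdisplay vg0 | free_extents` (vg0 being a hypothetical volume group name), and `prefix_filesystems < spec.orig > spec` on a copy of the backup specification, which turns `FILESYSTEM "/home" ...` into `FILESYSTEM "/mnt/backup/home" ...`.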

Then you can run the backup as per normal.

The snapshot-preexec.sh script looks at the parent process which spawned it (which will be the vbda process) and finds the -volume parameter. Then it strips off the /mnt/backup part of it, and figures out the logical volume that the original filesystem is mounted on. It calls lvcreate --snapshot, runs fsck and then mounts that snapshot volume. So by the time the vbda process starts trying to read the /mnt/backup/xxx filesystem, the filesystem is mounted.

The snapshot-postexec.sh script cleans up these volumes.

There are two environment variables you can set in the backup specification which snapshot-preexec.sh will make use of.

SNAPSHOT_PREFIX (which defaults to /mnt/backup)
SNAPSHOT_SIZE (which defaults to 10m, which means that 10MB worth of block writes can happen while the backup runs. For a busy filesystem this might be too little.)
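For readers who want to see the shape of the pre-exec logic described above, here is a hypothetical sketch. It is not the actual snapshot-preexec.sh: the function names and the snapshot naming scheme are invented, and it assumes that findmnt and the LVM tools are installed and that the script's parent process really is vbda.

```shell
#!/bin/sh
# Hypothetical sketch only; NOT the real snapshot-preexec.sh.
SNAPSHOT_PREFIX="${SNAPSHOT_PREFIX:-/mnt/backup}"
SNAPSHOT_SIZE="${SNAPSHOT_SIZE:-10m}"

# Read one command-line argument per line on stdin and print the one
# that follows "-volume".
volume_argument() {
    awk 'prev == "-volume" { print; exit } { prev = $0 }'
}

# Strip the snapshot prefix: /mnt/backup/home -> /home, /mnt/backup -> /
original_path() {
    stripped="${1#"$SNAPSHOT_PREFIX"}"
    printf '%s\n' "${stripped:-/}"
}

# Snapshot the logical volume behind the original filesystem and mount the
# snapshot where vbda expects to find it.
snapshot_and_mount() {
    target="$1"                                      # e.g. /mnt/backup/home
    orig="$(original_path "$target")"                # e.g. /home
    lv="$(findmnt -n -o SOURCE --target "$orig")"    # e.g. /dev/mapper/vg0-home
    vg="$(lvs --noheadings -o vg_name "$lv" | tr -d ' ')"
    snap="dpsnap_$(basename "$lv")"                  # invented naming scheme
    lvcreate --snapshot --size "$SNAPSHOT_SIZE" --name "$snap" "$lv" || return 1
    fsck -p "/dev/$vg/$snap"      # for xfs, skip this and mount with -o nouuid
    mkdir -p "$target"
    mount -o ro "/dev/$vg/$snap" "$target"
}

# Main flow, left commented out so that sourcing this file changes nothing:
#   vol="$(tr '\0' '\n' < "/proc/$PPID/cmdline" | volume_argument)"
#   snapshot_and_mount "$vol"
```

The post-exec side would be the reverse: unmount the snapshot and remove it with lvremove.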

Of course, the backups will be recorded as being of /mnt/backup/... which means that disaster recovery won't work properly (because it will never see your root filesystem as being backed up). It's a pity that there isn't an easy way of updating the internal database to make it think that it was a different filesystem backed up.

All in all, it's not that difficult to do. It only took me about an hour to set up. The mysterious thing is that 17 years after I first implemented this on an HP-UX system -- and I wasn't the first to do this -- it's still not out-of-the-box functionality.

What breaks when you haven't finished upgrading

The Data Protector cell manager -> client protocols are generally backward compatible. So if you have upgraded your cell manager to (say) version 9 but you still have clients running 6.2, most backups will continue to work.

Here's my list of what seems to break, or gives hard-to-trace-down errors:

There's a new -econfig option sent by version 9 cell managers which means that an earlier version of the Exchange 2010 agent quits badly (which the session manager reports as insufficient privileges on CreateProcess). Note that you'll probably hit the Client named "MS Exchange 2010 Server" error first.
Some SQL backups work, some don't. I can't see what it is that's different between them, though.
Filesystem backups won't break, but I have noticed that very old agents (e.g. version 5) will give a warning about an unknown option if you have a cell manager that understands enhanced incremental backups.

But other than that, I've been reasonably successful upgrading the cell manager first, letting a few backups run to make sure that everything is working, and then slowly upgrading a few clients at a time. There doesn't seem to be a need for flag days where every single client gets upgraded at the same time.

When Exchange 2010 won't back up after a Data Protector upgrade

I have had two customers upgrade Data Protector, and suddenly have their Exchange backups fail.

Here's the characteristic error message:

[Major] From: BSM@exchange.ifost.org.au "Exchange 2010 Databases Backup" Time: 26/07/2014 9:17:02 PM
[61:8000] Client named "MS Exchange 2010 Server" not configured in the backup specification.

[Major] From: BSM@exchange.ifost.org.au "Exchange 2010 Databases Backup" Time: 26/07/2014 9:17:02 PM
Unknown internal error.

What's going on is that in DP 8.x and onwards, HP has added support for Exchange 2013, and so the "Exchange 2010" backups have been renamed to "Exchange 2010+". But the upgrade script doesn't reliably (ever?) update the barlist.

Simply open up the barlist file in a text editor. Look for where it says "2010" and replace it by "2010+".
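If you would rather script that edit, something along these lines should do it. This is a sketch under assumptions: it assumes the string to change appears in the barlist as "MS Exchange 2010", as in the error message above, and the sample line in the usage note is illustrative rather than copied from a real barlist. Take a copy of the file first.

```shell
#!/bin/sh
# Read a barlist on stdin and write it to stdout with
# "MS Exchange 2010" changed to "MS Exchange 2010+".
# The optional \+ in the pattern makes it safe to run more than once.
fix_exchange_barlist() {
    sed -E 's/MS Exchange 2010\+?/MS Exchange 2010+/g'
}
```

For example, `fix_exchange_barlist < barlist.bak > barlist` (hypothetical file names) would turn a line containing `"MS Exchange 2010 Server"` into one containing `"MS Exchange 2010+ Server"`. On a Windows cell manager you would need a Unix-like shell for this, or just do the same replacement in a text editor.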

Confirmed to affect DP 8.1 and 9.0, for upgrades from 6.2 and 7.x.

Tuesday 15 July 2014

Why upgrading from version 6 and 7 can be a little slow

I'm writing this while waiting for an in-place upgrade from Data Protector 7.0 to 8.1 to complete. Several hours in, I started wondering what it was doing. I could have just connected to the database using the run_script option to omnidbutil, but I thought I'd explore a bit further.

The connection details are in C:\ProgramData\Omniback\Config\Server\idb\idb.config (or wherever you have put the database; it's /etc/opt/omni/server/idb/idb.config on Linux). You will see a line like this:

PGPASSWORD='ZmxxeW5paGn0ZmovYQ==';

Copy and paste the part between the quotes into your favourite Base64 decoder to get the plain text password returned. You can then run:

psql --port=7112 -U hpdpidb_app -d hpdpidb

(psql is in "C:\Program Files\OmniBack\idb\bin" on Windows, /opt/omni/idb/bin on Linux)
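On Linux the decoding step can be done from the shell instead of a separate Base64 decoder. A small sketch, assuming the PGPASSWORD line has the layout shown above; the function name is my own.

```shell
#!/bin/sh
# Read idb.config on stdin, pull out the value between the quotes on the
# PGPASSWORD line, and Base64-decode it.
idb_password() {
    sed -n "s/^[[:space:]]*PGPASSWORD='\([^']*\)'.*/\1/p" | base64 --decode
}
```

Since psql reads the password from the PGPASSWORD environment variable, the two steps can be combined:

PGPASSWORD="$(idb_password < /etc/opt/omni/server/idb/idb.config)" /opt/omni/idb/bin/psql --port=7112 -U hpdpidb_app -d hpdpidb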

PostgreSQL lets you see what queries are running -- unlike Raima Velocis, where this would have been very hard to figure out!

From the psql prompt:

hpdpidb=> select * from pg_stat_activity;
 datid | datname | procpid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | waiting | current_query
-------+---------+---------+----------+-------------+------------------+-------------+-----------------+-------------+----------------------------+----------------------------+----------------------------+---------+-------------------------------------------------------------------------
 16388 | hpdpidb | 3608 | 16384 | hpdpidb_app | | 127.0.0.1 | | 49842 | 2014-07-15 11:01:13.695+10 | 2014-07-15 11:23:58.31+10 | 2014-07-15 11:23:58.31+10 | f | DELETE FROM dp_catalog_position_seqacc_med \r +
 | | | | | | | | | | | | | WHERE medium_name NOT IN (SELECT medium_id FROM dp_medium_to_objects);
 16388 | hpdpidb | 3780 | 16384 | hpdpidb_app | psql | ::1 | | 60664 | 2014-07-15 15:10:49.911+10 | 2014-07-15 15:11:14.763+10 | 2014-07-15 15:11:14.763+10 | f | select * from pg_stat_activity;
(2 rows)

That's rather interesting: I'm waiting for it to run
DELETE FROM dp_catalog_position_seqacc_med
WHERE medium_name not in (select medium_id FROM dp_medium_to_objects);

So it's some kind of sanity clean-up to maintain referential integrity, perhaps?

Why is that taking so long, though? Time to explain the query:

hpdpidb=> explain delete from dp_catalog_position_seqacc_med where medium_name not in (select medium_id from dp_medium_to_objects);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Delete on dp_catalog_position_seqacc_med (cost=1343236.10..514911431610.60 rows=2409753 width=6)
   -> Seq Scan on dp_catalog_position_seqacc_med (cost=1343236.10..514911431610.60 rows=2409753 width=6)
        Filter: (NOT (SubPlan 1))
        SubPlan 1
          -> Materialize (cost=1343236.10..1544864.81 rows=4819506 width=28)
               -> Subquery Scan on dp_medium_to_objects (cost=1343236.10..1487821.28 rows=4819506 width=28)
                    -> Unique (cost=1343236.10..1439626.22 rows=4819506 width=131)
                         -> Sort (cost=1343236.10..1355284.87 rows=4819506 width=131)
                              Sort Key: p.medium_name, obj.uuid, obj.seq_id, obj.legacy_type_id, app.name, obj.name, obj.description
                              -> Hash Join (cost=107153.79..478801.50 rows=4819506 width=131)
                                   Hash Cond: ((p.application_uuid = ovr.application_uuid) AND (p.objver_seq_id = ovr.seq_id))
                                   -> Seq Scan on dp_catalog_position_seqacc_med p (cost=0.00..139912.06 rows=4819506 width=50)
                                   -> Hash (cost=76201.39..76201.39 rows=922427 width=125)
                                        -> Hash Join (cost=325.37..76201.39 rows=922427 width=125)
                                             Hash Cond: ((ovr.object_uuid = obj.uuid) AND (ovr.object_seq_id = obj.seq_id))
                                             -> Seq Scan on dp_catalog_object_version ovr (cost=0.00..50509.27 rows=922427 width=43)
                                             -> Hash (cost=243.97..243.97 rows=5427 width=103)
                                                  -> Hash Join (cost=2.72..243.97 rows=5427 width=103)
                                                       Hash Cond: ((obj.application_uuid = app.uuid) AND (obj.application_seq_id = app.seq_id))
                                                       -> Seq Scan on dp_catalog_object obj (cost=0.00..146.27 rows=5427 width=93)
                                                       -> Hash (cost=1.69..1.69 rows=69 width=52)
                                                            -> Seq Scan on dp_frontend_application app (cost=0.00..1.69 rows=69 width=52)
(22 rows)

So that's 922427 * 5427 * 69 = 345,414,781,701 row operations at an absolute bare minimum.
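The arithmetic is easy to reproduce in the shell. The second figure below is my own rough addition rather than anything from the plan's cost numbers: it assumes that, because the plan shows a plain "SubPlan 1" over a Materialize node rather than a hashed subplan, each of the roughly 4.8 million rows in dp_catalog_position_seqacc_med may end up being compared against up to 4.8 million materialized rows.

```shell
#!/bin/sh
# Product of the row estimates on the three smaller scans in the plan.
join_product=$((922427 * 5427 * 69))
echo "product of row estimates: $join_product"

# Rough upper bound on comparisons if the NOT IN subplan is scanned
# linearly once per outer row (an assumption, not a measured figure).
not_in_probes=$((4819506 * 4819506))
echo "worst-case NOT IN comparisons: $not_in_probes"
```

Either way the number is large enough that the upgrade is not going to finish over a coffee break.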

I think I might call it a day and come back in a few days' time. Or maybe a few weeks' time.

Data Protector 9.0 released

Normally the first I see of a new release is when it appears for evaluation on http://www.hp.com/go/dataprotector but version 9 seems to have been released to customers on support contracts first.

The release notes are quite brief: nearly everything that has been announced as "new in version 9" was already available in the version 8.12 patches (Windows / Linux). Integration with DataDomain, for instance.

The only obviously new option for most customers is that there is a new kind of backup-to-disk device ("Smart Cache") which you can use for VMware VEAgent backups. Then you can use the VMware GRE (Granular Recovery Extension) to extract out individual files from the Smart Cache device without having to restore the whole virtual disk first. This is obviously a big win for backing up virtual machines with large disks: it won't be necessary to do a file-level backup and a VMware backup.

Customers with large B6200 / B6500 arrays might find the federated de-duplication option useful because it means you don't have to assign engines to particular devices.

Friday 11 July 2014

Installing Data Protector cell manager on a minimal-install Redhat 7 / Centos 7

This is more of a note to myself, but if you do a "minimal" install of RedHat or Centos, you will be missing a number of important packages.

Here's what I do to fix this, before running omnisetup.sh

echo 'PATH=$PATH:/opt/omni/bin:/opt/omni/sbin:/opt/omni/lbin' \
> /etc/profile.d/omni.sh
chmod +x /etc/profile.d/omni.sh
. /etc/profile.d/omni.sh

useradd -m hpdp
yum install net-tools bc xinetd glibc.i686
# not really necessary, but so useful...
yum install bind-utils psmisc mlocate telnet

mkdir -p /etc/opt/omni/server
chmod a+rx /etc/opt/omni/server

Then edit /etc/man_db.conf and add the following two lines in the appropriate stanzas.
MANPATH_MAP /opt/omni/bin /opt/omni/lib/man
MANDB_MAP /opt/omni/lib/man /var/cache/man/omni

You probably won't need a firewall on your Data Protector cell manager. In any case, the installer doesn't add exceptions to the firewall rules like it does on Windows, so with firewalld running the cell manager can't import itself or start properly.

systemctl stop firewalld
systemctl disable firewalld

Now you can run
omnisetup.sh -CM -IS -install da,ma,cc,StoreOnceSoftware,autodr
