
Tuesday, 30 December 2014

Auditing VMware backups

A customer asked me to report on whether every virtual machine in their VMware environment was getting backed up.

HP Data Protector includes a report on the last successful backup for various objects, but it doesn’t provide a convenient way of tying that in with what is on a VMware cluster.

So I wrote a program, imaginatively called vm-backup-audit.pl. You can get a copy from http://www.ifost.org.au/dataprotector/software/vm-backup-audit.pl

This program queries the vCenter server given as a command-line argument, and identifies all the virtual machines on that server. It uses the VMware Perl SDK to do this (there's a program called vidiscover.pl which it makes use of).

It also queries the Data Protector internal database for the last 14 days to find out what objects have been backed up during VEAgent backups. It then prepares a list of virtual machine names and shows when they were last (successfully or unsuccessfully) backed up. 


If the virtual machine has never been backed up successfully in the time frame, the message <no full backup completed cleanly> will be shown. Otherwise, the relevant session IDs will be shown in reverse chronological order.
This is the kind of output it gives:


[LaptopDatacenter:linuxvm1] 2014/12/29-6 (Full) 2014/12/29-5 (Incremental)
[LaptopDatacenter:unbacked-up-vm] <no full backup completed cleanly>
[LaptopDatacenter:linuxvm3] 2014/12/29-6 (Failed Full) 2014/12/29-5 (Incremental) 2014/12/28-1 (Full)
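
If all you want from that output is the list of machines that need attention, a small filter along these lines might do. This is only a sketch that assumes the output format shown above; the function name unprotected_vms is my own invention and is not part of vm-backup-audit.pl.

```shell
#!/bin/sh
# Sketch: read vm-backup-audit.pl output on stdin and print the
# [datacenter:vm] labels that have no clean full backup.
# unprotected_vms is a hypothetical helper, not part of the script itself.
unprotected_vms() {
  sed -n 's/^\(\[[^]]*]\) <no full backup completed cleanly>.*/\1/p'
}
```

You would pipe the audit output into it, for example by running vm-backup-audit.pl against your vCenter server and appending "| unprotected_vms". It does not flag machines whose most recent session was a failed full backup; that would need a second pattern.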



In full honesty, there are some obvious shortcomings:

  • It doesn’t correctly handle two virtual machines with the same name in the same data center. This is probably impossible anyway, so doesn’t matter. If they are in different data centers it is able to distinguish them.
  • It’s not smart enough to understand that a virtual machine might be getting cloned or replicated between data centres.
  • It might not cope very well with mixed Hyper-V and VMware environments. It might not cope very well with two instances running simultaneously.
  • It has only been tested on a version 9.02 Linux-based cell console, talking to a Windows cell manager. It won't be hard to get working on anything else, but I just haven't done it yet.

If these matter to you and you have a budget to cover fixing any of them, please get in touch and I'll see what I can do.

Greg Baker is an independent consultant who happens to do a lot of work on HP DataProtector. He is the author of the only published books on HP Data Protector (http://www.ifost.org.au/press/#dp). He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector

Tuesday, 23 December 2014

Lots of interesting new features and support in 9.02

  • Data Protector can now back up mainframes: IBM System z running z/OS version 2.1. If there was any doubt whether HP is targeting IBM customers during IBM's current implosion, I think this should put it to rest. HP is obviously trying to position DP against TSM.
  • There is now VMware 3PAR integration. I haven't researched this in any depth, but I presume what this means is that it is now possible for Data Protector to snapshot 3PAR arrays which are using LUNs to store VMFS volumes. If it works like Oracle, filesystems, MS-SQL, etc. then Data Protector should be able to mount this onto another system and back it up, and also restore snapshots instantaneously.
  • Data Protector introduces support for pausing the lower priority backup sessions when the scheduled higher priority jobs share the same device. The paused jobs are resumed once the device is available.
  • Viewing the names of VMs (rather than UUIDs) in VEAgent backups is available again.
  • Catalyst over fibre channel
  • I didn't see it documented, but it looks like this is the release where cloud backups [link to blog post] were introduced.
  • The VMware Granular Recovery Extension has a new user interface, and a new way of being installed.

As a reminder, I keep a list of all the latest patches (including the 9.02 patch bundle) at http://www.ifost.org.au/Documents/dp-patch-list.html.

And as another reminder, if you are going to upgrade to 9.02 by running a new cell manager alongside an old cell manager, then you will want to buy my book on migrating and cloning cell managers ( http://www.ifost.org.au/press/#dp ) which looks like it was the best selling book on business software by an Australian author on Amazon Australia last week.



Wednesday, 17 December 2014

Migrating Linux or HP-UX cell manager configuration to and from a Windows cell manager

All the backup definitions (including integration backups, schedules and so on) in Data Protector are plain text files so they are easy to edit and easy to move around.

On Windows, these are UTF-16 encoded, and on HP-UX and Linux they are ASCII encoded. If you copy a backup specification out of C:\ProgramData\Omniback\config\server\datalists on a Windows system to /etc/opt/omni/server/datalists on a Linux machine, the resulting file won't work properly: every second byte will appear to be a null.

Fortunately, Linux boxes come with a utility called iconv. Here is how to use iconv to convert a datalist which was on a Windows system after it has been transferred to a Linux system.

iconv -f UTF-16 -t ASCII datalists/main-backup > datalists/main-backup.temp
mv datalists/main-backup.temp datalists/main-backup

You can't just redirect output back to datalists/main-backup because the shell will truncate the file before iconv reads it.

If you need to do a whole directory:


find . -type f -exec sh -c 'iconv -f UTF-16 -t ASCII "$1" > "$1.temp" && mv "$1.temp" "$1"' sh {} ';'
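
One risk with converting a whole directory is running the conversion twice, or running it over a file that was already ASCII. The sketch below only converts a file if it starts with a UTF-16 byte-order mark. That is an assumption on my part: I haven't checked that every Windows cell manager version writes one. The name convert_datalist is made up for the example.

```shell
#!/bin/sh
# Sketch: convert one datalist from UTF-16 to ASCII in place, but only if it
# starts with a UTF-16 byte-order mark, so that running it twice is harmless.
# convert_datalist is a hypothetical helper name.
convert_datalist() {
  f=$1
  # First two bytes as hex: fffe (little-endian) or feff (big-endian).
  bom=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
  case "$bom" in
    fffe|feff)
      iconv -f UTF-16 -t ASCII "$f" > "$f.temp" && mv "$f.temp" "$f"
      ;;
    *)
      echo "skipping $f: no UTF-16 byte-order mark" >&2
      ;;
  esac
}
```

Call it once per file, for example convert_datalist datalists/main-backup. If iconv fails part-way (say, on a character with no ASCII equivalent), the original file is left alone and the .temp file is left behind for inspection.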

Of course, if you are migrating between cell managers, then you should pick up a copy of my latest book on migrating and obsoleting Data Protector cell managers.


Tuesday, 16 December 2014

A new Data Protector book

I've recently published another book based on my experiences with customers migrating off physical hardware and off older versions of Data Protector.

This is not a long book, which is why I've priced it at a fraction of anything I've written before, but I hope you find it valuable.

This book is for consultants and system administrators who need to migrate, clone or obsolete an instance of HP Data Protector.
• How to duplicate an existing cell manager on to a new system (one with new hardware, virtualised hardware, or a new operating system).
• How to merge two cells into one, leaving you with one less cell manager than before. In Data Protector version 7 and earlier there were a lot of limitations on the size of a Data Protector cell and how much a cell manager could do. This meant that many customers had several cells -- in version 8 and onwards this is not really necessary. This book walks step-by-step through how you would consolidate cell managers.

Read this book before you start planning your next Data Protector upgrade!

Links to buy from Amazon (in Kindle format) are here, but I've re-done the way I build these books and I can now produce them in other formats as well. Contact me if you need me to do this for you.


Monday, 15 December 2014

Odd behaviour on a large incremental backup job

I've been investigating why one of my customer's incremental backups is so large.

They are doing incrementals forever -- they ran one full backup a long time ago, and have been synthesising full backups each weekend ever since out of their incremental backups.

One backup object has 500GB backed up every night, and as it's a small third-world country branch office, this didn't make much sense.

When I went into the session (e.g. by pretending to start a restore) and tried to find which files had changed, I came up blank. The modification time of most of the files was way in the past. There was no reason they should have been backed up.

Adding to the mystery, when I turned off "Do not use archive attribute" the incremental shrank down to something sensible.

So either Data Protector isn't resetting the archive attribute flag in this particular situation, or something else is turning the archive flag on.

The file server in question is running Win2k8 R2 with a Data Protector 8.11 agent connecting to a 9.01 cell manager. I'll have to upgrade it to confirm it's still a problem, but I suspect it will continue to be.



Wednesday, 10 December 2014

Cannot write to device (JSONizer error: Invalid inputs)

I ran into the following problem at a customer site running 9.01 with all the latest patches. In context it was fairly obvious what was going on. But I pity anyone running into this cold because it would have been a very long and slow debugging process.


[Critical] From: VBDA@linux1.ifost.org.au "linux1.ifost.org.au [/home]" Time: 26/11/2014 4:03:41 PM
[80:1031] Received ABORT request from NET => aborting.

[Major] From: BMA@backup.ifost.org.au "LinuxBackupStorage [GW 4464:0:11286004362757246666]" Time: 26/11/2014 4:03:41 PM
[90:51] \\backup.ifost.org.au\LINUX_OS\1f00010a_54755f1f_06cc_0099
Cannot write to device (JSONizer error: Invalid inputs)

[Critical] From: VBDA@linux1.ifost.org.au "linux1.ifost.org.au [/home]" Time: 26/11/2014 4:03:41 PM
Connection to Media Agent broken => aborting.



The relevant part of the datalist file (in /etc/opt/omni/server/datalists on a Linux-based cell manager, and C:\programdata\omniback\config\server\datalists on a Windows-based cell manager) looked like this:

HOST "linux1.ifost.org.au" linux1.ifost.org.au
{
   -trees
      "/"
   -exclude
      "/home/scratch"
      "/home/pgdumps"
}
There's nothing wrong with that: it will work correctly when writing to a tape device. The same structure (host with exclusions) would probably work on a Windows box. But a Linux client backing up to a StoreOnce store causes the BMA to crash.

This also occurs when you split it into filesystems.

FILESYSTEM "linux1:/" linux1.ifost.org.au:/ {
}
FILESYSTEM "linux1:/home" linux1.ifost.org.au:/home { 
   -exclude 
      "/home/scratch" 
      "/home/pgdumps" 
}

Both of these backup specifications will work correctly when written to tape. Presumably one day a patch will be released to fix this. Contact HP support if necessary, and I'll try to update this posting when it's fixed.
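
If you want to know which of your backup specifications might be exposed to this, a rough first pass is to list the ones that contain an -exclude option. This is a sketch only: it can't tell which device a specification writes to, so you still have to check each hit by hand. The function name specs_with_excludes is hypothetical.

```shell
#!/bin/sh
# Sketch: list the backup specifications in a datalists directory that
# contain an -exclude option. It cannot tell which device a specification
# writes to, so treat the result as a list of candidates to inspect.
# specs_with_excludes is a hypothetical helper name.
specs_with_excludes() {
  dir=$1
  grep -l -E '^[[:space:]]*-exclude' "$dir"/* 2>/dev/null
}
```

On a Linux-based cell manager you would run specs_with_excludes /etc/opt/omni/server/datalists. On a Windows cell manager the files are UTF-16, so a plain grep like this won't match.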




Tuesday, 28 October 2014

VMware, Changed Block Tracking, disk expansion and silent backup corruption

VMware have released this Knowledge Base article.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2090639

If you have Changed Block Tracking turned on (which is a sensible thing to do) and then you expand a virtual disk to be larger than 128GB, the data provided to the backup provider is wrong.

I haven't verified this with Data Protector, but I can't see any reason why DP wouldn't be affected in the same way as (say) Veeam, which uses the same mechanism.

Solution: turn off Changed Block Tracking, run a backup, and then turn Changed Block Tracking on again.


Monday, 27 October 2014

Unknown error 1053 starting hpdp-as

As I mentioned in my post about the 1053 error for hpdp-idb-cp ( http://blog.ifost.org.au/2014/06/unknown-error-1053-starting-hpdp-idp-cp.html ), error code 1053 on Linux doesn't have any particular meaning. It's just a catch-all to say "something went wrong during service startup".

I've experienced this twice now, and both times it was for different reasons.

Here's the output from omnisv status...

    ProcName      Status  [PID]    
===============================
    crs         : Active  [26506]
    mmd         : Active  [26504]
    kms         : Active  [26505]
    hpdp-idb    : Active  [26466]
    hpdp-idb-cp : Active  [26499]
    hpdp-as     : Down
    omnitrig    : Active
    Sending of traps disabled.
===============================


With a bit of inspired guessing, I worked out that hpdp-as is supposed to be started by /etc/rc.d/init.d/hpdp-as. Despite what it looks like, this isn't actually used as a SYSV init runscript -- it is invoked by /opt/omni/sbin/omnisv start. This in turn is invoked by /etc/rc.d/init.d/omni, which actually is a SYSV init runscript.

/etc/rc.d/init.d/hpdp-as isn't part of any RPM file; neither are any of the other Data Protector start up scripts.


  • /etc/init.d/omni is installed by the OB2-CS post-install scriptlet.
  • /etc/init.d/hpdp-as is created by the IDBsetup.sh script when it calls an internally-defined updateServices function

The first time I encountered "Unknown error 1053", it was simply because something had gone wrong during installation, and /etc/init.d/hpdp-as wasn't created. I just took the code from IDBsetup.sh (search for hpdp-as and you'll find an init script inside a heredoc) and recreated it. Then I checked that the appropriate /etc/services entry had been created ("hpdp-idb-as 7116/tcp").
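
Those two checks are easy to script. Here is a minimal sketch that reports which of the two pieces described above is missing. The function name check_hpdp_as is hypothetical, and the paths are parameters (defaulting to the real locations) so that you can try it against copies first.

```shell
#!/bin/sh
# Sketch: report which of the two pieces described above are missing.
# Both paths are parameters so the check can be tried against copies;
# check_hpdp_as is a hypothetical helper name.
check_hpdp_as() {
  init_script=${1:-/etc/init.d/hpdp-as}
  services=${2:-/etc/services}
  rc=0
  if [ ! -f "$init_script" ]; then
    echo "missing init script: $init_script"
    rc=1
  fi
  if ! grep -q -E '^hpdp-idb-as[[:space:]]+7116/tcp' "$services"; then
    echo "missing services entry: hpdp-idb-as 7116/tcp"
    rc=1
  fi
  return $rc
}
```

Run with no arguments, it checks /etc/init.d/hpdp-as and /etc/services, prints nothing and returns 0 when both are in place.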

If this happens to you, here's a handy /etc/init.d/hpdp-as for reference. Just change lnx.ifost.org.au to whatever your cell manager's hostname is:


#!/bin/sh
# chkconfig: 35 99 08
# description: HP Data Protector Application Server.
# processname: hpdp-as

### BEGIN INIT INFO
# Provides: hpdp-as
# Required-Start: $local_fs $remote_fs $network $syslog
# Required-Stop: $local_fs $remote_fs $network $syslog
# Default-Start: 3 5
# Default-Stop: 0 1 2 4 6
# Short-Description: HP Data Protector Application Server
### END INIT INFO

#Defining AS_HOME
AS_HOME=/opt/omni/AppServer

case "$1" in
start)
echo "Starting the HP Data Protector Application Server..."
nohup su - hpdp -c "${AS_HOME}/bin/standalone.sh -b lnx.ifost.org.au &"
;;
quick)
nohup su - hpdp -c "${AS_HOME}/bin/standalone.sh -b lnx.ifost.org.au > /dev/null &"
;;
stop)
echo "Stopping the HP Data Protector Application Server..."
su - hpdp -c "${AS_HOME}/bin/jboss-cli.sh --connect command=:shutdown"
;;
log)
echo "Showing server.log..."
tail -1000f /var/opt/omni/log/AppServer/server.log
;;
*)
echo "Usage: /etc/init.d/hpdp-as {start|stop|log}"
exit 1
;; esac
exit 0



The second time I encountered this (which was today), /etc/init.d/hpdp-as was present. A bit of digging into it revealed that it calls /opt/omni/AppServer/bin/standalone.sh -b cell-manager-hostname as the user hpdp (or whatever you are running Data Protector as). Helpfully, standard output is redirected to /dev/null, so whatever errors you might encounter are not reported anywhere.

When I ran that manually I saw (buried in the standard output which would otherwise have been discarded):

15:35:14,505 ERROR [org.jboss.msc.service.fail] MSC00001: Failed to start service jboss.logging.handler.FILE: org.jboss.msc.service.StartException in service jboss.logging.handler.FILE: java.io.FileNotFoundException: /var/opt/omni/log/AppServer/server.log (Permission denied)

And indeed, /var/opt/omni/log/AppServer is owned by root, and not writable by the hpdp user.

$ ls -ld /var/opt/omni/log/AppServer
drwxr-xr-x. 2 root root 21 Sep 16 22:40 /var/opt/omni/log/AppServer

Various other problems crop up, all to do with permissions. As far as I can tell, the following chown command will fix all of them.

sudo chown hpdp \
  /var/opt/omni/log/AppServer \
  /var/opt/omni/server/AppServer \
  /opt/omni/AppServer/standalone/deployments \
  /etc/opt/omni/server/AppServer

sudo /opt/omni/sbin/omnisv start
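
If you would rather see what is wrong before changing anything, something like the following sketch lists each directory that is not owned by the expected user. The function name not_owned_by is made up, and it relies on the GNU coreutils form of stat, which is what Linux cell managers will normally have.

```shell
#!/bin/sh
# Sketch: print each directory that is not owned by the expected user, so you
# can see what the chown would change before running it.
# not_owned_by is a hypothetical helper; stat -c is the GNU coreutils form.
not_owned_by() {
  user=$1
  shift
  for dir in "$@"; do
    owner=$(stat -c %U "$dir" 2>/dev/null) || owner="(missing)"
    [ "$owner" = "$user" ] || echo "$dir $owner"
  done
}
```

For the case above you would call it as not_owned_by hpdp followed by the four directories from the chown command. No output means the ownership already matches and the cause is something else.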



Tuesday, 21 October 2014

Auskey on Yosemite (dealing with the Australian Taxation Office using a MacOS 10.10 system)

The Australian Taxation Office has a very fragile website called the Business Portal. When it works, it's fine, but the challenge is that if you modify anything on your computer at all, you are bound to cause BP to break.

So, if you have upgraded to Yosemite (MacOS 10.10), you can of course expect Business Portal to stop working, as it does after every upgrade. Usually there's a Whirlpool forum on how to get it working again, but I couldn't find one.

It took me a few hours tonight to figure it out. Fortunately my GST report isn't due until next week.

Step 1. Don't go to www.java.com, because Java 7 isn't supported on Yosemite.

Step 2. Don't install Java 8 from the Oracle download site either, because the business portal doesn't seem to work with Java 8.

The ECI client requires Java 6, for example.

Step 3. If you've done either of these, uninstall that version of Java, and remove the Java plug-in from your browser. Since Chrome doesn't do Java at all on MacOS this means Firefox and Safari.

Step 4. Download Java 6 from here: http://support.apple.com/kb/dl1572

Step 5. Follow the procedure here: http://support.apple.com/kb/ht5559 in order to activate the plug in.

Step 6. Go to http://bp.ato.gov.au/


Update: the business portal does work with Java 8, thanks to Paul Martin....
After tearing my hair out, installing, reinstalling, failing... a tech support fellow had the solution (I'm running Firefox 33.0.1 with Java 8u25 on Yosemite 10.10):

  • Go to Java Control Panel
  • Select Advanced
  • Scroll down to 'Advanced Security Settings' and DESELECT 'Use TLS 1.2'
  • Restart browser and login to the Business Portal

If you are reading this at the end of a long hard day wrestling with software bugs stopping you from filing your tax (which you probably didn't want to do either), why not settle down with a really fun book of nerd-geek poems? Guaranteed to lighten your day, has poems about everything from nuclear physics to time travel and helps pay for the cost of this blog:  When Medusa Went on Chatroulette

Sunday, 19 October 2014

Resources for start-ups in Sydney

This was my answer to a Quora question about a USA-based entrepreneur setting up a local presence in Australia.

Sydney is definitely the right city if your company is a medtech or financial tech startup.  Melbourne does biotech better, but even then there are successful biotech companies in Sydney. Canberra would only be appropriate for a startup doing government or military technology. 

I'm not an immigration lawyer and it's not legal for anyone without approval from the department of immigration to give immigration advice. The right person to talk to is an immigration lawyer.


The equivalent of a USA LLC is a "Pty Ltd" company. These are registered with the Australian Securities and Investments Commission (ASIC) using form 201: http://www.asic.gov.au/asic/asic... or you can get your accountant or lawyer to do it for you. There are also companies that set up shelf companies and assign one to you; this is the quickest and least painful option, but the most expensive (usually a couple of hundred dollars).

Your company needs to have at least one director appointed who is a resident. This can cause a chicken-and-egg problem for an entrepreneur coming to Australia as the first and only employee, because they won't be able to start the company to nominate them as an employee. The solution is to get someone already resident in Australia to be a director initially, and then, once you are here and set up, to fill in the form with ASIC to replace your local contact director with yourself.

Australian businesses need to register for an Australian Business Number with the taxation office. https://abr.gov.au/for-business,... Most startups will be lodging a monthly report to the tax office to say how much was paid to employees that month, a quarterly report to the tax office on how much GST (goods and services tax) the company collected and paid, and an annual report to the tax office on the company profits. The tax office is not as onerous as the IRS, and the people it employs are generally nicer. They try to help startups to some extent. For example, you get a window of time after the company starts where they won't penalise late tax paperwork. Also, if there is some genuine reason for not being able to lodge your tax on time (e.g. a family emergency) you can request an extension. I haven't had to do it very often, but I've never had an extension request denied in 15 years.

I recommend using  http://www.saasu.com/ for managing the Australian entity's accounts as this does everything right for Australian business automatically. It has a report that tells you exactly what to put in what field for your monthly and quarterly reports. 

It is important to get proper tax advice on the company structure before doing anything in Australia that might give the company a value. For example, it's very tax-inefficient for the USA-based company to own the Australian entity. It might work out better if the USA-entity's owners can own the Australian entity directly.

There are government programs for helping start ups as well. The one that is really worthwhile is the R&D Tax Incentive which can give a better-than-100% tax deduction on research and development. http://www.business.gov.au/grant...

It is compulsory to have workers' compensation insurance if a company employs anyone (even the owner). It's not very expensive, and it's not hard to set up. I use an insurance company called QBE, but there are plenty of others: http://www.qbe.com.au/Workers-Co...

For other insurances I've generally directed budding entrepreneurs to a broker (for the last three generations my family have used these folks: http://www.jmdross.com.au/). 

All of the above is true for starting up a company of any kind in Australia. There are a couple of resources worth mentioning for start-ups:

  • http://sydstart.com is the main start-up event for startups in Sydney. Normally it happens twice a year, but in 2014 there was just one big event.
  • http://fishburners.org  is the largest tech co-working space in Australia. If you need a community to be among, this is probably a good place to start.
  • http://atp-innovations.com.au/ is one of Australia's (and one of the world's) leading incubators.

There are some down-sides to setting up a start-up in Australia:
  • There are problems with venture capital and angel investment in Australia. 99.8% of all start-ups never get any external investment. (That's even though 30+% of all start-ups describe themselves as high-tech startups).
  • It's very difficult to offer share schemes to employees as they get hit by the "geek tax" (people are trying to get it fixed).

That said, Sydney is one of the best places in the world to run a business. 


Here's a quote from the Atlantic:  http://www.theatlantic.com/business/archive/2011/05/the-worlds-26-best-cities-for-business-life-and-innovation/238436/#slide22

Sydney is the only city in the survey to finish in the top five in ease of doing business, livability, health and safety. More than sun and surf, it's consistently called the best place to start a business by international surveys. 

The only bad thing that they could come up with was the cost of public transport. (Which is true. You'll need to get an Opal card, http://www.opal.com.au/, and even then many people end up paying >$100 per month to get around.)

There is no problem getting hold of brilliant developers in Sydney. Ads are generally on Seek (http://www.seek.com.au/). There is a reason that the world's major companies do large amounts of development here: Google Maps, Canon Research Labs, Cochlear, Atlassian, and many others. Start hiring: you'll quickly discover why.

Thursday, 2 October 2014

Data Protector 8.13 released

The list of fixes is very long, including a lot of VMware issues. If you have a support contract, you can download it from the following links:




Wednesday, 1 October 2014

Sydney Schools Wet Weather Line (now on iPhone)

Just in time for the final term of the year, I've now finished writing the iPhone version of the school sports wet weather app.

The fundamental challenge is that even the most technophobic of parents have better access to technology than the head of sports at most schools.

There are three reasons for this:

  • There's a technology curve: first the scientists dream it up; then the military want it; then businesses find commercial uses for it; then consumers ask for it; then it becomes a toy for children; finally, years later, it becomes available to schools.
  • The kind of fit-and-rugged outdoors kind of person that wants to spend their life helping kids play sport is not the sort of person fascinated by the latest gadgets.
  • A muddy field in the middle of nowhere at dawn is not a place where technology reigns. Voice is often the only good option, SMS is possible. Updating a web page is a challenge.


The parents want messages pushed to them. But the school often has a lot invested (in time, money and often face) in whatever solution is currently being used to distribute sport status, even if the school is aware that it doesn't scale well and has issues.

So I've implemented the only solution that really works: 
  • If the school doesn't mind changing their wet weather phone number, the head of sport can call a number, leave a message and have that instantly pushed out to all parents who use the app. Other parents can call in.
  • If the school can't change their wet weather phone number, then I've also written a centralised dialler that regularly calls the school's existing wet weather line, and pushes any new messages out to parents' phones as soon as it finds out.
I've also had a number of requests to get this push function included in the school's own custom app as well.

Here's the iPhone version:

On average, I'm getting more than one new school per week asking to come on board. It's nice to have a start-up project with such quick traction!

Linux iptables firewalling rules for use with Data Protector

Every client (and the cell manager) needs port 5555 open, unless you've changed the default port for the omniinet service.

Do you have a special "backup" network? If it's accessible on (say) eth1, then
iptables -I INPUT -p tcp -i eth1 --dport 5555 -j ACCEPT
Or, if you want to restrict a client so that it only receives connections from the cell manager (if the cell manager has an IP address of 192.168.200.100):
iptables -I INPUT -p tcp -s 192.168.200.100 --dport 5555 -j ACCEPT
You could get the same effect by adding an only_from parameter in /etc/xinetd.d/omni or by turning on cell security.

If the client also has tape drives (or the robotic control for a tape library) attached then you will need to open up a range of port numbers. Here I've allowed 10 concurrent connections, which would be appropriate for a 9-drive tape library with a robotic controller:

test -e /opt/omni/.omnirc || cp /opt/omni/.omnirc.TMPL /opt/omni/.omnirc
echo OB2PORTRANGESPEC=xMA-NET:18000-18009 >> /opt/omni/.omnirc
for port in 18000 18001 18002 18003 18004 18005 18006 18007 18008 18009
do
  iptables -I INPUT -p tcp --dport $port -j ACCEPT
done
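
Since iptables accepts a port range written as first:last, the loop can be collapsed into a single rule. The sketch below just prints the .omnirc line and the matching rule for a given starting port and number of connections, so you can review them before running anything. The function name ma_port_rules is hypothetical.

```shell
#!/bin/sh
# Sketch: print the .omnirc line and a single iptables rule for a range of
# media agent ports. Nothing is executed; review the output, then run it.
# ma_port_rules is a hypothetical helper name.
ma_port_rules() {
  first=$1
  count=$2
  last=$((first + count - 1))
  echo "OB2PORTRANGESPEC=xMA-NET:$first-$last"
  # iptables takes a port range as first:last
  echo "iptables -I INPUT -p tcp --dport $first:$last -j ACCEPT"
}
```

For the 9-drive library above, ma_port_rules 18000 10 prints the same OB2PORTRANGESPEC setting as before plus one rule covering 18000:18009. A single rule also keeps the iptables listing shorter than ten separate entries.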

And if you are running the StoreOnce software component on this Linux machine, then you will need ports 9387 and 9388 (unless you have changed them).

iptables -I INPUT -p tcp --dport 9387 -j ACCEPT
iptables -I INPUT -p tcp --dport 9388 -j ACCEPT
Finally, save it for the next reboot:

service iptables save


Wednesday, 24 September 2014

Contender for the world's worst error message: Connection to CRS failed

I've just spent the afternoon chasing down this error message, trying to connect a Linux cell console client to query a Windows cell manager:

Connection to CRS failed.
To start the Data Protector daemons on the Cell Manager host use the command
omnisv -start on the Cell Manager
or check if the communication between the Cell Manager and client is encrypted with the command
omnicc -encryption -status -all on the Cell Manager.
Everything checked out on the Windows cell manager, and encrypted control was off on all nodes.

I ran strace on omnicc, omnicc -debug 1-200, tcpdump'ed every packet from the Linux system in question (there were none) and tried omnicc -server 1.2.3.4 (a non-existent address that should have been unroutable) which responded instantly.

Oh, and "omnicc -debug 1-200 2> /tmp/somewhere" doesn't put the same data into /tmp/somewhere as you get on stderr when you run "omnicc -debug 1-200" and stderr is a tty.

The root cause of the error message about being unable to connect to the server? The hostname of the Linux client didn't resolve correctly.
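
Had I checked name resolution first, I would have saved an afternoon. A minimal sketch of that check follows; the function name resolves is my own. It only tests that a name resolves at all through the system resolver, not that it resolves to the right address, so a wrong entry in /etc/hosts would still slip past it.

```shell
#!/bin/sh
# Sketch: succeed only if a name resolves through the system resolver
# (/etc/hosts, DNS, and so on). resolves is a hypothetical helper name.
resolves() {
  getent hosts "$1" > /dev/null
}
```

On the client you might run: resolves "$(hostname)" || echo "this client's own hostname does not resolve". It is worth running the same check for the cell manager's name too.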



Disabling Data Protector encrypted control

By default, communications between a cell manager and all cell clients are not encrypted. It's just plain text, and you can intercept the network traffic and even modify it with something like netsed.

So, obviously some sites need to enable encryption. This is nicely documented in the Installation guide, and isn't that hard to figure out anyway (right click on the client and select "Enable encrypted communication").

But there is no documentation on how to disable this.

On the cell manager, there is a file /etc/opt/omni/server/config (which is usually equivalent to C:\ProgramData\OmniBack\config\server\config on Windows). It looks like this:

cellmgr.ifost.org.au={
encryption={
enabled=1;
certificate_chain_file='/etc/opt/omni/client/certificates/cacert.pem';
private_key_file='/etc/opt/omni/client/certificates/cacert.pem';
trusted_certificates_file='/etc/opt/omni/client/certificates/cacert.pem';
pkcs12_keystore_filename='/etc/opt/omni/client/certificates/hpdpcert.p12';
pkcs12_keystore_password='hpdpcert';
pkcs12_ca_certificate_filename='/etc/opt/omni/client/certificates/hpdpcert.p12';
pkcs12_ca_certificate_password='hpdpcert';
pkcs12_private_key_filename='/etc/opt/omni/client/certificates/hpdpcert.p12';
pkcs12_private_key_password='hpdpcert';
};
};
client.ifost.org.au={
encryption={
exception=1;
};
};

The first stanza (for cellmgr.ifost.org.au) has encryption enabled. The one client in the cell (client.ifost.org.au) does not.

In order to undo this, simply remove the content of the encryption clause.

cellmgr.ifost.org.au={
encryption={
};
};
If you restart Data Protector now (omnisv -stop ; omnisv -start), you won't be able to connect to the cell manager using the GUI or the command line. This is the kind of error you will get:

Connection to CRS failed.
To start the Data Protector daemons on the Cell Manager host use the command
omnisv -start on the Cell Manager
or check if the communication between the Cell Manager and client is encrypted with the command
omnicc -encryption -status -all on the Cell Manager.
This is because the client-side programs think they are supposed to be encrypting their connections to the cell manager (even if we're running on the cell manager itself), and the cell manager isn't responding with a valid SSL response.

There's another file /etc/opt/omni/client/config ( C:\ProgramData\Omniback\config\client\config ) which looks somewhat similar to the server-side one:

encryption={
        enabled=1;
        certificate_chain_file='/etc/opt/omni/client/certificates/cacert.pem';
        private_key_file='/etc/opt/omni/client/certificates/cacert.pem';
        trusted_certificates_file='/etc/opt/omni/client/certificates/cacert.pem';
        pkcs12_keystore_filename='/etc/opt/omni/client/certificates/hpdpcert.p12';
        pkcs12_keystore_password='hpdpcert';
        pkcs12_ca_certificate_filename='/etc/opt/omni/client/certificates/hpdpcert.p12';
        pkcs12_ca_certificate_password='hpdpcert';
        pkcs12_private_key_filename='/etc/opt/omni/client/certificates/hpdpcert.p12';
        pkcs12_private_key_password='hpdpcert';
};

The plaintext version should look like this:

encryption={
        exception=1;
};
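
If there are many clients to change, the edit can be scripted. This is only a sketch, assuming the client config file always has the shape shown above; the file format is undocumented, so take a copy of the file first. The disable_encryption function is a hypothetical helper of mine.

```python
import re

# Matches an "encryption={ ... };" stanza, up to the first closing "};".
ENCRYPTION_STANZA = re.compile(r"encryption=\{.*?\};", re.DOTALL)

PLAINTEXT_STANZA = "encryption={\n        exception=1;\n};"


def disable_encryption(config_text):
    """Replace the body of every encryption stanza with exception=1."""
    return ENCRYPTION_STANZA.sub(PLAINTEXT_STANZA, config_text)
```

Read the client config file into a string, pass it through disable_encryption and write the result back. Running it on a file that is already in the plaintext form leaves it unchanged.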

If you were turning off encrypted control for a client, then you will also need to update the cell_info file (/etc/opt/omni/server/cell/cell_info or C:\ProgramData\OmniBack\config\server\cell\cell_info) and remove the reference to encryption there too. A client entry with encryption still enabled looks like this:

-host "client.ifost.org.au" -os "gpl x86_64 linux-2.6.32-279.el6.x86_64" -encryption 1 -core A.09.01 -da A.09.01 -ma A.09.01  -cc A.09.01  -vepa A.09.01  -autodr A.09.01  -StoreOnceSoftware A.09.01 -ts_core A.09.01 
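
My reading of "the reference to encryption" is the -encryption 1 option in the line above. Under that assumption, a minimal sketch for stripping it from a cell_info line could look like this (strip_encryption_flag is a hypothetical helper):

```python
import re


def strip_encryption_flag(cell_info_line):
    """Remove a '-encryption N' option from one cell_info line."""
    return re.sub(r"\s+-encryption\s+\d+", "", cell_info_line)
```

Apply it only to the lines for the clients you are reverting, and keep a backup of cell_info before writing anything back.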


Tuesday, 9 September 2014

The sports wet weather line app

It's a grey Saturday morning, not quite wet enough to be sure that sport is cancelled, and not clear enough to be sure it's on.

You dial the school's wet weather line, and it's engaged because hundreds of other parents are doing the same thing. When you finally get through, you hear that sport is on, but it may get cancelled later. As you bundle the kids into the car you toss up getting a fine for using your mobile to call the wet weather line on the way, or getting there and discovering it was cancelled half an hour ago.

Sound familiar?

It was getting to a Trinity vs Cranbrook game at 7:15 in the morning (yawn!) that made me come up with a better solution.

I've set up some servers that dial the wet weather lines repeatedly and check for any change in message. If the sportsmaster has changed the voicemail message, my servers grab the new messages as an MP3 file.

Then I wrote an app (Android version here, iPhone coming soon) which connects to my servers and polls for any changes.

Just leave this app open on a Saturday morning, and you'll see when there's a new message. You can play it with the touch of a finger, or you can get the children to do it for you. They'll be using your phone in the car anyway, of course.

So far I'm only monitoring a handful of Sydney private schools: King's (TKS), Cranbrook, Trinity Prep, Trinity Grammar and Trinity Junior school. This website: http://wet-weather-line.appspot.com/ has the latest list.

If your school has a wet weather line that you dial (even if there are digits to send), my app can get updates from it. Let me know through this form if you want this done. I'm using Twilio for my phone services, so it can monitor a phone line in any country.

I've been asked to incorporate this into one school's information app (which was easily done). 

I also figured I might as well offer a voicemail service for schools and other sports organisations. If you are coaching a team and need a convenient way of getting wet weather information out, email me ([email protected]) or use this contact form.

My goal: to make sure no-one ever has to get out of bed early on a cancelled-sport Saturday morning again.

Changing the hostname of the Data Protector cell manager

HP DataProtector is very sensitive to changes in the name of the cell manager. There are several reasons for this:

  • Integration clients need to connect to the cell manager to get their credentials.
  • Clients will refuse to be upgraded from a computer that isn't their cell manager.

So if you need to change the cell manager's hostname (even if only the domain name changes), there are several things you need to do.

Update the internal database

This is the easiest step of all:

  omnidbutil -change_cell_name

If you are on Windows, remember to run this in a terminal running as Administrator.

Update the cell server web service certificate (DP 8.x and later)

There will be existing certificates in /etc/opt/omni/server/certificates (on Linux/HP-UX) and in C:\ProgramData\OmniBack\Config\Server\certificates on Windows. There are actually several levels of folders under this. I removed all the files I found there, leaving a skeleton of empty directories.

I don't know if this is necessary, but getting rid of the old certificates seemed like a good idea.

The new certificate will get used by the hpdp-as service which will decrypt the private key with a password. We might as well re-use the previous password to save on reconfiguring things.

The password is in /etc/opt/omni/client/components/webservice.properties on Linux/HP-UX and C:\ProgramData\OmniBack\Config\client\components\webservice.properties on Windows (unless you chose a different data directory at install time). Yes, client, even though we're changing a server property.

Anyway, the file will look like this:

# global property file for all components
jce-serviceregistry.URL = https://hostname:7116/jce-serviceregistry/restws
keystorePath=/etc/opt/omni/server/certificates/client/client.keystore
truststorePath=/etc/opt/omni/server/certificates/client/client.truststore
keystorePassword=zys124ax52353
truststorePassword=zys124ax52353

Now we need to generate a new certificate:

omnigencert.pl -server_id the-new-cell-hostname -store_password zys124ax52353 -user_id hpdp
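
Retyping the password is an easy place to slip, so here is a minimal sketch that reads it out of webservice.properties and builds the argument list for the command above. It assumes the file is plain key=value text as shown. The function names are hypothetical, and the only omnigencert.pl options used are the three from this post.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def omnigencert_args(props, new_hostname, user_id="hpdp"):
    """Build the omnigencert.pl argument list, reusing the old keystore password."""
    return [
        "omnigencert.pl",
        "-server_id", new_hostname,
        "-store_password", props["keystorePassword"],
        "-user_id", user_id,
    ]
```

The resulting list can be handed to subprocess.run, or just printed and pasted into a shell.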

On Windows, the .pl ending might not be associated with the right Perl interpreter (or even any interpreter at all). If so, give the full pathname for Perl and omnigencert.pl like this:
  "C:\Program Files\OmniBack\bin\perl.exe" "C:\Program Files\OmniBack\bin\omnigencert.pl"


Clients

Each client will now have the wrong cell server name. On Unix or Linux boxes, there's a file /etc/opt/omni/client/cell_server which is a text file just listing the name of the cell manager. Use whatever technique you normally push config files out with (puppet? cfengine? scp?) to update this.

Windows boxes keep the cell server in a registry key. Update this:

HKEY_LOCAL_MACHINE\SOFTWARE\Hewlett-Packard\OpenView\OmniBackII\Site\CellServer


Server cell info

The server itself is a client, and presumably the information in the cell_info file (/etc/opt/omni/server/cell/cell_info or C:\ProgramData\Omniback\config\server\cell\cell_info) will need to be updated.

If there were any devices attached to the cell manager, then they will need to be updated. You can do this with omnidownload / omniupload, or just by going into each device in the GUI and applying the relevant changes.

Advanced scheduling webservices (DP 8.1 and later)

Under /etc/opt/omni/client/components (C:\ProgramData\Omniback\config\client\components), there are several webservice.properties files scattered through the filesystem. They register web service components.

For example, here is dp-jobexecutionengine-backup/webservice.properties:

# dp-jobexecutionengine-backup property file
dp-jobexecutionengine-backup.URL = https://hostname:7116/dp-jobexecutionengine-backup/restws
jce-serviceregistry.URL = https://hostname:7116/jce-serviceregistry/restws

These all have to be updated:

  • webservice.properties
  • dp-jobexecutionengine-backup/webservice.properties
  • dp-jobexecutionengine-consolidation/webservice.properties
  • dp-jobexecutionengine-copy/webservice.properties
  • dp-jobexecutionengine-verification/webservice.properties
  • dp-loginprovider/webservice.properties
  • dp-scheduler-gui/webservice.properties
  • dp-webservice-server/webservice.properties
  • jce-dispatcher/webservice.properties
  • jce-serviceregistry/webservice.properties
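
Editing ten files by hand is tedious, so here is a minimal sketch that walks the components directory and rewrites the hostname in every webservice.properties it finds. It assumes the old hostname only appears in https://hostname:port/ URLs like the ones shown above. The rename_in_properties function is a hypothetical helper, so take a copy of the directory before running anything like it.

```python
from pathlib import Path


def rename_in_properties(components_dir, old_host, new_host):
    """Rewrite https://old_host:port/ URLs in every webservice.properties file.

    Returns the list of files that were changed.
    """
    changed = []
    for path in sorted(Path(components_dir).rglob("webservice.properties")):
        text = path.read_text()
        updated = text.replace(f"https://{old_host}:", f"https://{new_host}:")
        if updated != text:
            path.write_text(updated)
            changed.append(path)
    return changed
```

For example, rename_in_properties("/etc/opt/omni/client/components", "oldcell.ifost.org.au", "newcell.ifost.org.au") would return the paths it touched, which is a handy cross-check against the list above.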

Then restart Data Protector (omnisv -stop ; omnisv -start).

Licenses

Data Protector licenses are tied to the hostname and IP address of the cell manager. So you will need to visit http://webware.hp.com/ to get a new set of license keys for your cell manager.

Incidentally, if you are moving or renaming a cell manager, you might find it helpful to read this book: Migrating and Cloning Data Protector cell managers


Greg Baker is an independent consultant who happens to do a lot of work on HP DataProtector. He is the author of the only published books on HP Data Protector (http://www.ifost.org.au/books/#dp). He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector, or visit the online store for Data Protector products, licenses and renewals at http://store.data-protector.net/ 

Friday, 29 August 2014

Details about the schema of the Data Protector internal database


As far as I can tell, HP don't document the schema of the PostgreSQL internal database. What follows are my investigations from chasing up a problem for a customer for whom thousands of sessions were giving strange results. (When they looked at a medium and went to the Objects tab, the GUI responded with "in order to delete this medium, export it first".)

There are seven important database tables which are affected in the normal operations of running a backup.

The first one is dp_management_session.
  • It has a column "name" which looks like this: "2014/07/14 0007" - in other words, the session name as it appears in every command, except that it has a space instead of a dash.
The name is used as a unique key (as it should be unique!) in conjunction with the column 'application_uuid'. I haven't figured out exactly what that's doing, but I'm presuming it's something to do with the manager-of-managers product, where you might have centralised all your media into one cell. In this case you could have two or more identical session names referred to in the database, one from each of the client cells and one from the manager cell. To simplify things, I've ignored the application_uuid column(s) in the diagram.
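
When writing queries it helps to convert between the session ID you see on the command line and the name stored in this column. Here is a minimal sketch. The four-digit zero padding is an assumption based on the handful of names I have seen, and both function names are hypothetical.

```python
import re


def session_id_to_db_name(session_id):
    """'2014/08/09-4' -> '2014/08/09 0004' (zero padding is an assumption)."""
    match = re.fullmatch(r"(\d{4}/\d{2}/\d{2})-(\d+)", session_id)
    if match is None:
        raise ValueError(f"not a session ID: {session_id!r}")
    date, number = match.groups()
    return f"{date} {int(number):04d}"


def db_name_to_session_id(name):
    """'2014/07/14 0007' -> '2014/07/14-7'."""
    date, number = name.split(" ")
    return f"{date}-{int(number)}"
```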

Obviously, the backup should write something. If this is the first time this filesystem (or database or Exchange server, etc.) has been backed up, then the dp_objects table will have a new row added to it, with the hostname, mountpoint and label of the object being backed up. The columns uuid and dp_numkey act like the primary key for this table, which means that if you aren't running manager-of-managers, the dp_numkey will be unique.

Each time a backup of that filesystem runs, a row is added to dp_catalog_object_version. If there are several filesystems being backed up in one job, this table may have many rows added for each backup run.
  • There is a column backup_name which partly references dp_management_session.name. I say partly, because there is no foreign key between them, and in fact, sometimes backup_name is null. Presumably what's going on is that a backup could have a copy made, and then the original expires and its session is deleted, leaving a catalog object version which doesn't correspond to a session.
  • The column object_seq_id references dp_objects (together with the usual uuid story).
  • The primary key is the combination of application_uuid (as usual) and a field called seq_id.

There is a row created in dp_catalog_object_datastream and also one in dp_catalog_object_versession for each row added to dp_catalog_object_version. These don't seem very interesting: the former looks like it's something to do with enforcing device policies, and the latter a record of a post-backup verification.

The oddly and painfully named dp_catalog_position_seqacc_med maps backup objects to positions on tapes. This is obviously a very large table!

  • The column objver_seq_id references the dp_catalog_object_version's seq_id column, essentially "what is backed up here?"
  • The column medium_name references the unique header ID of the tape, for example '7b5ba8c0:53c3ae35:07eb:0014'
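
To make the relationships concrete, here is a toy model in an in-memory SQLite database. It is only a sketch: it has just the columns discussed above, it ignores application_uuid, and the second medium name is invented sample data. The real tables live in Data Protector's PostgreSQL database and have many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy versions of two tables; only the columns discussed in the text.
    CREATE TABLE dp_catalog_object_version (seq_id INTEGER, backup_name TEXT);
    CREATE TABLE dp_catalog_position_seqacc_med (objver_seq_id INTEGER, medium_name TEXT);

    INSERT INTO dp_catalog_object_version VALUES (1, '2014/07/14 0007'), (2, NULL);
    INSERT INTO dp_catalog_position_seqacc_med VALUES
        (1, '7b5ba8c0:53c3ae35:07eb:0014'),
        (2, '7b5ba8c0:53c3ae35:07eb:0015');
""")

MEDIA_FOR_SESSION = """
    SELECT DISTINCT pos.medium_name
      FROM dp_catalog_object_version AS ver
      JOIN dp_catalog_position_seqacc_med AS pos
        ON pos.objver_seq_id = ver.seq_id
     WHERE ver.backup_name = ?
     ORDER BY pos.medium_name
"""


def media_for_session(connection, session_name):
    """List the media holding object versions written by one session."""
    rows = connection.execute(MEDIA_FOR_SESSION, (session_name,)).fetchall()
    return [row[0] for row in rows]


print(media_for_session(conn, "2014/07/14 0007"))
```

The object version with a null backup_name never shows up in the result, which is the "catalog object version without a session" case described earlier.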

There is another table called dp_positions which is a little bit more accessible, but inserts, updates and deletions on this table trigger a function instead (presumably to update dp_catalog_position_seqacc_med). In a few tests, dp_catalog_position_seqacc_med got populated and dp_positions did not.

If you are backing up to a StoreOnce device, or to a file library, then there's a good chance that this backup will cause a new medium to be created. This will also happen when you format a new tape in either a physical tape library or a virtual tape library.

The tapes are all listed in dp_medmng_media_pool_media. The column medium_seq_id (which is not the medium header, it's just an ID) is the key into the dp_medmng_medium table.

There seems to be a distinction made between the medium itself, and the cartridge holding it. The dp_medmng_cartridge table has a barcode, a physical_location and a contained medium_seq_id. The dp_medmng_medium table has a unique seq_id and a name. The name is the header on the tape.

I still haven't figured out the way that file names are stored. Presumably this is in DCBF files until you run omnimigrate. In this transition time (which seems to be the default on new installs as well), some records get written to dp_catalog_dcbf_directory and dp_catalog_dcbf_info.







Friday, 15 August 2014

Upgrading from 6.2 to 9.x is better, but there's a bug you need to work around

Upgrading to HP Data Protector version 9 is much faster than upgrading to version 8.x. It's amazing what some good indexes on a database can do! My most recent customer had a 50GB database and the upgrade for the cell manager was complete in under an hour.

I have found one interesting bug in the import process, though. It seems that one of the columns in the import gets mis-imported. After the upgrade, I found sessions like this (truncated output from omnidb -session -detail) :

SessionID : 2014/08/09-4
        Backup Specification: [email protected]
        Session type        : Copy
        Started             : Saturday, 9 August 2014, 7:06:02 PM
        Finished            : Monday, 11 August 2014, 5:13:37 PM
        Status              : Completed
        Number of warnings  : 0
        Number of errors    : 0
        User                :
        Group               :
        Host                :  

Previously, the session looked like this:

SessionID : 2014/08/09-4
        Backup Specification: Weekly Tape Copy
        Session type        : Copy
        Started             : Saturday, 9 August 2014, 7:06:02 PM
        Finished            : Monday, 11 August 2014, 5:13:37 PM
        Status              : Completed
        Number of warnings  : 0
        Number of errors    : 0
        User                : hpdp
        Group               : IFOST
        Host                : cellmgr.ifost.org.au

The user / group / host has been turned into the specification name! As far as I can tell, this is just a once-off: sessions run on the cell manager after the upgrade are named correctly.

This bug appears to be triggered in two situations:
  1. For all copy jobs.
  2. For backup jobs where the backup specification no longer exists (e.g. you used to have a job called "Daily Sydney"; you ran it every day last year; then you deleted it; then you upgraded. Congratulations, those jobs will now have a username instead of a backup specification name).
So presumably somewhere in the upgrade script there is some code which calls out to omnidb -datalist "..." because this is the only thing I can think of which would exhibit exactly this kind of failure mode.

When you actually look into the database, there's a column on the dp_management_session table called "scratch_area" and a column called "owner". It seems that for the bad sessions, these end up with owner being the empty string, and scratch_area ending in two slashes.

The way to confirm this is to check with a SQL query like this:

     select owner,name,datalist,scratch_area 
       from dp_management_session where owner = '';

Just save this to a file and run omnidbutil -run_script filename.sql -detail
For me, the output looked like this:


 owner |      name       |           datalist              |   scratch_area
-------+-----------------+---------------------------------+--------------------
       | 2014/08/09 0010 | [email protected] | Daily job //
       | 2013/04/29 0016 | [email protected] | Weekly tape copy //
       | 2014/08/09 0004 | [email protected] | Daily job //
....

There were 111 other lines truncated. Here's what I ran (with omnidbutil -run_script) to fix it:

     update dp_management_session 
        set owner = datalist where owner = '';

     update dp_management_session 
        set datalist = trim(trailing ' // ' from scratch_area) 
        where scratch_area like '%//%';
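
To see what those two statements do to a single row, here is a small sketch that mirrors them on a dict. It is only an illustration and does not touch the database. The repair_row function is a hypothetical helper, and the owner string in the usage note is a made-up placeholder.

```python
def repair_row(row):
    """Mirror the two UPDATE statements on one row held as a dict."""
    fixed = dict(row)
    if row["owner"] == "":
        # First UPDATE: the owner ended up in the datalist column.
        fixed["owner"] = row["datalist"]
    if "//" in row["scratch_area"]:
        # Second UPDATE: trim(trailing ' // ' from ...) strips any run of
        # spaces and slashes from the end, which str.rstrip(" /") also does.
        fixed["datalist"] = row["scratch_area"].rstrip(" /")
    return fixed
```

For a row with an empty owner, a datalist of "placeholder-owner" and a scratch_area of "Daily job //", the repaired row has owner "placeholder-owner" and datalist "Daily job". Note that the trim works on a set of characters, so a specification name that genuinely ends in a slash would lose it too.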

The GUI console caches, so don't be surprised if you have to disconnect and reconnect before you see it reflected in the GUI.
