Search This Blog

Friday, 29 August 2014

Details about the schema of the Data Protector internal database


As far as I can tell, HP don't document the schema of the PostgreSQL internal database. What follows is my investigations as I was chasing up a customer for whom thousands of sessions were giving strange results. (When they looked at a medium and went to the Objects tab to find, the GUI responded with "in order to delete this medium, export it first".)

There are seven important database tables which are affected in the normal operations of running a backup.

The first one is dp_management_session.
  • It has a column "name" which looks like this: "2014/07/14 0007" - in other words, the session name as it appears in every command, except that it has a space instead of a dash.
The name is used as a unique key (as it should be unique!) in conjunction with the column 'application_uuid'. I haven't figured out exactly what that's doing, but I'm presuming it's something to do with the manager-of-managers product, where you might have centralised all your media into one cell. In this case you could have two or identical session names referred to in the database, one from each of the client cells and one from the manager cell. To simplify things, I've ignored the application_uuid column(s) in the diagram.

Obviously, the backup should write something. If this is the first time this filesystem (or database or Exchange server, etc.) has been backed up, then the dp_objects table will have a new row added to it, with the hostname, mountpoint and label of the object being backed up. The columns uuid and dp_numkey act like the primary key for this table, which means that if you aren't running manager-of-managers, the dp_numkey will be unique.

Each time a backup of that filesystem runs, a row is added to dp_catalog_object_version. If there are several filesystems being backed up in one job, this table may have many rows added for each backup run.
  • There is a column backup_name which partly references dp_management_session.name. I say partly, because there is no foreign key between them, and in fact, sometimes backup_name is null. Presumably what's going on is that a backup could have a copy made, and then the original expires, delete the original session, leaving a catalog object version which doesn't correspond to a session.
  • The column object_seq_id references dp_objects (together with the usual uuid story).
  • The primary key is the combination of application_uuid (as usual) and a field called seq_id.

There is a row created in db_catalog_object_datastream and also one in dp_catalog_object_versession for each row added to dp_catalog_object_version. These don't seem very interesting: the former looks like it's something to do with enforcing device policies, and the latter a record of a post-backup verification.

The oddly and painfully named dp_catalog_position_seqacc_med maps backup objects to positions on tapes. This is obviously a very large table!

  • The column objver_seq_id references the dp_catalog_object_version's seq_id column, essentially "what is backed up here?"
  • The column medium_name references the unique header ID of the tape, for example '7b5ba8c0:53c3ae35:07eb:0014'

There is another table called dp_positions which is a little bit more accessible, but inserts, updates and deletions from this table trigger a function instead (presumably to update dp_catalog_position_seqacc_med). In a few tests this table got populated and dp_positions did not.

If you are backing up to a StoreOnce device, or to a file library then there's a good chance that this backup will cause a new medium to be created. This will also happen when you format a new tape in either a physical tape library of a virtual tape library.

The tapes are all listed in dp_medmng_media_pool_media. The column medium_seq_id (which is not the medium header, it's just an ID) is the key into the dp_medmng_medium tape.

There seems to be a distinction made between the medium itself, and the cartridge holding it. The dp_medmng_cartridge table has a barcode, a physical_location and a contained medium_seq_id. The dp_medmng_medium table has a unique seq_id and a name. The name is the header on the tape.

I still haven't figured out the way that file names are stored. Presumably this is in DCBF files until you run omnimigrate. In this transition time (which seems to be the default on new installs as well), some records get written to dp_catalog_dcbf_directory and dp_catalog_dcbf_info.

Greg Baker is an independent consultant who happens to do a lot of work on HP DataProtector. He is the author of the only published book on HP Data Protector (http://x.ifost.org.au/dp-book). He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector