IFOST Blog: Why upgrading from version 6 and 7 can be a little slow

Tuesday, 15 July 2014

Why upgrading from version 6 and 7 can be a little slow

I'm writing this while waiting for an in-place upgrade from Data Protector 7.0 to 8.1 to complete. Several hours later, I started wondering what it was doing. I could have just connected to the database using the run_script option to omnidbutil, but I thought I'd explore a bit further.

The connection details are in C:\ProgramData\Omniback\Config\Server\idb\idb.config (or whereever you have put the database; it's /etc/opt/omni/server/idb/idb.config on Linux). You will see a line like this:

PGPASSWORD='ZmxxeW5paGn0ZmovYQ==';

Copy and paste the part between the quotes into your favourite Base64 decoder to get the plain text password returned. You can then run:

psql --port=7112 -U hpdpidb_app -d hpdpidb

(psql is in "C:\Program Files\OmniBack\idb\bin" on Windows, /opt/omni/idb/bin on Linux)

PostgreSQL lets you see what queries are running -- unlike Raima Velocis, where this would have been very hard to figure out!

From the psql prompt:

hpdpidb=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | waiting |
current_query
-------+---------+---------+----------+-------------+------------------+-------------+-----------------+-------------+--
--------------------------+----------------------------+----------------------------+---------+-------------------------
------------------------------------------------
16388 | hpdpidb | 3608 | 16384 | hpdpidb_app | | 127.0.0.1 | | 49842 | 2
014-07-15 11:01:13.695+10 | 2014-07-15 11:23:58.31+10 | 2014-07-15 11:23:58.31+10 | f | DELETE FROM dp_catalog_position_seqacc_med \r +
| | | | | | | | |
| | | | WHERE medium_name NOT IN (SELECT medium_id FROM dp_medium_to_objects);
16388 | hpdpidb | 3780 | 16384 | hpdpidb_app | psql | ::1 | | 60664 | 2
014-07-15 15:10:49.911+10 | 2014-07-15 15:11:14.763+10 | 2014-07-15 15:11:14.763+10 | f | select * from pg_stat_activity;
(2 rows)

That's rather interesting: I'm waiting for it to run
DELETE FROM dp_catalog_position_seqacc_med
WHERE medium_name not in (select medium_id FROM dp_medium_to_objects);

So it's some kind of sanity clean-up to maintain referential integrity, perhaps?

Why is that taking so long, though? Time to explain the query:

hpdpidb=> explain delete from dp_catalog_position_seqacc_med where medium_name not in (select medium_id from dp_medium_to_objects);
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
----------------------------------
Delete on dp_catalog_position_seqacc_med (cost=1343236.10..514911431610.60 rows=2409753 width=6)
-> Seq Scan on dp_catalog_position_seqacc_med (cost=1343236.10..514911431610.60 rows=2409753 width=6)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=1343236.10..1544864.81 rows=4819506 width=28)
-> Subquery Scan on dp_medium_to_objects (cost=1343236.10..1487821.28 rows=4819506 width=28)
-> Unique (cost=1343236.10..1439626.22 rows=4819506 width=131)
-> Sort (cost=1343236.10..1355284.87 rows=4819506 width=131)
Sort Key: p.medium_name, obj.uuid, obj.seq_id, obj.legacy_type_id, app.name, obj.name, obj.description
-> Hash Join (cost=107153.79..478801.50 rows=4819506 width=131)
Hash Cond: ((p.application_uuid = ovr.application_uuid) AND (p.objver_seq_id = ovr.seq_id))
-> Seq Scan on dp_catalog_position_seqacc_med p (cost=0.00..139912.06 rows=48 19506 width=50)
-> Hash (cost=76201.39..76201.39 rows=922427 width=125)
-> Hash Join (cost=325.37..76201.39 rows=922427 width=125)
Hash Cond: ((ovr.object_uuid = obj.uuid) AND (ovr.object_seq_id = obj.seq_id))
-> Seq Scan on dp_catalog_object_version ovr (cost=0.00..50509.27 rows=922427 width=43)
-> Hash (cost=243.97..243.97 rows=5427 width=103)
-> Hash Join (cost=2.72..243.97 rows=5427 width=103)
Hash Cond: ((obj.application_uuid = app.uuid) AND (obj.application_seq_id = app.seq_id))
-> Seq Scan on dp_catalog_object obj (cost=0.00..146.27 rows=5427 width=93)
-> Hash (cost=1.69..1.69 rows=69 width=52)
-> Seq Scan on dp_frontend_application app (cost=0.00..1.69 rows=69 width=52)
(22 rows)

So that's 922427 * 5427 * 69 = 345,414,781,701 row operations at an absolute bare minimum.

I think I might call it a day and come back in a few days' time. Or maybe a few weeks' time.

Greg Baker is an independent consultant who happens to do a lot of work on HP DataProtector. He is the author of the only published book on HP Data Protector (http://x.ifost.org.au/dp-book). He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector

3 comments:

Unknown23 July 2014 at 21:09
so your database will be *very* clean by the time it finishes, but then it will be time to upgrade to the next version!
ReplyDelete
Replies
Unknown3 July 2020 at 09:29
Hi Greg,

Amazing post, hope you tried version 9. I couldn't connect to the idb with the above command. Please advise how I can connect to the IDB, should I change any params as per my env?

Really appreciate the work!

Thanks
ReplyDelete
Replies

Add comment

Other related sites

I lecture in natural language processing, and research non-manifold machine learning. I consult to companies needing help with technology management, AI strategy.
I have built a prediction market platform for enterprises to help with employee engagement and strategic decision making.
I wrote this NPS survey analyzer
As a build-a-business-in-one-day exercise, I have a GPT-via-email bot.
Conference call system to help when you need simultaneous translation done, but the only equipment you have on hand are people's mobile phones... which aren't all smartphones: church-translation.com
My Amazon author page
In the past I used to answer a lot of questions on Quora
Early 21st century pre-singularity geek-nerd poetry also available in book form

Search This Blog

Tuesday, 15 July 2014

Why upgrading from version 6 and 7 can be a little slow

3 comments: