Monday, 9 November 2015

Checking StoreOnce stores on Windows

In Data Protector 9.04, I've occasionally encountered a problem where the StoreOnce software store on Windows becomes completely unresponsive.

The error message in the session log varies, but it often looks something like this:

[Major] From: [email protected] "IFOST backup"  Time: 9/11/2015 1:18:44 PM
[61:3003]      Lost connection to B2D gateway named "DataCentrePrimary"
    on host storeonce.ifost.org.au.
    Ipc subsystem reports: "IPC Read Error
    System error: [10054] Connection reset by peer
"

One way to detect the problem is that the command "StoreOnceSoftware --list_stores" hangs.


I created the following three batch files and scheduled CheckStoreOnceStatus.cmd to run once per hour:

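CheckStoreOnceStatus.cmd
REM Launch the controller and the child in the background
start /b CheckStoreOnceStatusController.cmd
start /b CheckStoreOnceStatusChild.cmd
REM Nothing sends this signal, so this is a 600-second (ten-minute) sleep
waitfor /t 600 tenminutes
exit /b

CheckStoreOnceStatusChild.cmd
REM If this command hangs, the signal below is never sent
StoreOnceSoftware --list_stores
WAITFOR /SI StoreOnceOK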

CheckStoreOnceStatusController.cmd
WAITFOR /T 30 StoreOnceOK && (
  REM StoreOnce OK
  exit /b
)
REM StoreOnce failure
net stop StoreOnceSoftware
waitfor /t 120 GiveItTime
net start StoreOnceSoftware
exit /b


I also added a call to blat after the net start command to send an email notification.

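So, CheckStoreOnceStatus spawns *Controller, which waits up to 30 seconds for a signal from *Child. *Child sends that signal as soon as it has been able to finish StoreOnceSoftware --list_stores. If the signal does not arrive in time, *Controller stops the StoreOnceSoftware service, waits 120 seconds, and starts it again.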
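
The same timeout-then-restart idea can also be sketched as a single script. This is only an illustration, not what I run in production: it assumes Python 3 is installed on the StoreOnce host, and the function names store_is_responsive and restart_store_service are made up for this example. The commands themselves are the ones used in the batch files.

```python
import subprocess
import time

# Command and service name as used in the batch files above
LIST_STORES = ["StoreOnceSoftware", "--list_stores"]
SERVICE = "StoreOnceSoftware"


def store_is_responsive(cmd=LIST_STORES, timeout_s=30):
    """Return True if cmd exits within timeout_s seconds, False if it hangs.

    Like the batch version, this only checks that the command finishes;
    it does not look at the exit code.
    """
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the hung child before raising this exception
        return False


def restart_store_service(service=SERVICE, settle_s=120):
    """Stop the service, give it time to shut down, then start it again."""
    subprocess.run(["net", "stop", service])
    time.sleep(settle_s)
    subprocess.run(["net", "start", service])
```

An hourly scheduled task could then call store_is_responsive() and, only when it returns False, call restart_store_service(). The advantage over the three batch files is that the hung StoreOnceSoftware --list_stores process is killed when the timeout expires, instead of being left behind.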

Greg Baker is an independent consultant who happens to do a lot of work on HP Data Protector. He is the author of the only published books on HP Data Protector (http://www.ifost.org.au/books/#dp). He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector