Search This Blog

Wednesday 7 December 2016

Artificial Intelligence (#AI) development in Sydney around #Atlassian and #JIRA

Well, it's boastful to say it, but we just received a nice little certificate from Craig Laundy (assistant federal minister for innovation) and Victor Dominello (NSW state minister for innovation). It says "Best Industry Application of AI/Cognitive" for the Automated Estimator of Effort and Duration.

Actually, we were just highly commended, and it was stratejos that won, but what is interesting about all this is: the whole AI/Cognitive category just went to artificial intelligence JIRA plug-ins.

Firstly, Atlassian has won the enterprise. When 100% of the top-tier startups targeting large organisations are developing for your platform exclusively, it's only a matter of time.

Secondly, AI is hot technology in Sydney with political capital. We think of Silicon Valley as being the centre of the universe for this, but I've never seen US Federal and Californian state senators getting together to express their commitment as we saw in this award.

Thirdly, this means you really should try out AEED:

Thursday 29 September 2016

DataProtector technical workshop in New Zealand

Email if you are interested. It's on next week.

Date: Wednesday, 5 October 2016
Time: 8.30am to 1.00pm

Location:  Hewlett Packard Enterprise Office
Level 4, 22 Viaduct Harbour Avenue, Auckland 1011

Brandon and Paul Carapetis will be talking about:
  • New features from the latest versions of Data Protector
  • Integration with VMware, Hyper-V, and HPE hardware: 3PAR and StoreOnce
  • Road map for the year ahead
  • Introduction to New Related Products: Backup Navigator, Connected, Storage Optimizer, and VM Explorer

Thursday 22 September 2016

Really proud of my students -- machine learning on corporate websites

Amanda wanted to know what factors influence certain web sites that people visit. But her exploratory analysis found out the secrets of big software companies.

She used a sitemap crawler to look at the documentation of the usual names: Microsoft, Symantec, Dropbox, Intuit, Atlassian, CA, Trello, Github, Adobe, Autodesk, Oracle and some others. She looked at the number of words on each page, the number of links and various other measures and then clustered the results.

Like the best of all data science projects, the results are obvious, but only in retrospect.

Microsoft, Symantec and Dropbox are all companies whose primary focus is on serving non-technical end-users who aren’t particularly interested in IT or computers. They clustered into a group with similar kinds of documentation.

CA, Trello and Github primarily focus on technical end-users: programmers, sysadmins, software project managers. Their documentation cluster together in similarity. Intuit and Atlassian were similar; Adobe and Oracle clustered together.

Really interestingly, it’s possible to derive measures of the structural complexity of the company. Microsoft is a large organisation with silos inside silos. It can take an enormous number of clicks to get from the default documentation landing page until you get to a “typical” target page. Atlassian prides itself on its tight teamwork and its ability to bring people together from all parts of the organisation. They had the shortest path of any documentation site.

But this wasn’t what Amanda was really after: she wanted to know whether she could predict engagement: whether people would read a page on a documentation site or just skim over it. She took each website that she could get data for and deduced how long it would take to read the page (based on the number of words), how long people were actually spending on the page, the number of in-links and a few other useful categories (e.g. what sort of document it was). She created a decision tree model and was able to explain 86% of the variance in engagement.

Interesting result: there was little relationship between the number of hyperlinks that linked to a site and how much traffic it received. Since the number of links strongly influence a site’s pagerank in Google’s search algorithms, this is deeply surprising.

There was more to her project (some of which can’t be shared because it is company confidential), but just taking what I’ve described above, there are numerous useful applications:
  • Do you need to analyse your competition’s internal organisational structure? Or see how much has changed in your organisation in the months after an internal reorg?
  • Is your company’s website odd compared to other websites in your industry?
  • We can use Google Analytics and see what pages people spend time on, and which links they click on, but do you want to know why they are doing that? You know the search terms your visitors used, but what is it that they are interested in finding?

Wednesday 14 September 2016

Really proud of my students - AI analysis of reviews

Sam wants to know what movies are worth watching, so he analysed 25,000 movie reviews. This is a tough natural language processing problem, because each movie only has a small number of reviews (less than 30). It’s nowhere near enough for a deep learning approach to work, so he had to identify and synthesise features himself.

He used BeautifulSoup to pull out some of the HTML structure from the reviews, and then made extensive use of the Python NTLK library.

The bag-of-words model (ignoring grammar, structure or position) worked reasonably well. A naive Bayesian model performed quite well -- as would be expected -- as did a decision tree model, but there was enough noise that a logistic regression won out, getting the review sentiment right 85% of the time. He evaluated all of his models with F1, AUC and precision-recall. He used this to tweak the model a little and just nudge it a little higher.

A logistic regression over a bag-of-words essentially means that there we are assigning a score to each word in the English language (which might be a positive number, a negative number or even zero), and then adding up the scores for each word when it appears. If overall it adds up to a positive number, we count the review positive; if negative the reviewer didn’t like the movie.

He used the Python Scikit learn library (as do most of my students) to calculate the optimal score to assign to each English language word. Since the vocabulary he was working with was around 75,000 words (he didn’t do any stemming or synonym-based simplication) this ran for around 2 days on his laptop before coming up with an answer.

Interestingly, the word “good” is useless as a predictor of whether a movie was good or not! It probably needs more investigation, but perhaps a smarter word grouping that picked up “not good” would help. Or maybe it fails to predict much because of reviews that say things like “while the acting was good, the plot was terrible”.

Sam found plenty of other words that weren’t very good predictors: movie, film, like, just and really. So he turned these into stopwords.

There are other natural language processing techniques that often produce good results, like simply measuring the length of the review, or measuring the lexical dispersion (the richness of vocabulary used). However, these were also ineffective.

What Sam found was a selection of words that, if they are present in a review, indicate that the movie was good. These are “excellent”, “perfect”, “superb”, “funniest” and interestingly: “refreshing”. And conversely, give a movie a miss if people are talking about “worst”, “waste”, “disappointment” and “disappointing”.

What else could this kind of analysis be applied to?
  • Do you want to know whether customers will return happier, or go elsewhere looking for something better? This is the kind of analysis that you can apply to your communications from customers (email, phone conversations, twitter comments) if you have sales information in your database.
  • Do you want to know what aspects of your products your customers value? If you can get them to write reviews of your products, you can do this kind of natural language processing on them and you will see what your customers talk about when they like your products.

Friday 9 September 2016

Really proud of my students -- data science for cats

Ngaire had the opportunity to put a cat picture into a data science project legitimately, which could be a worthy blog post in itself. She did an analysis on how well animal shelters (e.g. the council pound) are able to place animals.

She had two data sources. The first was Holroyd council’s annual report (showing that the pound there euthanised 59% of animals they received), but of course any animal that had been microchipped would have been otherwise handled, so the overall percentage is much lower than this in reality. Still, Australia is far behind the USA (based on her second data source, which was a Kaggle-supplied dataset of ~25,000 outcomes).

She put together a decision tree regressor which correctly predicted what would happen to an animal in a shelter around 80% of the time. She also had a logistic regression model with a similar success rate.

The key factors which determined the fate of a lost kitten or puppy were a little surprising. Desexed animals do better in the USA. The very young (a few weeks) do have very good outcomes -- they are very likely to be adopted or transferred. Black fur was a bad sign, although the messiness of the data meant that she couldn’t explore colours all that much further: at a guess, she suggested that being hard to photograph is a problem, so perhaps there is a cut-off level of darkness where there will be a sudden drop in survival rates.

Where else could this kind of analysis be applied? She could take her model and apply it when you want to know an outcome in advance. Questions might include:
How will a trial subscription work out for a potential customer?
What are going to happen to our stock? Will the goods be sold, used, sent to another store…?

Ngaire is open to job opportunities, particularly if you are looking for a data scientist with a very broad range of other career experience in the arts; I can put you in touch if you are interested.

Sunday 28 August 2016

Really proud of my students -- final projects

Over the next few weeks I'm going to do some short blog posts about each of the final projects my students did in their data science course.

One of the reasons this blog has been a bit quieter than usual these last few months is that I was teaching a Data Science class at General Assembly, which was rewarding but rather exhausting.

Some observations:
  • GA is busy and dynamic. I remember back in the late 1990s at HP when every company was deploying SAP on HP-UX to avoid Y2K problems: there were classes constantly; you might discover that the class you were teaching was going to be held in the boardroom using some workstations borrowed from another city. GA was like that: every room packed from early morning until late at night.
  • No-one in the class had a job as a data scientist at the beginning of the course, but there was a lot of movement within 10 weeks: job changes, promotions, new career directions. The only time in my teaching career where I saw the same wow-this-person-is-trained-now-let's-poach-them was in the early days of the Peregrine -> Service Manager transition.
  • The course is mainly about machine learning but there is flexibility for the instructor to add in a few other relevant topics based on what the students want. Right now, Natural Language Processing is white-hot. Several students did some serious NLP / NLU projects. The opportunities for people who have skills in this area are very, very good.
  • Computer vision is an area where there is a lot of interest as well.
I'll be teaching the first part of the Data Science immersive (a full-time course instead of a night-time part-time one) starting in September; please sign up with GA if you are interested.

I suspect by the time I've finished blogging about my past students' projects that there will be a new round of student projects to cover, so this might become a bit more of a feature on my blog.

Tuesday 23 August 2016

Automate the installation of a Windows DataProtector client

A client today wanted to push the DataProtector agent from SCCM / System Center 2012 instead of from Data Protector. It's not that difficult, but I couldn't find the command-line setup documented anywhere.

You will need to run (as an administrator):

  net use r: \\\Omniback
  cd \x8664
  msiexec /i "Data Protector A.09.00.msi" /passive INSTALLATIONTYPE=Client ADDLOCAL=core,da,autodr
  net use /delete r:

Obviously, substitute with your install server, and if R: is already allocated, use something else instead.

Then, trigger the following command on your cell manager:

 omnicc -import_host clientname

Replace clientname with the name of the client.

Script this as appropriate (e.g. after the operating system has booted) in order to have an unattended installation.

Friday 5 August 2016

Data Protector CRS operation cannot be performed in full-screen mode

Today's head-scratcher: after upgrading to 9.07 on a Windows cell manager, the CRS service won't start.

Eventvwr says something even weirder:

The Data Protector CRS service terminated with service-specific error The requested operation cannot be performed in full-screen mode.. 

I was in full screen mode at the time, but it still wouldn't start even when I minimised my RDP session. For my own sanity, I was glad of this.

Trawling through Daniel Braun's blog, I saw some comments there that it could be related to anti-virus software. Nope, not that either.

The debug.log said something a little bit more believable:

[SmCreateTable] MapViewOfFile(size:17505216) failed, error=[5] Access is denied.

I discovered that I could reliably get that message added that message every time I tried to start the CRS. But what is actually being denied?

So I ran omnisv start -debug 1-500 crm-vexatious.txt

I then had a 160KB file created in C:\programdata\omniback\tmp that began with OB2DBG, ended with crm-vexatious.txt and had CRM in the filename. Good: at least it gets far enough that it can create debug messages.

Scrollling right to the bottom of it, there it was:

Code is:1007  SystemErr: [5] Access is denied
************************   DEFAULT ERROR REPORT   ***************
[Critical] From "" Time: 5/8/2016 1:00:33PM
Unable to allocate shared memory: Unknown internal error.

Internally, the function to return a shared memory segement presumably encodes something as 1007; CRS then exits with that code (which is the standard Windows error code for "can't be performed in full-screen mode").

There aren't many reasons for a shared memory allocation to fail. In fact, the only one I can think of that could be relevant here is if the segment already exists. I thought about figuring out what the equivalent to ipcrm is on Windows, gave up and rebooted the box.

And it came up perfectly. Funnily enough, if I had had no idea what I was doing, I would have just bounced the box to see if it would have fixed it, and saved myself a headache and some stress wondering what was going on. Ignorance would have been bliss.

Saturday 9 July 2016

[Politics] The Rise of the Technologist Parties

What’s the most important resource? What is it, that if you control it, gives you power?
Here are four of the most common answers you will hear:
  • The most important resource is land. Without land we have no food (or anything else for that matter).
  • The most important resource is the labour of massed workers. Without anyone to do the work, nothing will get done and we will have nothing.
  • The most important resource is the environment. Without air to breathe or water to drink, there is no economy.
  • The most important resource is the capital that dictates what gets done. Money is power: we should try to remove aberrations that send capital into unnecessary and pointless directions.
Most people can align with one of these viewpoints. In fact, in Australia these viewpoints are so strongly held that we even have political parties to represent those who hold those views (in order: the Nationals, Labor, the Greens, the Liberals).
As far as I can tell, in the USA the middle two align with the Democrats and the outer two with the Republican party. In some states in Australia, a similar merge has happened with the Nationals and Liberal party merging.
We look at these answers as if they have been around forever and that there can be no other significant factor, ignoring the fact that “the labour of workers” as a significant asset was a rarely-expressed thought prior to 1850, nor was there much coherency to the green movement before “Silent Spring” in the 1960s.
But something has just changed. We’re seeing it first in Australia because of our preferential voting and large numbers of micro-parties. In this week’s elections, the vote for “other” parties grew. About 1 in every 4 Australians did not vote for any of the major parties, but instead voted for one of about 50 “other” parties.
There’s a good chance that “other” parties will end up holding the balance of power in the lower house and even with desperate changes to the voting rules enacted by the previous parliament, there are likely to be numerous “other” parties in the upper house.
And I think it’s an acknowledgement that there are other answers to my first question.
Let me add three answers, which statistically (according to the election results) must be viewpoints held by at least 250,000 adults in Australia:
  • The most important resource in a society is the quality and depth of the religious faith of the members of that society.
  • There is no important resource that is worth getting worked up about. Let’s all have sex and smoke dope.
  • The most important input in the 21st century is the accessible and useable corpus of science, technology and engineering.
The religious faith answer is interesting in itself and maybe one day I’ll write an article on it. There are lots of different threads to that one.
The Sex Party and HEMP party alliance together polled nearly as well as the Christian parties. If we add in the Drug Reform party, they were well ahead. I’m not sure what to make of this. Does this show that we are a very mature country, well up on Maslow’s hierarchy of needs, or does it show the opposite?
But for now, let’s look at what happened with science, technology and engineering.
Have a look through the candidates for the Science party (which surely is the party closest to this last answer), and you will find a list of bright folks: a PhD in biochemistry here; a technology startup founder there; a professional scientist.
It’s a funny co-incidence, but those are all job titles of future equity lords. If you are a wealthy founder of a high-tech company, you probably at one point had a job title like one of those.
Let’s rewind. If you wanted to get wealthy twenty years ago, you went into finance, did some deals, took a cut and everyone came out smiling because there was always margin to be made. Play things right and you could make a few million just getting the right people and the right money together. You really only needed some capital.
If you wanted to get wealthy fifty years ago, you would have started a factory, employed lots of workers to churn out goods and made a profitable living on the marginal value that each worker could produce. You needed a supply of trainable workers and a bit of capital to get going.
If you wanted to get wealthy one hundred years ago, you needed to own land. The more of it you had, the more you could grow on that land. Come harvest time you would employ as much temporary labour as you could acquire and sold the goods produced. You needed land, a supply of semi-trained workers and a lot of capital.
Today, if you want to become wealthy, you need a skillset that lets youautomate somethingso that you can leverage your own brainpower to do the same work as a hundred people without that skill set. Here are some examples:
  • A biotech startup that works out how to get bacteria to synthesise some useful industrial chemical.
  • The machine learning / artificial intelligence startup that works out how to automate a white collar (or blue collar) job.
  • The medtech startup that has some new process for treating or identifying a disease.
  • The software company that creates a viral product that everybody wants.
You need very modest amounts of capital (the most expensive of these would probably be the medtech startup which would probably need to raise $5m). Since any of these occupation titles (computer scientist, biotech developer, medical device engineer) can generate very valuable intellectual property in a very short time, there’s a good chance that you would maintain a significant equity stake in your business after all the capital raising has been done and the company that you form goes on to become worth tens or hundreds of millions of dollars. There’s a name for the people whose lives have these trajectories: “equity lords”.
Emphatically, to get there, equity lords don’t need:
  • A large workforce of unskilled or semi-skilled labour. Unlike manufacturing, doubling revenue does not require doubling the workforce. So right-wing parties trying to rail on the side of big business against labour unions are unlikely to be saying anything of importance. Let’s just put this into perspective: I overheard a salary negotiation for a potential new employee the other day. The employer had offered $140-$150k plus equity. The candidate replied that this was way too low, doubled the equity component and asked for $20k extra. The employer happily agreed saying “well if it’s only an extra $2,000 per month…” This company has less than 5 employees, but would have wages approaching $1m / year.
  • Hundreds of millions of dollars of capital. We have moved into a world where the financiers are desperately trying to find returns and the only place they can find them is in the leftovers of startups. Financiers are trying to figure out how they can make cost-effective smaller investments because there just isn’t the call for big rounds of capital raising any more. So right-wing parties in the pockets of Wall Street (and its international equivalents) aren’t particularly relevant here either.
  • Special government programs. (In fat, there is clear evidence from the QUT CAUSEE study that some of these can be actively damaging.) So leftist parties aren’t going to be terribly interesting to the equity class.
  • Natural resources, land area, or access to particular places. I’ve seen (and worked with) hyper-growth firms that have operated out of heritage listed buildings, garages, beach fronts, dedicated incubators and top-floor city offices. I’ve had meetings with people running significant startup companies where we brought our children to the park and they worked sitting on a picnic rug. High population densities (to bring together the skills and resources required) do seem to be important, so both the parties supporting farmers and parties trying to keep the natural environment preserved are irrelevant and in some cases actively antagonistic. (Try suggesting “no genetically modified organisms” to a biotechnologist and see what happens.)
The story of the Australian (and the world economy) over the next 20–50 years is going to be the rise of this equity lord class. In the same way that the landed gentry gained wealth and then used that to leverage political power in the past, the equity lords will grow in wealth and in numbers and in their desire to be represented politically.
Who is going to represent them? Based on the dot points above it doesn’t look like any existing major party is well positioned for it. But there is at least one minor party that looks very well aligned. Looking at the parties at the last Federal election, it’s obvious who it will be representing the experts who will be running the artificial intelligences and nanotech factories that will be pervasive in our lives mid-century.
So, while it would be easy to dismiss the Science Party / Cyclists coalition as just another silly minor party (the one who polled lowest outside of their alliance), I’m predicting a steady growth over the next decades in both its size and its support. Paul Graham has written about the possible political implications of startups, and in Australia we’re seeing that play out starting right now. Don’t dismiss the possibility of Prime Minister Meow-Meow Ludo Meow defending the seat of Grayndler in the 2036 election.

Friday 10 June 2016

GRE for Linux requirements

If you are using Data Protector to backup your VMware environment, and you have Linux boxes, you might have tried to use the Granular Recovery Extension (VMware GRE). The GRE lets you recover individual files from a VM-level or VMDK-level backup; it does this by mounting the VMDK file on a Linux GRE proxy.

There are three variations of GRE restore:
  • If you backup to a StoreOnce device (e.g. a B6200, a D2D4500, a StoreOnce virtual appliance, or a software storeonce component on a Windows or Linux member of the cell)... then you need a very large disk and not much else.
  • If you backup using 3PAR snapshots (which works very well indeed) and you are doing a GRE recovery from a snapshot, you don't need a big disk or anything else much at all -- just something connected to the 3PAR the meets the usual (documented) list of requirements.
  • The very weirid case is if you use a SmartCache device. These are uncompressed, raw disk spaces for putting VMware backups on to. The SmartCache is accessed by the Windows and Linux proxies via Windows file sharing. Thus the Linux GRE proxy server needs to have Samba installed on it.

One remaining issue, that I hope gets fixed one day soon: the machine that you want to restore to has to have a world-writeable NFS share. Ouch: given that it is copying from one Linux box to another Linux box, I'm not quite sure why this couldn't have been done with SFTP. So I suspect everyone will just have a multi-stage restore for GRE on Linux boxes:
  • Load the backup from StoreOnce onto a large disk.
  • Restore files from that to a server with an insecure NFS share.
  • Copy from the NFS share via SSH to the actual server where you needed the file restored.
I presume also that setuid / setgid binaries are therefore not supported for GRE restore. (Because who creates a world-writeable NFS share without the nosetuid,nosetgid options enabled?) Implication: you can't restore /usr or /sbin from a VMware backup reliably.

Of course, you might find it cheaper to use VMX ( instead of Data Protector -- it is much cheaper than the equivalent GRE and VEAgent licenses. As far as I know, it doesn't face these limitations.

Greg Baker is an independent consultant who happens to do a lot of work on HPE DataProtector. He is the author of the only published books on HP Data Protector ( He works with HPE and HPE partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at, or visit the online store for Data Protector products, licenses and renewals at 

Wednesday 18 May 2016

Today's silliness: social etiquette bots had me writing a semi-serious article as a riff from Anne-Tze's Appointment, and then decided to cancel it after I went through three drafts. So I published it anyway: What I need is a social etiquette bot.

Monday 9 May 2016

Data Protector e-learning is now available -- with a special bundle

I've negotiated some very good pricing with HPE for their e-learning content -- as far as I can tell, there is no way of getting access to it anywhere in the world cheaper except by stealing it. I've sweetened the deal with a bundle of all my Data Protector 9 books and a virtual lab environment offer.

I've only got an English samples, but it's available in several languages. Have a look at some sample course content to get a feel of what it's like. (It's very similar to the Data Protector Essentials course).

There are interactive simulations which give hands-on practice very simply and easily. And finally, there are job aids supplied as part of the package.

Available for purchase now (along with lots of other Data Protector resources, books, licenses)...

Greg Baker is an independent consultant who happens to do a lot of work on HPE DataProtector. He is the author of the only published books on HP Data Protector ( He works with HPE and HPE partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at, or visit the online store for Data Protector products, licenses and renewals at 

Thursday 21 April 2016

HP Service Manager tools @JIRAServiceDesk @github

For customers running HP Service Manager, I have two freebies:


Tools for interacting with HP Service Manager
  • activitywsdl.unl -- enable WSDL access to the Activity table
  • -- when NNM detects a node goes down, either update the existing Service Manager incident or create a new one.
  • -- When NNM generates an event (either up or down), dispatch appropriately to Service Manager
  • -- a much easier way of having HP SM receive emails that doesn't involve Connect-IT. Edit email2ticket.conf and you're ready to go
  • -- similar to email2ticket but designed to work with FastPass, and report the ticket as closed automatically
  • -- a much easer way for HP SM to send emails that doesn't involve Connect-IT. Edit sm2email.conf and that's about it.
  • -- if HP SM tries to send a "pager" notification, send an SMS. Doesn't involve Connect-IT.
  • -- library and program -- Swiss army knife of interacting with Service Manager on the commandline
  • -- instead of polling an IMAP or POP server, why not deliver your customer interaction emails via procmail through to which will turn them into interactions instantly (i.e. no polling delay). Amaze your customers.
  • -- command-line script for sending emails

Greg Baker ( a consultant, author, developer and start-up advisor. His recent projects include aplug-in for Jira Service Desk which lets helpdesk staff tell their users how long a task will take and a wet-weather information system for school sports.

Data Protector tools now on github

I've written a number of tools that help when administering Data Protector. I did some spring cleaning (in Autumn), found all of them that I could, and put them into a github repository.

Here's the README from it...


Programs to support HPe Data Protector

Performance tools

  • - This program prints out the throughput rate of the specified sessions, or all current sessions if no sessions are specified.
  • - This program prints out a report on DataProtector backup throughput performance for a completed session.

Tools for migrating between cell managers and keeping them in sync

  • - A script to make a two cell managers have the same pools
  • - Generate a script to export/import every client in a cell
  • - This program walks through everything in the media management database and writes MCF files out to the output directory, unless they already exist on the (optionally specified) target server. Then it copies it to target-server-directory (with an extension of .temp, which gets changed to .mcf once it is complete).
  • - watches for files in the watch-directory that end in .mcf. When it sees one, it checks to see if it is already known about in the DataProtector internal database. If it is not already in the database, it is imported.
  • - A script to make a two cell managers have the same pools

Tools for copying sessions between cell managers

These are mostly obsolete as of DP 9.04 because you would typically create a copy job from one cell manager's storeonce to the another cell manager's storeonce. These are still relevant if you want to keep physical tapes in two locations.
  • - process incoming MCF files
  • - A script to export MCF files after a backup

Software for keeping track of tapes

  • - a command-line program which updates the Data Protector database of tape locations by letting you zap the tapes with a barcode reader device
  • - a CGI version of

Miscellaneous programs that don't fit elsewhere

  • - This program files media in a tape library into one of two media pools, based on their slot number.
  • - This program sends an SMS through the ValueSMS gateway to report on mount requests. It can be used as a mount script for a device.
  • - Exercises every device (tape, storeonce, etc) by running a tiny backup to it

Greg Baker is an independent consultant who happens to do a lot of work on HP DataProtector. He is the author of the only published books on HP Data Protector ( He works with HP and HP partner companies to solve the hardest big-data problems (especially around backup). See more at IFOST's DataProtector pages at, or visit the online store for Data Protector products, licenses and renewals at 

Tuesday 8 March 2016

The management spinoff of the space race and of the dotcom era

ISO9000 was -- to some extent -- the translation of the engineering practices that supported the space program in the 1960s into the world of business: keep your documents controlled, make sure you know that what you are doing is working, and so on.
The equivalent of 1960s space engineering in the modern day is software engineering, and the practices that have been developed by software teams will form the basis of best-practice management in another decade or two's time. 
What practices am I talk about? What is it that software teams do, that is so obviously right that it barely gets mentioned, but is so obviously lacking from non-software business management? (Put another way: non-software business managers drown in email to the point of unproductivity. Why is that? And what does github do which happens to solve these problems?)
  • Instructions, procedures, notes, tomes, meeting reports, decision summaries, discussions and so on are living documents. Programs are just an example of these documents. They need version control, and be dynamically generated into their final forms. Here's an article I wrote a few weeks ago about this:
  • Issues, bugs, requests, tasks and so on need to be tracked through a workflow, and assigned to teams and thence down to individuals. (Blog link:
  • Operational support teams need to be accessible at short notice, and that often includes specialised development teams who are expensive to interrupt. This dichotomy is what drives teams to adopt text chat where one member of the team has the "disturbed" role to try to answer questions from outside teams. (Blog link:
  • Outcomes should be expressed as tests that can pass or fail. For programs, this means automated test suites. Outside of software, this is generally "a well-worded contract" but will soon sometimes to mean "contracts encoded in a formal language and attested in the blockchain" or "contract terms that a robot lawyer can read and judge."
  • The longer and more continuous and automated the path is from the specification to the finished product the better. Automated builds, automated deploys, coding in domain-specific languages that let ideas be expressed simply -- these are what we use in software. The next decade will see the same ideas expressed in fashion (the dressmakers pattern gets passed to a robot textile workers), in manufacturing (the design gets 3D printed), in law (self-judging contracts) and so on. 
Anything else I should add to this list?
We always talk of the incredible failures of software engineering -- and there are many, and they can be enormous -- but rarely of the incredible progress we have made as well. The progress is invisible and assimilated; the failures become news.

Tuesday 1 March 2016

Draining the meeting bogs and how not to suffer from email overload (part 4)

This is the fourth (and probably last) in my series of blog posts about how we unknowingly often let our IT determine how we communicate, and what to do about it.

Teams need to communicate at three different speeds: Tomes, Task Tracking and Information Ping-pong. When we don’t have the right IT support to for all three, things go wrong. This week I'm writing about Task Tracking.

I lose my luggage a lot. Domestic, international, first world, developing nations; I’ve had travel issues in more places than most people get to in their lives. I’ve even had airlines locate my lost luggage and then lose it again in the process of delivering it back to me.

Two of the more recent lost luggage events stood out by how they were handled:

  • On one occasion, a senior member of staff apologised to me in person, and promised that it he would have his staff on to it immediately; 
  • On the other, a bored probably-recent graduate took my details, mumbled the official airline apology and gave me a reference number in case I had a query about where it was up to.

I felt much more confident about the mumble from the junior than the eloquence of the senior.


Because there was a reference number. The reference number told me:

  • There was a process that was being followed. It might be a very ad-hoc process; it might often fail, but that's clearly better than the alternative.
  • If the team leader is micro-managing the tasks of their staff, it's because they are choosing to do so. The process must have been done a few times before, so the staff know what to do.
  • The team leader will probably not be the bottleneck on stuff getting done. We've all seen it, or been there: the project manager who is painfully overworked while team members are idle (often unknown to the project manager).
  • The process will therefore scale somewhat. Staff can be empowered enough that the process could scale without a single-person bottleneck.
  • It told me that there was a team of people who work on finding lost luggage, and that someone in that team would be working on it.
  • If a whole flight-load of luggage had been lost, scaling up resources to help wouldn't have been impossible.
  •  It didn’t matter to me who was working on it; if I was wondering whether any work had been done, I would have logged into their portal and looked it up, and wasted no-one’s time or effort but my own.

Having the manager’s name and assurance gave me no such confidence, but instead the sure knowledge that if I called the manager, the manager would have to get back to me (i.e. go and query the staff member he had assigned). This process would have consumed staff time and effort; which meant that if a large number of people had lost their luggage and were all asking the same question, that there would be gridlock and thrashing (so many interruptions that nothing gets completed).

In your team, who is busier? The team leader / project manager, or the technical staff doing the work?

If someone needs something from your team, how is that work tracked? Do they have someone senior that they call, or can they find out what they want to know wasting no-one's time but their own?

In a sensible, well-run organisation the staff with more seniority should have more idle time than the staff who report to them. Otherwise, they are the bottleneck holding back the organisation's efficiency.

If this is happening, then something is wrong with the way ticket tracking is being done.

The most common ticket software systems tend to be helpdesks or service desks because the scale of such organisations usually make it essential It is almost impossible to live without them in software development once the software has reached the complexity experienced in a very minimal viable product.

But ticket-tracking can be done with almost any technical team inside or outside of IT. Here's my list of the minimum requirements for a ticket-tracking system to be useful:

DUAL FACING Ticket tracking systems convey two important pieces of information: that something is being worked on (or that it isn’t) and what tasks people are working on.

On the one hand, the system needs to be easy for end-users to see where their request is up to which is why tracking tasks in a private spreadsheet doesn't work.

The ticketing system should automate messages to requesters, create follow-up surveys when the ticket is resolved, and so on.

On the other hand, the ticketing systems needs to be fast and efficient for staff to update, or else staff will batch their updates in a large chunk at the end of the day or the end of the week. It also needs to provide reporting to management to provide a high-level overview.

ENUMERATED You want discussions to be about "LTS-203" not "the project to update the website branding" so that everyone is on the same page. The tracking system has to provide a short code that you can say over the telephone or in a conversation, and that usually means some kind of short string of letters (perhaps three or four, or a pronounceable syllable) followed by a number. If that number is 7 digits long, you have a problem, because no-one will remember it, nor be able to say it.

EMBEDDABLE Whatever you are using for your tomes and reporting, you want to be able to embed your ticket information into it, and have the status there automatically update. This makes project meetings smoother and more efficient, because you can embed the ticket into the minutes and quickly glance back to last week’s meeting notes to see what needs to be reviewed. If software projects are involved, then being able to embed ticket references into the revision control system is very helpful.

UNIVERSAL If the entire organisation can work off the same ticket tracking system, that is ideal. One client I worked with had numerous $100,000+ projects being blocked -- unable to be completed -- because another (unrelated) department had delayed some logistics for efficiency. It required months of investigation to uncover -- which should have been possible to identify by clicking through inter-team task dependencies.

ADAPTABLE In order to be universal, the workflow needs to be customisable per team and often per project. For some teams, To-Do, In Progress and Done are sufficient to describe where the work is up to. For others, there can be a 20-step process involving multiple review points. ITIL projects often end up clunky and barely useable because the One Universal Process for all Incidents is really only necessary for a handful of teams, and the rest are forced to follow it.

LOCK-FREE When a project is urgent you will have more than one person updating a ticket at the same time. This isn't the 1980s any more: it's silly and inefficient for one user to lock a ticket and leave someone else idle, unable to write. Time gets lost, and more often than not, the update gets lost as well.

PREDICTIVE We live in an era of deep-dive data mining. It's no longer acceptable to say to a customer or user "we have no idea how long this is going to take" any more than an advertising company could say "we don't know how much money to spend on your campaign". And yet, I still see helpdesks logging tickets and giving no indication to their user base of when to expect the ticket to be resolved. At the very least, make sure your ticketing system uses my three rules of thumb. Or better still, make sure it works with my Automated Estimator of Effort and Duration to get the best real-time predictions.

That seems to be the most important seven criteria.

  • Spreadsheets don't work, and yet I still see them used in almost every organisation.
  • Most startups use Jira or Basecamp. Basecamp also includes capabilities for Tomes and Information Ping Pong. 
  • Best Practical's RT is the most mature of the open source tools (even though it doesn't meet some criteria). Trac is another commonly-used open source tool, particularly when it is bundled with software repository hosting.
  • Large enterprises often use HPE Service Manager (which is less popular than it was in the past and is being replaced by Service Anywhere), ServiceNow (who took a lot of HPE's marketshare) and BMC Remedy. They are generally 10x - 100x the price of Jira but are designed to work better in highly siloed organisations.

Be aware that if the nature of someone’s job is that they will work on the same problem for months or years -- for example, a scientific researcher -- there’s probably little value in ticket tracking because there would be so little information to put in there. Likewise, if someone’s job is to do hundreds of tasks per day, then any ticket tracking will have to be automated by inference from the actions or else the overhead of the tracking system might make it impractical.

I'm hoping to put this (and many other thoughts) together in a book (current working title: "Bimodal, Trimodal, Devops and Tossing it over the Fence: a better practices guide to supporting software") -- sign up for updates about the book here:

If you like this series -- and care about making organisations run better with better tools -- you'll probably find my automated estimator of effort and duration very interesting.

Greg Baker ( is a consultant, author, developer and start-up advisor. His recent projects include a plug-in for Jira Service Desk which lets helpdesk staff tell their users how long a task will take and a wet-weather information system for school sports.

Tuesday 23 February 2016

Draining the meeting bogs and how not to suffer from email overload (part 3)

This is the third in my series of blog posts about how we unknowingly often let our IT determine how we communicate, and what to do about it.

Teams need to communicate at three different speeds: Tomes, Task Tracking and Information Ping-pong. When we don't have the right IT support to for all three, things go wrong.

In this post, I'll talk about team communication Information Ping-Pong.  Information Ping-Pong is that rapid communication that makes your job efficient. You ask the expert a question and you get an answer back immediately because that's their area, and they know all about it.

It's great: you can stay in the flow and get much, much more done. It's the grail of an efficient organisation; using the assembled team of experts to their full.

Unfortunately, what I see in most organisation is that they try to use email for this.

It doesn't work.

Occasionally the expert might respond to you quite quickly, but there can be long delays for no obvious reason to you. You can't see what they are doing -- are they handling a dozen other queries at the same time? And worse: it is just contributing to everyone's over-full inbox.

The only alternative in most organisations is to prepare a list of questions, and call a meeting with the relevant expert. This works better than email, but it's hard to schedule a 5 minute meeting if that's all you need. Often the bottom half of the list of prepared questions don't make sense in the light of the answers to the first half, and the blocked-out time is simply wasted.

The solution which has worked quite well for many organisations is text-chat, but there are four very important requirements for this to work well.

GROUP FIRST Text chats should be sent to virtual rooms; messages shouldn't be directed to an individual. If you are initiating a text-chat to an individual, you are duplicating all the problems of email overload, but also expecting to have priority to interrupt the recipient.

DISTURBED ROLE There needs to be a standard alias (traditionally called "disturbed") for every room. Typically one person gets assigned the "disturbed" role for the day for each team and they will attempt to respond on behalf of the whole team. This leaves the rest of the team free to get on with their work, but still gives the instant-access-to-an-expert that so deeply helps the rest of the organisation. (Large, important teams might need two or more people acting in the disturbed role at a time. )

HISTORY The history of the room should be accessible. This lets non-team members lurk and save on asking the same question that has already been answered three times that day.

BOT-READY Make sure the robots are talking, and plan for them to be listening. If a job is completed, or some event has occurred, or any other "news" can be automatically sent to a room, get a robot integrated into your text chat tool to send it. This saves wasted time for the person performing the "disturbed" role.

Most text chat tools also have "slash" commands or other ways of directing a question or instruction to a robot. These are evolving into tools that understand natural language and will be one of the most significant and disruptive changes to the way we "do business" over the next ten years.

Skype and Lotus Notes don't do a very good job on any of the requirements listed above. Consumer products (such as WhatsApp) are almost perfectly designed to do the opposite of what's required. WeChat (common in China) stands slightly above in that at least it has an API for bots.

The up-and-coming text chat tool is a program called "Slack", although Atlassian's Hipchat is a little more mature and is better integrated with the Atlassian suite of Confluence and Jira.

Unlike most of the tools I've written about in this series, the choice of text chat tool really has to be done at a company level. It is difficult for a team leader or individual contributor to drive the adoption from the grassroots up; generally it's an IT decision about which tool to use, and then a culture change (from top management) to push its usage. Fortunately, these text chat tools are extraordinarily cheap (the most expensive I've seen is $2 per month per user), and most have some kind of free plan that is quite adequate. Also, there's a good chance that a software development group will already be using Hipchat, which means that adoption can grow organically from a starting base.

Outside of a few startups, text-chat is very rare. And also outside of a few startups, everything takes far longer than you expect it to and inter-team communication is painfully slow. It's not a co-incidence. We think this mess is normal, but it's just driven by the software we use to intermediate our communications.

The next post in the series will hopefully be next Tuesday.

I'm hoping to put this (and many other thoughts) together in a book (current working title: "Bimodal, Trimodal, Devops and Tossing it over the Fence: a better practices guide to supporting software") -- sign up for updates about the book here:

If you like this series -- and care about making organisations run better with better tools -- you'll probably find my automated estimator of effort and duration very interesting.

Greg Baker ( a consultant, author, developer and start-up advisor. His recent projects include a plug-in for Jira Service Desk which lets helpdesk staff tell their users how long a task will take and a wet-weather information system for school sports.