IFOST Blog: 2017

Saturday, 2 December 2017

Another blog quoting me about using chatbots for customer service

https://www.telcosolutions.net/blog/blog/omnichannel-customer-services-customer-contact-strategy/

Thursday, 6 July 2017

Getting your fair share of AWS EC2, VMware, Hyper-V, etc.

One of the problems with virtualised infrastructure -- especially cloud servers -- is that your programs don't always get the CPU time that they need. The physical CPUs that run your application are shared with the other virtual machines on the same hardware. This is good for cost savings, but it would be nice to know how badly you are being affected by this.

I had a customer who had some really interesting VMware scheduling problems. They had a large number of multi-cpu virtual machines. VMware can't schedule a 4-cpu virtual machine unless there are 4 physical CPUs free. So even if you only have one tiny job to run on one CPU, VMware can't just schedule that one CPU -- it has to wait until at least 4 are available. (Incidentally, those other 3 CPUs that are scheduled but do nothing count towards co-stop% which is another interesting performance-and-tuning metric.)

As it turns out, almost all their virtual machines had some small tasks going on in the background (e.g. cluster heartbeats), so VMware tried very hard to schedule them all as best as it could, but the result wasn't pretty. How bad was it?

I wrote a little program that slept for a second, woke up, recorded the time, and then went back to sleep again. It kept statistics about how delayed it was. There were some horror stories -- there were virtual machines that received no CPU time for more than 90 seconds on occasions! No wonder their clusters kept crashing -- the cluster heartbeat time was only 30 seconds, so there was no way the cluster could stay up with VMware starving it of CPU time like that.
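The idea is simple enough to sketch in a few lines of Python. This is not the am-i-scheduled source, just a minimal illustration of the same technique: the function names, the default interval and the injectable `sleep` and `clock` parameters are my own assumptions, added so the logic can be exercised without waiting for real time to pass.

```python
import time


def measure_scheduling_delay(iterations=5, interval=1.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Sleep repeatedly and record how much later than requested we woke up."""
    delays = []
    for _ in range(iterations):
        before = clock()
        sleep(interval)
        # Any time beyond the requested interval is time we were not running.
        overshoot = clock() - before - interval
        delays.append(max(0.0, overshoot))
    return delays


def summarise(delays):
    """Keep the statistics that matter: the worst and the average delay."""
    return {
        "samples": len(delays),
        "worst": max(delays),
        "mean": sum(delays) / len(delays),
    }
```

Calling `summarise(measure_scheduling_delay(iterations=60))` on a quiet machine should report delays of a few milliseconds at most. A "worst" figure measured in whole seconds suggests the hypervisor is not scheduling you.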

Anyway, I tidied up that program ("am-i-scheduled") and packaged it for RHEL7; the binary is so simple that it will also run on Ubuntu unchanged. I suspect it will run almost anywhere.

If you run AWS EC2 servers or other cloud-hosted servers, you really want to install this. This is the most convenient way you can find out how much CPU time you aren't getting: if your virtual machine is co-located with another virtual machine that is being used for Bitcoin mining, or password cracking, or a deep learning problem, you might want to terminate and try again. With am-i-scheduled, you can detect this easily, and measure the impact you are experiencing.

Here's the source: https://bitbucket.org/solresol/am-i-scheduled and the binaries (including RPMs) can be downloaded from here: https://bitbucket.org/solresol/am-i-scheduled/downloads/

Tuesday, 16 May 2017

Two happy stories about the Australian legal system

Too often we whinge and whine about problems in the legal system. Let me share two stories where the legal system did what it was supposed to.

The first: a customer of mine wasn’t paying for work that obviously had been done -- a joint marketing program of Google AdWords campaigns that were visible in the Google Analytics console, a class I taught and some other hard-to-deny activities. There were all sorts of excuses and promises to pay by a certain date that would come and then go without anything happening. Months went by.

I’ve run IFOST for just under 20 years and I’ve tried to avoid court at all costs, even when I’ve been in the right and could have done so. It’s better to live at peace, forgive and forget and so on.

I don’t know why I finally flipped and decided that enough was enough. I threatened legal action, and didn’t get a sensible response. So I went to lawlink and created an account. Actually, I had to create three accounts: the first one was in the name “Greg Baker” but my passport, driver’s license and Medicare card all say “Gregory Baker” so I couldn’t validate my identity. The second time I chose the wrong category of company, but I got it right the third time. About an hour wasted from some bad UI design on lawlink, but I’ve done it now.

Then I walked through an online form for a statement of claim. It was less than $10,000 so I was able to file in small claims court. It was straightforward enough to do so, but I wish there had been a better UI for calculating what the incurred interest expense was. You have to go to another page to get the annual interest rates for each year, turn that into a monthly interest rate by hand, apply those interest rates (e.g. in a spreadsheet) and then add it all up. It could have been easier.

The filing fee was just under $200 and I received all the paperwork by email. I then forwarded the most official looking PDF to my contacts and said “you are going to have to answer this case, including paying the filing fees and interest.” I had an apology that afternoon from the most senior management at my client. I posted the statement of claim and signed a stat dec that I had done so. The day that their legal team was served with it, stuff happened and I was paid.

It was such a positive experience. The rule of law was respected, it wasn’t difficult or expensive or time consuming for me to make the case and the matter was resolved far faster than if I’d tried to keep chasing up the debt the way I normally do.

The second story: I was talking to my lawyer about the experience (she wasn’t involved in the case at all, I just thought I would let her know afterwards) and noticed that she’d written a piece about changes to contract law in Australia.

Here’s the article: https://www.linkedin.com/pulse/unfair-contract-term-provisions-extended-small-business-turner

I am not a lawyer but as a small business owner dealing with multinationals all the time, all I can say is: wow. I have stacks of absurd contracts that I’ve had to sign to win business that have exposed me to a lot of risk. I’ve just relied on the good nature of my customers most of the time, and been burned occasionally. Or, sometimes, I’ve just walked away from good productive work that would benefit everyone simply because I couldn’t afford this risk.

But now I have some sort of protection: I can (and will) point out clauses that a court would disallow and leave it up to the customer to decide whether they want to keep the pointless clause in place.

In Australia, where the vast majority of the economy consists of small businesses, there really was no reason that our legal system should have allowed lopsided contracts, ever. This one small change to contract law will have a ripple effect through the Australian economy. I look forward to seeing how this plays out.

So that’s two bits of good news in a week.

Wednesday, 3 May 2017

Chatbots that get smarter at #AtlassianSummit

I’m in Barcelona at the moment, at AtlasCamp giving a talk about helpdesk chatbots that get smarter.

It’s easy to write a dumb chatbot. It’s much harder to write a smart one that responds sensibly to everything you ask it. A famous example: if a human mentions Harrison Ford, they are probably not talking about a car.

There are three different kinds of chatbot, and they are each progressively harder to get right.

The simplest chatbots are just a convenient command-line interface: in Slack or Hipchat, there are usually “slash” commands. Developers will set up a program that wakes up whenever “/build” is entered into a room, pulls the latest sources out of git, compiles them and shows the output of the unit tests. Since this is a very narrow domain, it’s easy to get right, and as it is for the benefit of programmers, it’s always cost-effective to spend some programmer time improving anything that isn’t any good.
The next simplest are ordering bots, which control the conversation by never letting the user deviate from the approved conversational path. If you are ordering a pizza, the bot can ask you questions about toppings and sizes until it has everything it needs. Essentially this is just a replacement for a web form with some fields, but in certain markets (e.g. China) where there are near-universal chat platforms this can be quite convenient.
The hardest are bots that don’t get to control the conversation, and where the user might ask just about anything.

Support bots are examples of that last kind: users could ask the helpdesk just about anything, and the support bot needs to respond intelligently.

I did a quick survey and found at least 50 startups trying to write helpdesk bots of various kinds. It’s a lucrative market, because if you can even turn 10% of helpdesk calls into a chat with a bot, that can mean huge staff cost savings. I have a customer with over 150 full-time staff on their servicedesk -- there are millions of dollars of savings to be found.

Unfortunately, nearly every startup I’ve seen has completely failed to meet their objectives, and customers who are happy with their investments in chatbots are actually quite rare.

I’ve seen three traps:

Several startups have lacked the courage to believe in their own developers. There’s a belief that Microsoft, Facebook, Amazon, IBM and Google have all the answers, and that if we leverage api.ai or wit.ai or lex or Watson or whatever they’ve produced this month, there’s just a simple “helpdesk knowledge and personality” to put on top of it, like icing on a cake. Fundamentally, this doesn’t work: for very sound commercial reasons the big players are working on technology for bots that replace web forms, and with that bias comes a number of limiting assumptions.
A lot of startups (and larger companies) believe that if you just scrape enough data from the intranet -- analyse every article in Confluence for example -- that you will be able to provide exactly the right answer to the user. Others take this further and try to scrape public forums as well. This doesn’t work because firstly, users often can’t explain their problem very well, so there’s not enough information up front even to understand what the user wants; and secondly... have you actually read what IT people put into their knowledge repositories?
There are a lot of different things that can go wrong, and a lot of different ways to solve a problem. If you try to make your support chatbot fully autonomous, able to answer anything, you will burn through a lot of cash handling odd little corner cases that may never happen again.

The most promising approach I’ve seen was one taken by a startup that I was working with late last year. When they decided to head in another direction, I bought the source code back off them.

The key idea is this: if our support chatbot can’t answer every question -- as indeed it never will -- then there has to be a way for the chatbot to let a human being respond instead. If a human being does respond, then the chatbot should learn that that is how it should have responded. If the chatbot can learn, then we don’t need to do any up-front programming at all, we can just let the chatbot learn from past conversations. Or even have the chatbot be completely naive when it is first turned on.

The challenge is that in a support chat room, it’s often hard to disentangle what each answer from the support team is referring to. There are some techniques that I’ve implemented (e.g. disentangling based on temporal proximity, @ mentions and so on). A conservative approach is to have a separate bot training room where only cleanly prepared conversations happen. Taking this approach means that we replace expensive highly-paid programmers writing code to handle conversations with an intern writing some text chats.

It’s actually not that hard to find an intern who just wants to spend all day hanging out in chat rooms.

Whatever approach you take, you will end up with a corpus of conversations: lots of examples of users asking something, getting a response from support, clarifying what they want, and then getting an answer.

Predicting the appropriate thing to say next becomes a machine learning problem: given a new, otherwise unseen data blob, predict which category it belongs to. The data blobs are all the things that have been said so far in the dialog, and the category is whatever it is that a human support desk agent is most likely to have said as a response.

There is a rich mine of research articles and a lot of well-understood best practice about how to do machine learning problems with natural language text. Good solutions have been found in support vector machines, LSTM architectures for deep neural networks, and word2vec embeddings of sentences.

It turns out that techniques from the 1960s work well enough that you can code up a solution in a few hours. I used a bag-of-words model combined with logistic regression and I get quite acceptable results. (At this point, almost any serious data scientist or AI guru should rightly be snickering in the background, but bear with me.)

The bag-of-words model says that when a user asks something, you can ignore the structure and grammar of what they’ve written and just focus on key words. If a user mentions “password” you probably don’t even need to know the rest of the sentence: you know what sort of support call this is. If they mention “Windows” the likely next response is almost always “have you tried rebooting it yet?”

If you speak a language with 70,000 different words (in all their variations, including acronyms), then each message you type in a chat gets turned into an array of 70,000 elements, most of which are zeroes, with a few ones in it corresponding to the words you happen to have used in that message.
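As a rough illustration of that encoding, here is a minimal sketch using only the Python standard library. The tokenising rule, the function names and the tiny vocabulary are assumptions for the example; a real system would have a vocabulary in the tens of thousands and would store the vectors sparsely.

```python
import re


def tokenise(message):
    """Lower-case the message and split it into words, ignoring grammar."""
    return re.findall(r"[a-z0-9']+", message.lower())


def build_vocabulary(messages):
    """Give each distinct word a column index, in first-seen order."""
    vocabulary = {}
    for message in messages:
        for word in tokenise(message):
            vocabulary.setdefault(word, len(vocabulary))
    return vocabulary


def bag_of_words(message, vocabulary):
    """Return a list of zeroes with a one for each known word in the message."""
    vector = [0] * len(vocabulary)
    for word in tokenise(message):
        if word in vocabulary:
            vector[vocabulary[word]] = 1
    return vector


vocabulary = build_vocabulary(["I forgot my password", "Windows will not boot"])
print(bag_of_words("My Windows password expired", vocabulary))
# prints [0, 0, 1, 1, 1, 0, 0, 0]
```

Word order is thrown away entirely, and words the vocabulary has never seen ("expired" here) are simply dropped.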

It’s rare that the first thing a support agent says is the complete and total solution to a problem. So I added a “memory” for the user and the agent. What did the user say before the last thing that they said? I implemented this by exponential decay. If your “memory” vector was x and the last thing you said was y then when you say z I’ll update the memory vector to (x/2 + y/2). Then after your next message, it will become (x/4 + y/4 + z/2). Little by little the things you said a while ago become less important in predicting what comes next.
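The decay rule is short enough to write out. In this sketch the vectors are plain Python lists and `update_memory` is a name I've made up for illustration; the arithmetic is the halving described above.

```python
def update_memory(memory, previous_message):
    """Halve the old memory and blend in half of the previous message."""
    return [m / 2 + p / 2 for m, p in zip(memory, previous_message)]


# Three toy vectors standing in for x (old memory), y and z (messages).
x = [1, 0, 0]
y = [0, 1, 0]
z = [0, 0, 1]

memory = update_memory(x, y)       # when z is said: x/2 + y/2
print(memory)                      # prints [0.5, 0.5, 0.0]
memory = update_memory(memory, z)  # after the next message: x/4 + y/4 + z/2
print(memory)                      # prints [0.25, 0.25, 0.5]
```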

Combining this with logistic regression, essentially you assign a score for how strong each word is in each context as a predictor. The word “password” appearing in your last message would score highly for a response for a password reset, but the word “Windows” would be a very weak predictor for a response about a password reset. Seeing the word “Linux” even in your history would be a negative strength predictor for “have you tried rebooting it yet” because it would be very rare for a human being to have given that response.

You train the logistic regressor on your existing corpus of data, and it calculates the matrix of strengths. It’s a big matrix: 70,000 words in four different places (the last thing the user said, the last thing the support agent said, the user’s memory, and the support agent’s memory) gives you 280,000 columns, and each step of each dialog you train it on (which may be thousands of conversations) is a row.
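One way to picture the column layout is as four vocabulary-sized blocks laid side by side. The sketch below stores each row sparsely as a dictionary of column number to value. The block names, the `feature_row` function and the dictionary representation are assumptions for illustration, not necessarily how any particular implementation stores it.

```python
VOCAB_SIZE = 70_000

# The four places a word can appear, in column order.
BLOCKS = ("user_last", "agent_last", "user_memory", "agent_memory")


def feature_row(**sparse_vectors):
    """Combine up to four sparse {word_index: value} vectors into one row.

    Each block is shifted by a multiple of VOCAB_SIZE so that the same
    word gets a different column depending on where it appeared.
    """
    row = {}
    for block_number, name in enumerate(BLOCKS):
        for word_index, value in sparse_vectors.get(name, {}).items():
            row[block_number * VOCAB_SIZE + word_index] = value
    return row


print(len(BLOCKS) * VOCAB_SIZE)  # prints 280000
print(feature_row(user_last={5: 1}, agent_memory={5: 0.5}))
# prints {5: 1, 210005: 0.5}
```

Keeping the blocks separate is what lets the model treat a word in the user's last message differently from the same word somewhere back in the history.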

But that’s OK, it’s a very sparse matrix and modern computers can train a logistic regressor on gigabytes of data without needing any special hardware. It’s a problem that has been well studied since at least the 1970s and there are plenty of libraries to implement it efficiently and well.
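To show what the training step is doing, here is a deliberately tiny one-vs-rest logistic regression in plain Python over sparse rows. It is a sketch under stated assumptions: the function names, learning rate and epoch count are made up, and in practice you would hand this job to one of the libraries mentioned above (scikit-learn's LogisticRegression, for example) rather than write the loop yourself.

```python
import math


def sigmoid(t):
    # Clamp to keep math.exp from overflowing on extreme scores.
    t = max(-30.0, min(30.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def score(row, weights, bias):
    """Add up the strengths of every word (column) present in the row."""
    return bias + sum(weights.get(col, 0.0) * val for col, val in row.items())


def train(rows, replies, epochs=200, learning_rate=0.5):
    """rows: list of sparse {column: value}; replies: what the agent said."""
    strengths = {reply: {} for reply in set(replies)}
    biases = {reply: 0.0 for reply in strengths}
    for _ in range(epochs):
        for row, actual in zip(rows, replies):
            for reply, weights in strengths.items():
                target = 1.0 if reply == actual else 0.0
                error = target - sigmoid(score(row, weights, biases[reply]))
                # Nudge each strength towards predicting the right reply.
                biases[reply] += learning_rate * error
                for col, val in row.items():
                    weights[col] = weights.get(col, 0.0) + learning_rate * error * val
    return strengths, biases


def predict(row, strengths, biases):
    """Return an independent confidence for each candidate reply."""
    return {reply: sigmoid(score(row, strengths[reply], biases[reply]))
            for reply in strengths}
```

With column 0 standing for "password" and column 1 for "Windows", training on a handful of rows gives column 0 a positive strength for a password-reset reply and column 1 a negative one, which is the pattern of strong and negative predictors described earlier.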

And that is all you have to do, to make a surprisingly successful chatbot. You can tweak how confident the chatbot needs to be before it speaks up (e.g. don’t say anything unless you are 95% confident that you will respond the way that a support agent will). You can dump out the matrix of strengths to see why the chatbot chose to give an answer when it gets it wrong. If it needs to learn something more or gets it wrong, you can just give it another example to work with.
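The confidence tweak amounts to a single comparison. A minimal sketch, assuming the model hands back a dictionary of candidate replies and probabilities (the function name and the example replies are hypothetical):

```python
def choose_reply(probabilities, threshold=0.95):
    """Return the most likely reply, or None to stay silent for a human."""
    if not probabilities:
        return None
    reply, confidence = max(probabilities.items(), key=lambda item: item[1])
    return reply if confidence >= threshold else None


print(choose_reply({"Have you tried rebooting it yet?": 0.97, "Reset link sent.": 0.03}))
# prints Have you tried rebooting it yet?
print(choose_reply({"Have you tried rebooting it yet?": 0.60, "Reset link sent.": 0.40}))
# prints None
```

Returning None is the hook for handing over to a human agent, whose answer then becomes another training example.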

It’s a much cheaper approach than hiring a team of developers and data scientists, it’s much safer than relying on any here-today-gone-tomorrow AI startup, and it’s easier to support than a system that calls web APIs run by a big name vendor.

If you come along to my talk on Friday you can see me put together the whole system on stage in under 45 minutes.

Tuesday, 14 March 2017

Going off-line for Lent, some nostalgia and a request for software development teams

I gave up always-on connectivity for Lent.

Well, truth be told, that wasn’t quite how I planned it. It just happened, and it wasn't for the whole of Lent. On my most recent trip out-of-town I was worried about theft and damage more than I normally am. Given that a "normal" business trip for me often involves third world cities with 90% unemployment and an hourly murder rate, you can guess what horror I was dealing with: teenagers.

So this is what I took:

That's a Nokia 110 and a very old IBM laptop running Linux. The laptop did have Wifi, but no bluetooth. The Nokia had a very basic bluetooth, but has no capability to tether a laptop for internet access.

I'm not a luddite so I set myself up with the best I could using the hardware available. That's a modern Linux distribution -- which in itself is interesting: no one would try doing serious work on a Windows 98 system, which is probably what that laptop once ran.

Between bouts of nostalgia, I paused for reflection: roughly 15-20 years ago I was running something very similar to this kind of environment, and I was often to be found programming in the lobby of some hotel after delivering a tech class in some distant city. What have we gained, and what have we lost?

What was difficult

Not being online was a moderate challenge.

All my accounting and business systems are SaaS, so I couldn't do anything about the backlog of bank reconciliations that I need to catch up on.
Navigation was a problem. It's been a long time since I've navigated without GPS and it took a while to get used to it.
I couldn't look up the solution to any programming problem, so I defaulted to being very conservative in what I worked with: just the stuff I knew well enough that I wouldn't need to look anything up.
No Slack messaging. I came back home to a Slack constellation of red numbers on just about every channel. I'm running a mid-sized data science / AI class at the moment and a large number of students had homework questions and submissions.

Perhaps a bit of better planning (both short-term and long-term) would have resolved these problems. Maybe if I'd chosen GnuCash or OpenERP instead of Saasu, accounting wouldn't have been a problem. I could have carried around a physical GPS device, or taken some cheap device that could run Google Maps offline. I could have cached lots of documentation (as I used to do), but a lot of websites and tutorials don't lend themselves to that very easily. I could have installed a Slack client, but with long round trip delays it would have been functionally equivalent to email.

What surprised me is what I struggled with the most. Photography was terrible. I’m not a serious photographer, but I take quite a few pictures on my usual phone, which backs the photos up automatically, after which they get turned into panoramas and stories and so on. But the Nokia camera has awful resolution, and I don't even have a way of getting the pictures off it. We have raised our expectations of acceptable photos and videos very quickly in the last decade. This doesn't seem to be slowing down, so what sort of cameras will we be using in 2025?

Then the other big problem was ergonomic. The laptop was clunky and heavy, it didn't fit in my backpack very well and the battery life was terrible. It was probably mediocre at the time, but between the decay of the lithium battery and our increased expectations of what is normal, I found it frustrating. I was chained to the wall. I couldn't move around or choose to sit outside to work for a while.

The Nokia had the opposite problem. It was so small I kept thinking I'd lost my phone, not realising that it was still in my pocket.

Where the internet has made no difference

By running a modern Linux distribution, I took advantage of Linux's heritage, steeped in a world where being off-line was normal, and internet connectivity was brief: a world where we aren't bound to monthly subscription SaaS services.

Instead of Google Docs, I wrote text in emacs. I could have used something more normal (e.g. LibreOffice; or even run Microsoft Office in Wine on Linux) but why bother? Real-time collaboration would have been a problem of course, but the vast majority of documents I work on have one author only.
Email and backups were batch-mode: I was able to get to wifi intermittently during the trip (e.g. at the airport). I suppose pre-wifi this would have been like the times that I dialed up to get online. The IBM laptop did actually have a built-in modem, so I could have done a full nostalgia trip and listened to it squawk if I'd known of any ISP left in the country who still offered dial-up services.
I just ran git push less frequently than I normally would, but everything else in git works as well off-line as it does online.
All my data science tools were there on my laptop. The night before I left I'd run pip install jupyter ; pip install scikit-learn and it had finished by the time I packed my laptop up. We think of big data, AI and machine learning as modern things that require giant server farms, but most of the prototyping and testing can be done on sample datasets that are small enough to fit in even quite an ancient laptop.
My wife called using a phone number instead of a WhatsApp contact. Skype redirects to my mobile number anyway, so no-one would have noticed anything if they had tried to Skype me.

Hearken to a simpler time

I'm not being rosy-eyed about the past: I'm aware that there were some serious limitations. But what I found fascinating is that there were some advantages.

I slept better. There was no point in checking my phone for email before I went to bed. Maybe the Nokia 110 has a snake game or something, but all I did was just set an alarm. Many hours later, the alarm rang. Then I got up. That was the entirety of my interactions with the phone in bed.
Focus was easy: I wrote more text in one afternoon than I have done in weeks, possibly more than I've done this year. There were no interruptions. No email, no Slack, no chance to waste time browsing reddit or Hacker News.
There was so much spare time! At the end of the day I read, uninterrupted.
Breaks were real breaks. I had to put my laptop down, because it couldn't go anywhere away from AC power for very long. I would walk around and get some fresh air. I watched some children play hide-and-seek for a few minutes because I couldn't bury my face in my phone. Well, I could, but it was pointless. There was nothing there.

What we need to resolve once and for all

This was all extremely illuminating given my most recent data science project which was for a famous company that runs a lot of software development projects.

I analysed around 200 projects and identified some factors that can predict when a project is going to miss deadlines. I've done this before for other customers -- I've got it mostly automated for JIRA and it drives the smarts in http://www.queckt.com/ if you want to take a look at it.

In most organisations there is one extremely clear predictive factor associated with missed deadlines: the word "meet" or "meetings" appearing in a JIRA ticket. But this company runs development fully-remote. You don't ever meet in person: it's all communications via Slack, or teleconference or whatever else the team wants to use. So the usual "meeting" factor predicted nothing. It didn't occur enough to be terribly useful anyway.

But what did predict a deadline in peril on a fully-remote project was the density of Slack communication. If there is intense back-and-forth communication with large numbers of messages being exchanged with short deadlines between the development team, then don't set your hopes on the next release being on time.
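"Density" can be measured in several ways. The sketch below is one possible reading, not the analysis I actually ran: it takes a list of message timestamps and reports messages per hour along with the fraction of messages that arrived within a couple of minutes of the previous one. The function name and the two-minute cut-off are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def message_density(timestamps, burst_gap=timedelta(minutes=2)):
    """Return (messages per hour, fraction sent within burst_gap of the previous one)."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return 0.0, 0.0
    span_hours = (ordered[-1] - ordered[0]).total_seconds() / 3600
    quick_replies = sum(
        1 for earlier, later in zip(ordered, ordered[1:])
        if later - earlier <= burst_gap
    )
    per_hour = len(ordered) / span_hours if span_hours else float("inf")
    return per_hour, quick_replies / (len(ordered) - 1)
```

A channel with a high rate and a high quick-reply fraction is the intense back-and-forth pattern; a high rate with few quick replies looks more like a steady stream of announcements.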

At the time, my interpretation of this was simply that there must be a lot of confusion about what's required and that a lot of communication is going on trying to resolve it.

But now I'm wondering: what if the Slack communication is not just a proxy for confusion? What if it is causative? When I couldn't interrupt myself, nor be interrupted, I was far more productive. Programming is a cerebral activity -- even more than writing -- usually done alone: is it better to take long periods of uninterrupted thought -- even if it means reducing communication with team mates?

This is not just an academic question. One of these two things must be true and the implication is obvious:

Instant messaging and communication is a net benefit to software development, or at least not particularly harmful. If we have to mandate instant messaging, should we?
Or, instant messaging and always-on communication in a development team is a bad idea. If we have to ban instant messaging from our dev teams in order to get the best out of them, should we?

So let me announce here a project: I would like to measure this. I would like to create a distraction index measure, derived from corporate emails, instant messaging, and so on. Then I'd like to see how well the different distractions correlate with late projects, delays, milestone slippages, etc.

If you are in a company that has a good number of software projects on the go, and it's worth a bit of money to you to know how to optimise your developers' time, I'd like to talk to you. I've got code to analyse a bunch of other useful predictors too -- such as the language used in git commit messages; the topics used in your JIRA tickets; and numerous others -- so whatever happens you'll get something worthwhile out of this.

Obviously, if you are working for Slack, or Atlassian or another instant messaging company, and you'd like to know your impact for better or for worse on your customers, I'd really like to talk to you as well.

Get in touch with me here: gregb@ifost.org.au

Friday, 10 March 2017

Short silly sci-fi story about drones, lockers and the future of prank freighting.

Google Docs version: http://x.ifost.org.au/wardrobe

Medium.com version: https://medium.com/@greg_baker/my-life-with-a-cupboard-in-an-age-of-drones-374852a27898#.3eb8zonei

Monday, 16 January 2017

Farewell to cold winters, and hello endless summer in Sydney

I was preparing some materials for this intro to Python workshop and wanted to have some interesting data. We're having a heatwave in Sydney at the moment, where the temperatures at night are still stiflingly hot and sleeping is difficult. So I thought something about hot nights and memories of cold days might be nice.

Sydney "feels like" winter when the maximum temperature is below 20. It means you have the heater on (or it means I need to light a fire). It feels like summer when the minimum temperature is above 20; you have to sleep with at least a fan but you can dive into the pool or sea any time of the day or night and it isn't uncomfortable.
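Those two definitions translate directly into a per-year count. This is a minimal sketch rather than the notebook's code: it assumes the observations are already available as (year, minimum, maximum) tuples in degrees Celsius, and the function name is made up.

```python
from collections import Counter


def count_seasonal_days(observations, threshold=20.0):
    """observations: iterable of (year, min_temp, max_temp) in degrees Celsius."""
    winter_like = Counter()
    summer_like = Counter()
    for year, min_temp, max_temp in observations:
        if max_temp < threshold:
            winter_like[year] += 1  # never warmed up: heater or fire weather
        if min_temp > threshold:
            summer_like[year] += 1  # never cooled down: fan-all-night weather
    return winter_like, summer_like
```

Days that are neither (a cool night followed by a warm afternoon) count towards neither total, so the two numbers won't add up to 365.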

Here's how many winter-like days there were vs the number of summer-like days.

So is Sydney becoming the city of endless Summer? Who wants to take a bet on the first year when there are more summer-like days than winter-like days? I'm guessing around 2020.

I've saved the Jupyter notebook (including where to get the source data) here:
https://github.com/solresol/python-workshop/blob/master/curriculum/02-materials/code/farewell-winter-endless-summer.ipynb -- feel free to modify it and/or try it out for data in your city.

Other related sites

I lecture in natural language processing, and research non-manifold machine learning. I consult to companies needing help with technology management, AI strategy.
I have built a prediction market platform for enterprises to help with employee engagement and strategic decision making.
I wrote this NPS survey analyzer
As a build-a-business-in-one-day exercise, I have a GPT-via-email bot.
Conference call system to help when you need simultaneous translation done, but the only equipment you have on hand are people's mobile phones... which aren't all smartphones: church-translation.com
My Amazon author page
In the past I used to answer a lot of questions on Quora
Early 21st century pre-singularity geek-nerd poetry also available in book form
