Using machine learning to monitor and optimize chatbots

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Ofer Ronen on the current state of chatbots.

In this episode of the Data Show, I spoke with Ofer Ronen, GM of Chatbase, a startup housed within Google’s Area 120. With tools for building chatbots becoming accessible, conversational interfaces are becoming more prevalent. As Ronen highlights in our conversation, chatbots are already enabling companies to automate many routine tasks (mainly in customer interaction). We are still in the early days of chatbots, but if current trends persist, we’ll see bots deployed more widely and taking on more complex tasks and interactions. Gartner recently predicted that by 2021, companies will spend more on bots and chatbots than on mobile app development.

As with any other software application, companies will need tools to monitor the performance of bots once they are deployed in real-world settings. For a single, simple chatbot, one can imagine developers manually monitoring log files for errors and problems. Things get harder as you scale to more bots and as the bots become increasingly complex. As with other machine learning applications, when companies start deploying many more chatbots, automated tools for monitoring and diagnostics become essential.

The good news is relevant tools are beginning to emerge. In this episode, Ronen describes a tool he helped build: Chatbase is a chatbot analytics and optimization service that leverages machine learning research and technologies developed at Google. In essence, Chatbase lets companies focus on building and deploying the best possible chatbots.

Here are some highlights from our conversation:

Democratization of tools for bot developers

It’s been hard to get the natural language processing to work well and to recognize all the different ways people might say the same thing. There’s been an explosion of tools that leverage machine learning and natural language processing (NLP) engines to make sense of all that’s being asked of bots. But with increased capacity and capability to process data, there are now better third-party tools for any company to take advantage of and build a decent bot out of the box.

… I see three levels of bot builders out there. There’s the non-technical kind where marketing or sales might create a prototype using a user interface—like maybe Chatfuel, which requires no programming, and create a basic experience. Or they might even create some sort of decision tree bot that is not flexible, but is good for maybe basic lead-gen experiences. But they often can’t handle type-ins. It’s often button-based. So, that’s one level, the non-technical folks.

Then there are teams that have developers on staff. They’re not machine learning experts, but they’re developers that can use off-the-shelf natural language processing engines to extract meaning from messages sent by users. So, you’re extracting intents and entities and making sense of what’s coming at your bot without having to have machine learning expertise.
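
To make "intents and entities" concrete, here is a minimal sketch of what an NLP engine returns for a message. It is a toy keyword matcher, not how any real engine works internally; the intent names, keyword sets, and the `parse_message` function are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical intents and trigger keywords (assumptions for illustration).
# A real NLP engine learns these mappings from example phrases.
INTENT_KEYWORDS = {
    "book_flight": {"book", "fly", "flight"},
    "check_status": {"status", "delayed", "late"},
}

# Hypothetical entity pattern: a capitalized word following "to".
DESTINATION = re.compile(r"\bto ([A-Z][a-z]+)")


def parse_message(text):
    """Return (intent, entities) for a user message, or (None, {}) if nothing matches."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # Score each intent by how many of its keywords appear in the message.
    scores = {intent: len(tokens & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    intent = best if scores[best] > 0 else None
    match = DESTINATION.search(text)
    entities = {"destination": match.group(1)} if match else {}
    return intent, entities


intent, entities = parse_message("I want to book a flight to Paris")
```

When two intents score equally, this sketch simply picks the first one listed, which hints at how a bot can end up misreading what the user meant.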

Finally, there are teams that have the machine learning experts. They might build their own NLP engine to give them more control over how it works. But often, that’s not needed if a third-party solution can serve most of your needs. But we do see some teams like that.

Popular use cases

We track tens and tens of thousands of bots each month with Chatbase, and what we see is that large companies often start with customer support for at least two reasons. One is that the automation can save them some money, but also because chatbots enable them to create a more effective experience for their users. In fact, there’s a survey by Salesforce and a couple of other companies that found that what people want from bots is quick 24/7 answers to simple questions.

We also see some lead-generation bots. Those are simpler to build and often just live on a company’s website to try to gather and qualify leads. They can, most of the time, do a decent job and do a little better than just a dumb form. They can be responsive as they collect information from the user. They provide a more interactive experience, and, in many cases, can be more delightful for the customer.

Common errors: Unsupported, misunderstood, missed

Unsupported requests are ones that you just haven’t built: for example, someone asks a travel bot to upgrade their seat, and you just haven’t built that functionality. As a product manager, let’s say, running a bot project, you’d want to know the top-requested features. You can use machine learning to cluster the similar requests and show the product team what to build next based on popularity.
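
The idea of clustering similar requests and ranking them by popularity can be sketched in a few lines. This is not Chatbase's method: it swaps in a simple greedy grouping based on word overlap (Jaccard similarity) where a production system would likely use learned text embeddings. The function names and the 0.5 threshold are assumptions.

```python
import re


def tokens(text):
    """Lowercase word set for a message."""
    return frozenset(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two messages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_requests(messages, threshold=0.5):
    """Greedily group messages; the first message in a cluster is its representative."""
    clusters = []
    for msg in messages:
        for cluster in clusters:
            if jaccard(tokens(msg), tokens(cluster[0])) >= threshold:
                cluster.append(msg)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([msg])
    # Largest clusters first: these are the most requested missing features.
    return sorted(clusters, key=len, reverse=True)


unsupported = [
    "upgrade my seat",
    "add a bag",
    "please upgrade my seat",
    "add a checked bag",
    "can you upgrade my seat",
]
ranked = cluster_requests(unsupported)
```

Here `ranked[0]` holds the three seat-upgrade requests and `ranked[1]` the two baggage requests, so seat upgrades would top the list of features to build next.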

Misunderstood is when the user asks for X, but the bot thinks they asked for Y. That can be a really annoying experience for the end user. You’re likely going to lose the end user in these situations. These happen more often when a bot is expanding its use cases, which are referred to as the intents it handles. The more intents it handles, the more of an opportunity to mislabel and misunderstand the user. Chatbase looks for these mislabeled or misunderstood situations, and we suggest alternate intents. We suggest fixes.
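
One rough way to picture "suggesting an alternate intent" is to compare a message against the example phrases of every intent and flag cases where a different intent fits better than the one the bot chose. The sketch below does this with plain word overlap; it is an illustration under stated assumptions, not a description of how Chatbase detects these cases. `EXAMPLES` and `suggest_alternate` are hypothetical names.

```python
import re

# Hypothetical example phrases per intent (assumptions for illustration).
EXAMPLES = {
    "book_flight": ["book a flight", "i want to fly to boston"],
    "cancel_flight": ["cancel my flight", "i need to cancel my booking"],
}


def tokens(text):
    return frozenset(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_alternate(message, assigned_intent, examples=EXAMPLES):
    """Return an intent that matches better than assigned_intent, or None."""
    msg = tokens(message)
    # Best word-overlap score of the message against each intent's examples.
    best = {
        intent: max(jaccard(msg, tokens(phrase)) for phrase in phrases)
        for intent, phrases in examples.items()
    }
    top = max(best, key=best.get)
    if top != assigned_intent and best[top] > best.get(assigned_intent, 0.0):
        return top
    return None


# The bot labeled this message as "book_flight"; the sketch proposes a fix.
fix = suggest_alternate("please cancel my flight", "book_flight")
```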

The final one is missed requests. This is when you’ve built the functionality, you have an intent for a certain type of request, and maybe you recognize a hundred ways of asking for that specific intent. There’s maybe another hundred that you don’t recognize. So, instead of having, let’s say, a business analyst go through your logs to find the other hundred, we automatically detect these extra ways of saying the same thing and suggest those to you so you can fix the bot.
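
As a small illustration of finding "extra ways of saying the same thing," the sketch below checks messages the bot failed to handle against phrasings it already recognizes, using fuzzy string matching from Python's standard `difflib` module. A real system would need to catch paraphrases that share few characters, which this approach cannot do; the phrase table, the function name, and the 0.8 cutoff are assumptions.

```python
from difflib import get_close_matches

# Hypothetical phrasings the bot already recognizes, mapped to their intent.
KNOWN_PHRASES = {
    "what is my flight status": "check_status",
    "book a flight": "book_flight",
}


def find_missed_phrasings(unhandled, known_phrases, cutoff=0.8):
    """Map each intent to unhandled messages that closely resemble a known phrasing."""
    suggestions = {}
    for message in unhandled:
        normalized = message.lower().strip()
        # Keep only the single closest known phrase above the similarity cutoff.
        match = get_close_matches(normalized, list(known_phrases), n=1, cutoff=cutoff)
        if match:
            intent = known_phrases[match[0]]
            suggestions.setdefault(intent, []).append(message)
    return suggestions


unhandled = ["whats my flight status", "do you sell pizza"]
suggestions = find_missed_phrasings(unhandled, KNOWN_PHRASES)
```

The first message is close enough to a known phrasing to be suggested as a new example for `check_status`, while the unrelated one is left out.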
