Practical applications of reinforcement learning in industry

[A version of this post appears on the O’Reilly Radar.]

An overview of commercial and industrial applications of reinforcement learning.

The flurry of headlines surrounding AlphaGo Zero (the most recent version of DeepMind’s AI system for playing Go) means interest in reinforcement learning (RL) is bound to increase. Next to deep learning, RL is among the most followed topics in AI. For most companies, RL is something to investigate and evaluate, but few organizations have identified use cases where RL may play a role. As we enter 2018, I want to briefly describe areas where RL has been applied.

RL is confusingly used to refer to both a set of problems and a set of techniques, so let’s first settle on what RL will mean for the rest of this post. Generally speaking, the goal in RL is learning how to map observations and measurements to a set of actions while trying to maximize some long-term reward. This usually involves applications where an agent interacts with an environment while trying to learn optimal sequences of decisions. In fact, many of the initial applications of RL are in areas where automating sequential decision-making has long been sought. RL poses a different set of challenges from traditional online learning, in that you often face some combination of delayed feedback, sparse rewards, and (most importantly) agents that are able to affect the environments with which they interact.
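To make this setup concrete, here is a minimal sketch in Python of an agent learning from a sparse, delayed reward. Everything in it is an illustrative assumption rather than something taken from a production system: `ChainEnv` is a hypothetical toy environment, and the learner is plain tabular Q-learning with epsilon-greedy exploration, one of the simplest RL techniques.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: a row of states with a reward only at the far right."""

    def __init__(self, n_states=6):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        """Start every episode at the leftmost state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Action 0 moves left, action 1 moves right; the ends act as walls."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # sparse, delayed reward
        return self.state, reward, done


def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning: q[state][action] estimates the long-term reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon (or when the estimates are tied);
            # otherwise exploit the action with the higher estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # The target combines the immediate reward with the discounted
            # value of the best action available in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties go to 'right')."""
    return [0 if left > right else 1 for left, right in q]
```

After training, `greedy_policy(train_q_learning(ChainEnv()))` should choose "move right" in every non-terminal state, even though the only reward arrives at the very end of an episode. The value estimates shrink as you move away from the goal, which is how the delayed reward gets propagated back to earlier decisions.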

Deep learning as a machine learning technique is beginning to be used by companies on a variety of machine learning applications. RL hasn’t quite found its way into many companies, and my goal is to sketch out some of the areas where applications are appearing.

Figure 1. Slide courtesy of Ben Lorica.

Before I do so, let me start off by listing some of the challenges facing RL in the enterprise. As Andrew Ng noted in his keynote at our AI Conference in San Francisco, RL requires a lot of data, and as such, it has often been associated with domains where simulated data is available (gameplay, robotics). It also isn’t easy to take results from research papers and implement them in applications. Reproducing research results can be challenging even for RL researchers, let alone regular data scientists (see this recent paper and this OpenAI blog post). As machine learning gets deployed in mission-critical situations, reproducibility and the ability to estimate error become essential. So, at least for now, RL may not be ideal for mission-critical applications that require continuous control.

These challenges notwithstanding, there are already interesting applications and products that rely on RL. There are many settings involving personalization, or the automation of well-defined tasks, that would benefit from sequential decision-making that RL can help automate (or at least, where RL can augment a human expert). The key for companies is to start with simple use cases that fit this profile rather than overly complicated problems that “require AI.” To make things more concrete, let me highlight some of the key application domains where RL is beginning to appear.

Robotics and industrial automation

Applications of RL in high-dimensional control problems, like robotics, have been the subject of research (in academia and industry), and startups are beginning to use RL to build products for industrial robotics.

Industrial automation is another promising area. It appears that RL technologies from DeepMind helped Google significantly reduce energy consumption (HVAC) in its own data centers. Startups have noticed there is a large market for automation solutions. Bonsai is one of several startups building tools to enable companies to use RL and other techniques for industrial applications. A common example is the use of AI for tuning machines and equipment where expert human operators are currently being used.

Figure 2. Slide from Mark Hammond, used with permission.

With industrial systems in mind, Bonsai recently listed the following criteria for when RL might be useful to consider:

  • You’re using simulations because your system or process is too complex (or too physically hazardous) for teaching machines through trial and error.
  • You’re dealing with large state spaces.
  • You’re seeking to augment human analysts and domain experts by optimizing operational efficiency and providing decision support.

Data science and machine learning

Machine learning libraries have gotten easier to use, but choosing a proper model or model architecture can still be challenging for data scientists. With deep learning becoming a technique used by data scientists and machine learning engineers, tools that can help people identify and tune neural network architectures are active areas of research. Several research groups have proposed using RL to make the process of designing neural network architectures more accessible (MetaQNN from MIT and Net2Net operations). AutoML from Google uses RL to produce state-of-the-art machine-generated neural network architectures for computer vision and language modeling.

Looking beyond tools that simplify the creation of machine learning models, there are some who think that RL will prove useful in helping software engineers write computer programs.

Education and training

Online platforms are beginning to experiment with using machine learning to create personalized experiences. Several researchers are investigating the use of RL and other machine learning methods in tutoring systems and personalized learning. The use of RL can lead to training systems that provide custom instruction and materials tuned to the needs of individual students. A group of researchers is developing RL algorithms and statistical methods that require less data for use in future tutoring systems.

Health and medicine

The RL setup of an agent interacting with an environment, and receiving feedback based on the actions it takes, shares similarities with the problem of learning treatment policies in the medical sciences. In fact, most RL applications in health care pertain to finding optimal treatment policies. Recent papers cited applications of RL to usage of medical equipment, medication dosing, and two-stage clinical trials.

Text, speech, and dialog systems

Companies collect a lot of text, and good tools that can help unlock unstructured text will find users. Earlier this year, AI researchers at Salesforce used deep RL for abstractive text summarization (a technique for automatically generating summaries from text based on content “abstracted” from some original text document). This could be an area where RL-based tools gain new users, as many companies are in need of better text mining solutions.

RL is also being used to allow dialog systems (i.e., chatbots) to learn from user interactions and thus improve over time (many enterprise chatbots currently rely on decision trees). This is an active area of research and VC investment: see Semantic Machines and VocalIQ (acquired by Apple).

Media and advertising

Microsoft recently described an internal system called Decision Service that has since been made available on Azure. This paper describes applications of Decision Service to content recommendation and advertising. Decision Service more generally targets machine learning products that suffer from failure modes including “feedback loops and bias, distributed data collection, changes in the environment, and weak monitoring and debugging.”
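As a rough illustration of learning content recommendations from user feedback, here is a minimal Python sketch of an epsilon-greedy contextual bandit, a simplified relative of full RL in which each decision is scored by immediate feedback (a click). It is not the Decision Service API: the class `EpsilonGreedyRecommender`, the user segments, the items, and the click-through rates in `TRUE_CTR` are all hypothetical values invented for this simulation.

```python
import random

# Hypothetical click-through rates, used only to simulate user feedback.
TRUE_CTR = {
    ("sports_fan", "sports"): 0.30,
    ("sports_fan", "finance"): 0.05,
    ("investor", "sports"): 0.05,
    ("investor", "finance"): 0.30,
}


class EpsilonGreedyRecommender:
    """Tracks shows and clicks per (context, item) pair and mostly exploits the best."""

    def __init__(self, contexts, items, rng, epsilon=0.1):
        self.items = list(items)
        self.rng = rng
        self.epsilon = epsilon
        self.shows = {(c, i): 0 for c in contexts for i in self.items}
        self.clicks = {(c, i): 0 for c in contexts for i in self.items}

    def estimated_ctr(self, context, item):
        shows = self.shows[(context, item)]
        return self.clicks[(context, item)] / shows if shows else 0.0

    def best_item(self, context):
        """Item with the highest estimated click-through rate for this context."""
        return max(self.items, key=lambda item: self.estimated_ctr(context, item))

    def choose(self, context):
        # Explore a random item with probability epsilon; otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return self.best_item(context)

    def update(self, context, item, clicked):
        self.shows[(context, item)] += 1
        self.clicks[(context, item)] += int(clicked)


def simulate(rounds=5000, seed=0):
    """Run the recommender against simulated users and return it."""
    rng = random.Random(seed)
    contexts = ["sports_fan", "investor"]
    items = ["sports", "finance"]
    recommender = EpsilonGreedyRecommender(contexts, items, rng)
    for _ in range(rounds):
        context = rng.choice(contexts)
        item = recommender.choose(context)
        clicked = rng.random() < TRUE_CTR[(context, item)]
        recommender.update(context, item, clicked)
    return recommender
```

Calling `simulate().best_item("investor")` shows which item the learner has come to prefer for that segment. The small amount of random exploration matters: without it, the recommender would only ever collect feedback on the items it already favors, which is one form of the feedback-loop problem mentioned above.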

Other applications of RL include cross-channel marketing optimization and real-time bidding systems for online display advertising.


Finance

Having started my career as a lead quant in a hedge fund, I wasn’t surprised that few finance companies are willing to talk on the record. Generally speaking, I came across quants and traders who were evaluating deep learning and RL but hadn’t found sufficient reason to use the tools beyond small pilots. While potential applications in finance are described in research papers, few companies describe software in production.

One exception is JPMorgan Chase. A Financial Times article described an RL-based system for optimal trade execution at the bank. The system (dubbed “LOXM”) is being used to execute trading orders at maximum speed and at the best possible price.

As with any new technique or technology, the key to using RL is to understand its strengths and weaknesses, and then find simple use cases on which to try it. Resist the hype around AI; instead, consider RL as a useful machine learning technique, albeit one that is best suited for a specific class of problems. We are just beginning to see RL in enterprise applications. Along with continued research into algorithms, many software tools (libraries, simulators, distributed computation frameworks like Ray, SaaS) are beginning to appear. But it’s fair to say that few of these tools come with examples aimed at users interested in industry applications. There are, however, already a few startups that are incorporating RL into their products. So, before you know it, you might be benefiting from developments in RL and related techniques.
