When I wrote about enterprise applications of reinforcement learning (RL) a little over a year ago, I cited a few examples of applications for recommenders and personalization systems. At the time, the examples I listed came from large technology companies, specifically Netflix, JD, Facebook, and YouTube, as only large companies tended to have the resources to utilize RL. Fast-forward a year, and RL is still not a widely used or accessible technology. Our recent analysis of Fortune 1000* companies revealed that, compared to other techniques (like deep learning), engagement in RL is very much still in the early stages across all sectors.

This isn’t too surprising. Compared to traditional supervised and unsupervised learning methods, RL is much more difficult to implement and maintain. The challenges I listed last year remain relevant:
- Lack of detailed tutorials that target enterprise use cases.
- The burden of building real-world simulators or mechanisms for training off-line from logs.
- Challenges when it comes to explainability and reproducibility.
- Computational inefficiency.
- High-dimensional state and action spaces.
- Using RL safely and effectively in production is difficult. RL introduces additional algorithmic and system complexities beyond those of supervised machine learning. It is also difficult to test, ensure the reliability of, and improve RL systems that run within live systems that cannot be effectively simulated (e.g., recommenders). The difficulty of ensuring reliability is shared even in settings where simulation is possible.
When you examine applications of RL across many domains, it becomes apparent that certain ingredients have to be in place for RL to be a good fit. Problems where RL solutions have proven useful tend to require highly complex optimizations that involve a sequence of actions or decisions. RL has proven to be a good fit for use cases where your “reward” is delayed—problems where you only learn the consequences of your actions further down the road. You usually need to have good simulators in place as well—or you need a viable strategy for training off-line using historical logs.
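To make the two ingredients above concrete—delayed reward and training off-line from logs—here is a minimal, hypothetical sketch (not drawn from any of the companies discussed): a toy chain environment where reward only arrives at the final state, a batch of transitions logged by a random behavior policy standing in for historical production logs, and tabular Q-learning run purely over that log. All names and parameters are illustrative.

```python
import random

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right).
# Reward is delayed: the agent earns +1 only on reaching state 4.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# 1) Collect a log of transitions with a random behavior policy,
#    standing in for historical logs from a live system.
random.seed(0)
log = []
for _ in range(200):
    s, done = 0, False
    while not done:
        a = random.choice([0, 1])
        s2, r, done = step(s, a)
        log.append((s, a, r, s2))
        s = s2

# 2) Off-line Q-learning: replay the logged transitions repeatedly,
#    never interacting with the environment again.
gamma, alpha = 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(50):
    for s, a, r, s2 in log:
        target = r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

# The learned policy moves right from every non-terminal state, even
# though the reward signal only appears at the end of each episode.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # expected: [1, 1, 1, 1]
```

Real systems replace the toy environment with a high-fidelity simulator or carefully curated logs, which is precisely where much of the engineering burden listed above comes from.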
RL in the Fortune 1000*

Let’s look at some recent examples of RL in the real world. The list below includes representative examples of how some Fortune 1000 companies are beginning to use RL and related tools. While the list mostly includes companies outside the “Technology” sector, it does include a few tech companies that are using RL for chip design.
Financial Markets: RBC Capital Markets rolled out a new trading platform called Aiden®, which is reputed to use “deep reinforcement learning in a constantly changing environment like equities trading, with measurable and explainable results for its users.” RBC claims the platform is able to “execute trading decisions based on live market data, dynamically adjust to new information and learn from each of its previous actions.” J.P. Morgan uses RL to enhance its Foreign Exchange (FX) pricing and trading algorithms. As I mentioned, RL is actually well-suited for complex optimizations involving sequences of actions where the payoff is delayed well into the future. It won’t be a shock if RL becomes common in these types of financial applications.
Personalization and Recommenders: While this class of applications was covered in a previous post, this time I’m highlighting a few examples of companies that are outside of the “Technology” sector. Starbucks claims it uses RL to increase engagement on its mobile app. Cisco has hinted that it uses RL (alongside other statistical and ML techniques) to improve customer-facing services and increase operational efficiency. The next examples were gleaned from job postings (for RL in job posts, also see here and here). Nike recently filled roles that required RL skills to help build personalization models. Online retailer Wayfair is looking for a principal data scientist (deep reinforcement learning) focused on research and implementation of models that cut across their entire stack of recommenders.
Security and Asset Management: Cybersecurity vendor Fortinet uses RL as part of its core product offerings. GE has a three-year US government contract to work on automated building management systems that use RL to “distinguish between regular system faults and cyberattacks, and take action to protect systems.” In recent years, supervised machine learning has been used to manage data and software infrastructure; we are beginning to see RL used in these areas as well. For example, Dell has been using RL in its quest to build storage solutions that can automatically change behavior in response to changing workloads.
Simulations, specifically for chip design: In a previous post, I described startups building tools that can be plugged into existing simulation and optimization software. More recently, Pathmind has worked on Industrial AI use cases involving the use of RL for large-scale simulations and optimizations. Building on the theme of simulations, semiconductor designers have long used simulation engines as part of their design process. There are several examples of technology companies that are beginning to use RL to design semiconductors. At last year’s Ray Summit, Google described how they use RL for a series of tasks in chip design. Synopsys has an RL-powered offering called DSO.ai that is purportedly “capable of searching for optimization targets in very large solution spaces of chip design.” Verification tools are used by system-on-a-chip designers to test and verify designs before tapeout and manufacturing. In 2019, Cadence introduced a Formal Verification product called JasperGold that uses RL to increase throughput and expand coverage.
(Bonus) RL for industrial design: The America’s Cup sailing competition is one of the most prestigious sporting events in the world. Teams spend years designing boats, and realistic simulators have become integral to the design process. For the 2021 America’s Cup, a team from QuantumBlack (which is part of McKinsey) used RL to help design the boat used by the winning team. Their work was critical in testing various designs of hydrofoils, an important component that could be modified based on rules set forth by the organizers. More precisely, the QuantumBlack team designed an AI agent that could learn to sail the boat for a given design at an optimal speed; this AI agent proved crucial during the design process.
(*) Fortune 1000 is a trademark of Fortune Media IP Limited.