Data science and deep learning in retail

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Jeremy Stanley on hiring and leading machine learning engineers to build world-class data products.


In this episode of the Data Show, I spoke with Jeremy Stanley, VP of data science at Instacart, a popular grocery delivery service that is expanding rapidly. As Stanley describes it, Instacart operates a four-sided marketplace composed of retail stores, products within the stores, shoppers assigned to the stores, and customers who order from Instacart. The objective is to get fresh groceries from popular retailers delivered to customers in a timely fashion. Instacart’s goals place it at the center of the many opportunities and challenges involved in building high-impact data products.

Retail produces some of the most interesting case studies involving the use of big data and machine learning. This observation holds true for companies worldwide: I’m seeing data products in retail in the U.S. and Europe, and some of the most exciting developments are happening in Asia. We covered the intersection of retail and logistics at a recent Strata Data conference, where we showcased the use of data and machine learning in transportation and logistics.

Here are some highlights from my conversation with Jeremy Stanley:

Using deep learning to assist Instacart shoppers

The application we have talked a lot about publicly is, how do you order the shopping list for the shopper once they show up in the store and they have to pick 30 or 40 or 50 different items?

The interesting challenge is that we don’t really know the exact location of items. Some of our retailers have great data about where their products are located, some of our retailers don’t. What we found works the best is to model the behavior of our shoppers. We’ve built a deep learning model that will take any order and the specific items in that order, the store location it’s going to be shopped at, and the specific shopper who’s going to shop it, and it will sort that order based upon that shopper’s behavior.

The challenge is that it’s not obvious how to solve this problem using something like XGBoost. You have a million different products, and tens of thousands of store locations, and tens of thousands of shoppers. There’s not an obvious way to represent those as features that are going to allow you to learn in an XGBoost model how to sort a list of specific items.
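A rough back-of-the-envelope sketch may help show why the naive representation is unwieldy. It assumes the IDs would be one-hot encoded, which is only one possible choice and not something the interview specifies, and it takes "tens of thousands" to mean 20,000 purely for illustration.

```python
# Counts loosely taken from the interview; "tens of thousands" is
# assumed to be 20,000 here purely for illustration.
n_products = 1_000_000
n_stores = 20_000
n_shoppers = 20_000

# One-hot encoding needs one column per distinct ID.
one_hot_width = n_products + n_stores + n_shoppers

# Each row has exactly three non-zero entries:
# one product, one store, and one shopper.
nonzero_per_row = 3
density = nonzero_per_row / one_hot_width

print(f"one-hot columns: {one_hot_width:,}")
print(f"fraction of non-zero entries per row: {density:.2e}")
```

Under these assumed counts, each training row would be over a million columns wide with only three of them set, and the encoding still says nothing about how to rank one item against another within a list.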

… The specific approach we use is to learn an embedding for the products, an embedding for the shoppers, an embedding for the stores, then combine all of those into a sequence of hidden layers in order to ultimately make a prediction: given a shopper just picked this specific item at this store location and they have available to them these X items left to pick, what’s the probability distribution over those next X items for the one they’re most likely to pick next? It turns out we can get that right about 60% of the time.
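The shape of such a model can be sketched in a few lines of NumPy. This is a minimal, untrained illustration under stated assumptions and not Instacart's actual architecture: the table sizes, the single ReLU hidden layer, the choice to concatenate embeddings, the greedy decoding in `sort_order`, and every name below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical; the real catalog is far larger).
N_PRODUCTS, N_SHOPPERS, N_STORES, DIM, HIDDEN = 1000, 50, 20, 8, 16

# One embedding table per entity type, randomly initialised (untrained).
product_emb = rng.normal(size=(N_PRODUCTS, DIM))
shopper_emb = rng.normal(size=(N_SHOPPERS, DIM))
store_emb = rng.normal(size=(N_STORES, DIM))

# Hidden layer over [shopper, store, last item, candidate item] embeddings.
W1 = rng.normal(size=(4 * DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
w2 = rng.normal(size=HIDDEN) * 0.1


def next_item_probs(shopper, store, last_item, candidates):
    """Probability distribution over the remaining candidate items."""
    context = np.concatenate(
        [shopper_emb[shopper], store_emb[store], product_emb[last_item]]
    )
    # One input row per candidate: shared context + that candidate's embedding.
    tiled = np.tile(context, (len(candidates), 1))
    x = np.hstack([tiled, product_emb[candidates]])
    hidden = np.maximum(x @ W1 + b1, 0.0)  # ReLU
    scores = hidden @ w2
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()


def sort_order(shopper, store, first_item, remaining):
    """Greedily order items by repeatedly taking the most likely next pick."""
    ordered, remaining = [first_item], list(remaining)
    while remaining:
        probs = next_item_probs(shopper, store, ordered[-1], remaining)
        ordered.append(remaining.pop(int(np.argmax(probs))))
    return ordered


probs = next_item_probs(shopper=3, store=5, last_item=10, candidates=[11, 12, 13])
order = sort_order(shopper=3, store=5, first_item=10, remaining=[11, 12, 13, 14])
```

With random weights the ordering is meaningless. In a trained version, the embedding tables and layer weights would be fit to observed pick sequences so that the softmax puts most of its mass on the item the shopper actually picks next.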

… Object or item recognition is not something we have in production yet, but we have built models to do that, and we’re working on the right way to utilize them in production. Ideally, what we would do is, every morning we would have one of the shoppers—or maybe the shift lead for the store—walk down the aisles capturing images or video of all of the items on the shelves. Then we would go through those and recognize each of the products and count how many of the products are on the shelves and use that to inform our understanding of the inventory of the store. We’d also like to identify the exact location of the shopper for every one of these images, so that we can then identify the location of the item and route the shopper more intelligently.

Hiring machine learning engineers

Machine learning engineers are stronger coders, and they have a passion for making products more intelligent. They think holistically about everything from logging and infrastructure, to algorithm experimentation and evolution, to the A/B testing infrastructure, and evolving the sophistication over time.

For machine learning engineers, we don’t hire inexperienced people. We tend to hire more experienced folks who have three to five years of experience. A part of that is because they’re going to be directly integrated into a team that’s a combination of engineers and machine learning engineers. They need to be able to hit the ground running relatively quickly. They don’t have to have the domain expertise. They can pick that up on the job. They need to have a lot of the best practices from coding and working with data, and a pretty deep understanding of machine learning and all of the fundamentals behind it.
