The O’Reilly Data Show podcast: Eric Colson on algorithms, human computation, and building data science teams.
[A version of this post appears on the O’Reilly Radar.]
Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.
In this episode of the O’Reilly Data Show, I spoke with Eric Colson, chief algorithms officer at Stitch Fix, and former VP of data science and engineering at Netflix. We talked about building and deploying mission-critical, human-in-the-loop systems for consumer Internet companies. Knowing that many companies are grappling with incorporating data science, I also asked Colson to share his experiences building, managing, and nurturing large data science teams at both Netflix and Stitch Fix.
Augmented systems: “Active learning,” “human-in-the-loop,” and “human computation”
We use the term ‘human computation’ at Stitch Fix. We have a team dedicated to human computation. It’s a little bit coarse to say it that way because we do have more than 2,000 stylists, and these are very much human beings that are very passionate about fashion styling. What we can do is, we can abstract their talent into—you can think of it like an API; there’s certain tasks that only a human can do or we’re going to fail if we try this with machines, so we almost have programmatic access to human talent. We are allowed to route certain tasks to them, things that we could never get done with machines.
… We have some of our own proprietary software that blends together two resources: machine learning and expert human judgment. The way I talk about it is, we have an algorithm that’s distributed across the resources. It’s a single algorithm, but it does some of the work through machine resources, and other parts of the work get done through humans.
… You can think of even the classic recommender systems, collaborative filtering, which people recognize as, ‘people that bought this also bought that.’ Those things break down to nothing more than a series of rote calculations. Being a human, you can actually do them by hand—it’ll just take you a long time, and you’ll make a lot of mistakes along the way, and you’re not going to have much fun doing it—but machines can do this stuff in milliseconds. They can find these hidden relationships within the data that are going to help figure out what’s relevant to certain consumer’s preferences and be able to recommend things. Those are things that, again, a human could, in theory, do, but they’re just not great at all the calculations, and every algorithmic technique breaks down to a series of rote calculations.
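[Editor's illustration, not from the interview.] To make the "series of rote calculations" concrete, here is a minimal sketch of the simplest form of "people that bought this also bought that": counting how often items appear in the same basket. The function names and the purchase data are invented for this example, and nothing here is meant to describe how Stitch Fix or Netflix actually implement recommendations.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """For each item, count how often every other item shares a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() removes duplicates so an item is never paired with itself
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(k)]


# Hypothetical purchase histories, made up for illustration only.
baskets = [
    ["jeans", "belt", "boots"],
    ["jeans", "belt"],
    ["jeans", "scarf"],
    ["dress", "scarf"],
]

counts = build_cooccurrence(baskets)
print(also_bought(counts, "jeans", k=1))  # ['belt']
```

Every step is a tally that a person could do by hand on four baskets. The point of the passage above is that at the scale of a real catalog only a machine can do it quickly and without mistakes.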
… What machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human—those things are strictly in the purview of humans. Those types of tasks we route over to stylists. … I would argue that our humans could not do their jobs without the machines. We keep our inventory very large so that there are always many things to pick from for any given customer. It’s so large, in fact, that it would take a human too long to sift through it on her own, so what machines are doing is narrowing down the focus.
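[Editor's illustration, not from the interview.] The division of labor described above, where the machine narrows a large inventory and a human makes the final judgment, can be sketched as a single function that calls both resources. This is a toy under stated assumptions: the scoring rule, the field names (`color`, `rating`), and the `human_pick` callback are all hypothetical stand-ins, not Stitch Fix's proprietary software.

```python
from typing import Any, Callable, Dict, List, Set

Item = Dict[str, Any]


def machine_shortlist(inventory: List[Item], preferred_colors: Set[str], k: int) -> List[Item]:
    """Machine step: score every item with a rote calculation and keep the top k."""
    def score(item: Item) -> float:
        color_match = 1.0 if item["color"] in preferred_colors else 0.0
        return color_match + item["rating"] / 5.0  # hypothetical scoring rule
    return sorted(inventory, key=score, reverse=True)[:k]


def recommend(inventory: List[Item], preferred_colors: Set[str],
              human_pick: Callable[[List[Item]], Item], k: int = 3) -> Item:
    """One algorithm, two resources: the machine narrows, the human decides."""
    shortlist = machine_shortlist(inventory, preferred_colors, k)
    return human_pick(shortlist)  # in a real system this would route to a stylist


# Hypothetical inventory, made up for illustration only.
inventory = [
    {"name": "navy blazer", "color": "navy", "rating": 4.5},
    {"name": "red scarf", "color": "red", "rating": 4.8},
    {"name": "navy jeans", "color": "navy", "rating": 4.0},
    {"name": "green hat", "color": "green", "rating": 3.0},
]

# Stand-in for the stylist's judgment: here, simply take the last shortlisted item.
choice = recommend(inventory, {"navy"}, human_pick=lambda items: items[-1], k=2)
print(choice["name"])  # navy jeans
```

Passing the human step in as a callback mirrors the "programmatic access to human talent" idea: the caller sees one function, while part of the work is done by a person.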
Combining art and science
Our business model is different. We are betting big on algorithms. We do not have the barriers to competition that other retailers have, like Wal-Mart has economies of scale that allow them to do amazing things; that’s their big barrier. … What is our protective barrier? It’s [to be the] best in the world at algorithms. We have to be the very best. … More than any other company, we are going to suffer if we’re wrong.
… Our founder wanted to do this from the very beginning, combine empiricism with what can’t be captured in data, call it intuition or judgment. But she really wanted to weave those two things together to produce something that was better than either can do on their own. She calls it art and science, combining art and science.
Defining roles in data science teams
[Job roles at Stitch Fix are] built on three premises that come from Dan Pink’s book Drive. Autonomy, mastery, purpose—those are the fundamental things you need to have for high job satisfaction. With autonomy, that’s why we dedicate them to a team. You’re going to now work on what’s called ‘marketing algorithms.’ You may not know anything about marketing to begin with, but you’re going to learn it pretty fast. You’re going to pick up the domain expertise. By autonomy, we want you to do the whole thing so you have the full context. You’re going to be the one sourcing the data, building pipelines. You’re going to be applying the algorithmic routine. You’re going to be the one who frames that problem, figures out what algorithms you need, and you’re going to be the one delivering the output and connecting it back to some action, whatever that action may be. Maybe it’s adjusting our multi-channel strategy. Whatever that algorithmic output is, you’re responsible for it. So, that’s mastery. Now, you’re autonomous because you do all the pieces. You’re getting mastery over one domain, in that case, say marketing algorithms. You’re going to be looked at as you’re the best person in the company to go talk about how these things work; you know the end-to-end.
Then, purpose—that’s the impact that you’re going to make. In the case that we gave, marketing algorithms, you want to be accountable. You want to be the one who can move the needle when it comes to how much we should do. What channels are more effective at acquiring new customers? Whatever it is, you’re going to be held accountable for a real number, and that is motivating, that’s what makes people love their jobs.
Editor’s note: Eric Colson will speak about augmenting machine learning with human computation for better personalization at Strata + Hadoop World in San Jose this March.