Building human-assisted AI applications

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Adam Marcus on intelligent systems and human-in-the-loop computing.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the O’Reilly Data Show, I spoke with Adam Marcus, co-founder and CTO of B12, a startup focused on building human-in-the-loop intelligent applications. We talked about Orchestra, an open source platform for coordinating human-in-the-loop projects; the current wave of human-assisted AI applications; best practices for reviewing and scoring experts; and flash teams.

Here are some highlights from our conversation:

Continue reading

Building a business that combines human experts and data science

The O’Reilly Data Show podcast: Eric Colson on algorithms, human computation, and building data science teams.

[A version of this post appears on the O’Reilly Radar.]

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.

In this episode of the O’Reilly Data Show, I spoke with Eric Colson, chief algorithms officer at Stitch Fix, and former VP of data science and engineering at Netflix. We talked about building and deploying mission-critical, human-in-the-loop systems for consumer Internet companies. Knowing that many companies are grappling with incorporating data science, I also asked Colson to share his experiences building, managing, and nurturing large data science teams at both Netflix and Stitch Fix.

Augmented systems: “Active learning,” “human-in-the-loop,” and “human computation”

We use the term ‘human computation’ at Stitch Fix. We have a team dedicated to human computation. It’s a little bit coarse to say it that way because we do have more than 2,000 stylists, and these are very much human beings that are very passionate about fashion styling. What we can do is, we can abstract their talent into—you can think of it like an API; there’s certain tasks that only a human can do or we’re going to fail if we try this with machines, so we almost have programmatic access to human talent. We are allowed to route certain tasks to them, things that we could never get done with machines.

… We have some of our own proprietary software that blends together two resources: machine learning and expert human judgment. The way I talk about it is, we have an algorithm that’s distributed across the resources. It’s a single algorithm, but it does some of the work through machine resources, and other parts of the work get done through humans.

… You can think of even the classic recommender systems, collaborative filtering, which people recognize as, ‘people that bought this also bought that.’ Those things break down to nothing more than a series of rote calculations. Being a human, you can actually do them by hand—it’ll just take you a long time, and you’ll make a lot of mistakes along the way, and you’re not going to have much fun doing it—but machines can do this stuff in milliseconds. They can find these hidden relationships within the data that are going to help figure out what’s relevant to certain consumers’ preferences and be able to recommend things. Those are things that, again, a human could, in theory, do, but they’re just not great at all the calculations, and every algorithmic technique breaks down to a series of rote calculations.

… What machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human—those things are strictly in the purview of humans. Those types of tasks we route over to stylists. … I would argue that our humans could not do their jobs without the machines. We keep our inventory very large so that there are always many things to pick from for any given customer. It’s so large, in fact, that it would take a human too long to sift through it on her own, so what machines are doing is narrowing down the focus.
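
The “single algorithm distributed across machine and human resources” idea can be made concrete with a small sketch. The Python below is a hypothetical illustration, not Stitch Fix’s actual system: a rote co-occurrence calculation (the “people that bought this also bought that” kind Colson describes) narrows a large inventory down to a shortlist, and the judgment-heavy final selection is routed to a human stylist. All function names, data, and the shortlist size are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: machine resources do the rote co-occurrence counting;
# human stylists handle the judgment step.

def build_cooccurrence(purchase_history):
    """Count how often pairs of items appear in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in purchase_history:
        for a in basket:
            for b in basket:
                if a != b:
                    co_counts[a][b] += 1
    return co_counts

def narrow_inventory(liked_items, inventory, co_counts, k=3):
    """Machine step: rank the full inventory by co-occurrence with items the
    customer liked, and keep only the top k candidates."""
    def score(item):
        return sum(co_counts[liked][item] for liked in liked_items)
    return sorted(inventory, key=score, reverse=True)[:k]

def route_to_stylist(customer, shortlist):
    """Human step: a stylist reviews the machine-generated shortlist and makes
    the final, aesthetics-driven pick (stubbed out here)."""
    print(f"Stylist task for {customer}: choose from {shortlist}")

if __name__ == "__main__":
    history = [
        ["denim jacket", "white tee"],
        ["white tee", "ankle boots"],
        ["denim jacket", "ankle boots", "scarf"],
    ]
    inventory = ["white tee", "ankle boots", "scarf", "raincoat", "blazer"]
    co_counts = build_cooccurrence(history)
    shortlist = narrow_inventory(["denim jacket"], inventory, co_counts)
    route_to_stylist("customer_42", shortlist)
```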

Combining art and science

Our business model is different. We are betting big on algorithms. We do not have the barriers to competition that other retailers have, like Wal-Mart has economies of scale that allow them to do amazing things; that’s their big barrier. … What is our protective barrier? It’s [to be the] best in the world at algorithms. We have to be the very best. … More than any other company, we are going to suffer if we’re wrong.

… Our founder wanted to do this from the very beginning, combine empiricism with what can’t be captured in data, call it intuition or judgment. But she really wanted to weave those two things together to produce something that was better than either can do on their own. She calls it art and science, combining art and science.

Defining roles in data science teams

[Job roles at Stitch Fix are] built on three premises that come from Dan Pink’s book Drive. Autonomy, mastery, purpose—those are the fundamental things you need to have for high job satisfaction. With autonomy, that’s why we dedicate them to a team. You’re going to now work on what’s called ‘marketing algorithms.’ You may not know anything about marketing to begin with, but you’re going to learn it pretty fast. You’re going to pick up the domain expertise. By autonomy, we want you to do the whole thing so you have the full context. You’re going to be the one sourcing the data, building pipelines. You’re going to be applying the algorithmic routine. You’re going to be the one who frames that problem, figures out what algorithms you need, and you’re going to be the one delivering the output and connecting it back to some action, whatever that action may be. Maybe it’s adjusting our multi-channel strategy. Whatever that algorithmic output is, you’re responsible for it. So, that’s mastery. Now, you’re autonomous because you do all the pieces. You’re getting mastery over one domain, in that case, say marketing algorithms. You’re going to be looked at as you’re the best person in the company to go talk about how these things work; you know the end-to-end.

Then, purpose—that’s the impact that you’re going to make. In the case that we gave, marketing algorithms, you want to be accountable. You want to be the one who can move the needle when it comes to how much we should do. What channels are more effective at acquiring new customers? Whatever it is, you’re going to be held accountable for a real number, and that is motivating, that’s what makes people love their jobs.

Subscribe to the O’Reilly Data Show Podcast: Stitcher, TuneIn, iTunes, SoundCloud, RSS

Editor’s note: Eric Colson will speak about augmenting machine learning with human computation for better personalization, at Strata + Hadoop World in San Jose this March.

Real-world Active Learning

Beyond building training sets for machine learning, crowdsourcing is being used to enhance the results of machine learning models: in active learning, humans take care of uncertain cases, and models handle the routine ones. Active learning is one of those topics that many data scientists have heard of, few have tried, and a small handful know how to do well. As data problems increase in complexity, I think active learning is a topic that many more data scientists need to familiarize themselves with.

Next week I’ll be hosting a FREE webcast on Active Learning featuring data scientist and entrepreneur Lukas Biewald:

Machine learning research is often not applied to real-world situations. The improvements are often small and the added complexity is high, so except in special situations, industry doesn’t take advantage of advances in the academic literature.

Active learning is an example where research proposes a simple strategy that makes a huge difference, and almost everyone applying machine learning to real-world use cases is doing it or should be doing it. Active learning is the practice of taking cases where the model has low confidence, getting them labeled, and then using those labels as training data.

Webcast attendees will learn simple, practical ways to improve their models by cleaning up and tweaking the distribution of their training data. They will also learn about best practices from real-world cases where active learning and data selection took models from completely unusable in production to extremely effective.
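
For readers who want to see the mechanics, here is a minimal sketch of the uncertainty-based loop described above, written with scikit-learn. Everything in it is illustrative rather than taken from the webcast: the data is synthetic, the batch size of 50 is arbitrary, and `label_with_humans` is a stand-in for the step where uncertain cases are sent to human annotators (for example, via a crowdsourcing service).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_with_humans(examples):
    """Stand-in for the human labeling step; a real system would send these
    examples to annotators and collect their answers."""
    return (examples[:, 0] > 0).astype(int)  # pretend humans know the true rule

rng = np.random.default_rng(0)

# Seed data: a small labeled set and a large unlabeled pool.
X_labeled = rng.normal(size=(20, 5))
y_labeled = (X_labeled[:, 0] > 0).astype(int)
X_pool = rng.normal(size=(1000, 5))

model = LogisticRegression()
for round_num in range(3):
    model.fit(X_labeled, y_labeled)

    # Score the pool and find the cases the model is least confident about.
    proba = model.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)       # low max-probability = low confidence
    query_idx = np.argsort(uncertainty)[-50:]   # 50 most uncertain examples

    # Humans label the uncertain cases; those labels become new training data.
    new_labels = label_with_humans(X_pool[query_idx])
    X_labeled = np.vstack([X_labeled, X_pool[query_idx]])
    y_labeled = np.concatenate([y_labeled, new_labels])
    X_pool = np.delete(X_pool, query_idx, axis=0)

    print(f"round {round_num}: {len(y_labeled)} labeled examples")
```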

Crowdsourcing feature discovery

More than algorithms: companies gain access to models that incorporate ideas generated by teams of data scientists

[A version of this post appears on the O’Reilly Data blog and Forbes.]

Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that, as a data scientist, he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi-structured or unstructured data into structured data).

Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies.”

CrowdAnalytix breaks projects up into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (the dependent variable(s)) and are asked to propose features (predictors), along with brief explanations for why they might prove useful. A panel of judges evaluates the features based on the accompanying evidence and explanations. Typically, 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.

Continue reading