Beyond building training sets for machine learning, crowdsourcing is being used to enhance the results of machine learning models: in active learning, humans take care of the uncertain cases while models handle the routine ones. Active learning is one of those topics that many data scientists have heard of, few have tried, and only a small handful know how to do well. As data problems increase in complexity, I think active learning is a topic that many more data scientists need to familiarize themselves with.
Machine learning research often goes unapplied in real-world situations. The improvements tend to be small and the added complexity high, so except in special situations, industry doesn’t take advantage of advances in the academic literature.
Active learning is an exception: research proposes a simple strategy that makes a huge difference, and almost everyone applying machine learning to real-world use cases is doing it or should be doing it. Active learning is the practice of taking cases where the model has low confidence, getting them labeled by humans, and then adding those labeled cases to the training data.
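The selection step in that definition can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes the model outputs a probability per class for each unlabeled example, and it treats the probability of the top class as the model's confidence. The function name `least_confident` and the sample probabilities are hypothetical.

```python
from typing import List, Sequence


def least_confident(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k examples the model is least sure about."""
    # Assumption: confidence = probability assigned to the most likely class.
    confidence = [max(row) for row in probabilities]
    # Rank example indices from least to most confident and keep the first k.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical model outputs for four unlabeled examples (two classes each).
probs = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain
    [0.10, 0.90],  # confident
    [0.51, 0.49],  # most uncertain
]

to_label = least_confident(probs, k=2)
print(to_label)  # [3, 1]
```

The examples at the returned indices would go to human annotators; once labeled, they join the training set and the model is retrained, after which the selection can be repeated.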
Webcast attendees will learn simple, practical ways to improve their models by cleaning up and tweaking the distribution of their training data. They will also learn about best practices from real-world cases where active learning and data selection took models from completely unusable in production to extremely effective.