Language understanding remains one of AI’s grand challenges

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: David Ferrucci on the evolution of AI systems for language understanding.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I spoke with David Ferrucci, founder of Elemental Cognition and senior technologist at Bridgewater Associates. Ferrucci served as principal investigator of IBM’s DeepQA project and led the team whose Watson system became a Jeopardy! champion. Elemental Cognition (EC) is a research group focused on building an AI system equipped with state-of-the-art natural language understanding technologies. Ferrucci envisions that EC’s system will ship with foundational knowledge in many subject areas but will be able to acquire knowledge in other, specialized domains very quickly with the help of “human mentors.”

Because Ferrucci has built and deployed several prominent AI systems over the years, I also wanted to get his perspective on the evolution of AI technologies and on how enterprises can take advantage of the many exciting recent developments.

Here are some highlights from our conversation:

Data preparation in the age of deep learning

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Lukas Biewald on why companies are spending millions of dollars on labeled data sets.


In this episode of the Data Show, I spoke with Lukas Biewald, co-founder and chief data scientist at CrowdFlower. In a previous episode, we covered how the rise of deep learning is fueling the need for large labeled data sets and high-performance computing systems. CrowdFlower offers a service that many leading companies have come to rely on for the labeled data sets they use to train machine learning models. As deep learning models grow larger and more complex, they require bigger training data sets than other machine learning techniques do.

The CrowdFlower platform combines the contributions of human workers and algorithms. Through a process called active learning, it sends difficult tasks or edge cases to humans and lets the algorithms handle the more routine examples. But how do you decide when to use human workers? In a simple example involving an automatic classifier, you will probably want to send cases to human workers when your machine learning algorithms signal uncertainty (their probability scores are low) or when the models in your ensemble disagree. As Biewald describes in our conversation, active learning in practice is much more subtle, and the CrowdFlower platform, in particular, is able to combine humans and algorithms to handle more sophisticated tasks.
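The simple routing rule described above (send an example to a human when the classifier is unsure or the ensemble disagrees) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CrowdFlower's method: the function names `route_to_human` and `split_batch`, the 0.7 confidence threshold, and the choice to average the ensemble's probabilities are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical cutoff: below this averaged top probability, ask a human.
DEFAULT_THRESHOLD = 0.7


def route_to_human(
    ensemble_probs: Sequence[Sequence[float]],
    confidence_threshold: float = DEFAULT_THRESHOLD,
) -> bool:
    """Return True if an example should be sent to a human worker.

    ensemble_probs holds one class-probability vector per model in the ensemble.
    """
    # Each model's predicted label is the index of its highest probability.
    predicted = [max(range(len(p)), key=p.__getitem__) for p in ensemble_probs]
    # Disagreement: models in the ensemble predict different labels.
    if len(set(predicted)) > 1:
        return True
    # Uncertainty: average the models' probabilities and check the top score.
    n_models = len(ensemble_probs)
    n_classes = len(ensemble_probs[0])
    avg = [sum(p[c] for p in ensemble_probs) / n_models for c in range(n_classes)]
    return max(avg) < confidence_threshold


def split_batch(
    batch: Sequence[Sequence[Sequence[float]]],
    confidence_threshold: float = DEFAULT_THRESHOLD,
) -> Tuple[List[int], List[int]]:
    """Split a batch into (human_queue, auto_labeled) lists of example indices."""
    human: List[int] = []
    auto: List[int] = []
    for i, probs in enumerate(batch):
        if route_to_human(probs, confidence_threshold):
            human.append(i)
        else:
            auto.append(i)
    return human, auto


# Two-model ensemble scoring three examples on a binary task (illustrative numbers).
example_batch = [
    [[0.95, 0.05], [0.90, 0.10]],  # confident and in agreement: keep automatic label
    [[0.60, 0.40], [0.55, 0.45]],  # agree, but averaged top score is low: human
    [[0.80, 0.20], [0.30, 0.70]],  # models disagree: human
]
print(split_batch(example_batch))  # → ([1, 2], [0])
```

The human-labeled examples would then typically be added to the training set so the models improve on exactly the cases they found hardest; as Biewald notes, production active learning involves far more nuance than this two-signal rule.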

Here are some highlights from our conversation: