2017 will be the year the data science and big data community engage with AI technologies

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: A look at some trends we’re watching in 2017.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

This episode consists of excerpts from a recent talk I gave at a conference commemorating the end of the UC Berkeley AMPLab project. This section pertained to some recent trends in Data and AI. For a complete list of trends we’re watching in 2017, as well as regular doses of highly curated resources, subscribe to our Data and AI newsletters.

As 2016 draws to a close, I see the big data and data science community beginning to engage with AI-related technologies, particularly deep learning. By early next year, there will be new tools that specifically cater to data scientists and data engineers who aren’t necessarily experts in these techniques. While the AI research community continues to tackle fundamental problems, these new sets of tools will make some recent breakthroughs in AI much more accessible and convenient to use for the data community.

Related resources:

Data is only as valuable as the decisions it enables

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Ion Stoica on building intelligent and secure applications on live data.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode I spoke with Ion Stoica, cofounder and chairman of Databricks. Stoica is also a professor of computer science at UC Berkeley, where he serves as director of the new RISE Lab (the successor to AMPLab). Fresh off the incredible success of AMPLab, RISE seeks to build tools and platforms that enable sophisticated real-time applications on live data, while maintaining strong security. As Stoica points out, users will increasingly expect security guarantees on systems that rely on online machine learning algorithms that make use of personal or proprietary data.

As with AMPLab, the goal is to build tools and platforms, while producing high-quality research in computer science and its applications to other disciplines. Below are highlights from our conversation:
Continue reading “Data is only as valuable as the decisions it enables”

Hong Kong 2016

We just spent the past week in Hong Kong and below are some tweets from our trip. The weather was perfect – the temperature was just right. Their metro/train system is world class and easy to use and navigate. I’m looking to returning soon!

Wan Chai

We stayed in the vibrant Wan Chai district which among other things, has a very active market scene. This is a typical evening just off of the Wan Chai market, and right on Wan Chai road itself:

Introducing model-based thinking into AI systems

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Vikash Mansinghka on recent developments in probabilistic programming.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode I spoke with Vikash Mansinghka, research scientist at MIT, where he leads the Probabilistic Computing Project, and co-founder of Empirical Systems. I’ve long wanted to introduce listeners to recent developments in probabilistic programming, and I found the perfect guide in Mansinghka.

Probability is the mathematical language to represent, model, and manipulate uncertainty, and probabilistic programming provides frameworks for representing probabilistic models as computer programs. This family of tools and techniques distinguishes between models and the inference procedures, and in the process, encourages the kind of model-based thinking that may inform the design of future artificial intelligence systems and supplement current data and compute-intensive systems that rely primarily on large-scale pattern recognition.

Below are highlights from my conversation with Mansinghka:
Continue reading “Introducing model-based thinking into AI systems”