Time-turner: Strata San Jose 2017, day 2

There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 3 (maybe 5) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would look:

11 a.m.

11:50 a.m.

1:50 p.m.

2:40 p.m.

4:20 p.m.

Time-turner: Strata San Jose 2017, day 1

There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 3 (maybe 5) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would look:

11 a.m.

11:50 a.m.

1:50 p.m.

2:40 p.m.

4:20 p.m.

5:10 p.m.

The key to building deep learning solutions for large enterprises

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Adam Gibson on the importance of ROI, integration, and the JVM.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4J (DL4J). Gibson has spent the last few years developing the DL4J library and community, while simultaneously building deep learning solutions and products for large enterprises.

Here are some highlights:

Continue reading

2017 will be the year the data science and big data community engage with AI technologies

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: A look at some trends we’re watching in 2017.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

This episode consists of excerpts from a recent talk I gave at a conference commemorating the end of the UC Berkeley AMPLab project. This section pertained to some recent trends in Data and AI. For a complete list of trends we’re watching in 2017, as well as regular doses of highly curated resources, subscribe to our Data and AI newsletters.

As 2016 draws to a close, I see the big data and data science community beginning to engage with AI-related technologies, particularly deep learning. By early next year, there will be new tools that specifically cater to data scientists and data engineers who aren’t necessarily experts in these techniques. While the AI research community continues to tackle fundamental problems, these new sets of tools will make some recent breakthroughs in AI much more accessible and convenient to use for the data community.

Related resources:

Data is only as valuable as the decisions it enables

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Ion Stoica on building intelligent and secure applications on live data.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode I spoke with Ion Stoica, cofounder and chairman of Databricks. Stoica is also a professor of computer science at UC Berkeley, where he serves as director of the new RISE Lab (the successor to AMPLab). Fresh off the incredible success of AMPLab, RISE seeks to build tools and platforms that enable sophisticated real-time applications on live data, while maintaining strong security. As Stoica points out, users will increasingly expect security guarantees on systems that rely on online machine learning algorithms that make use of personal or proprietary data.

As with AMPLab, the goal is to build tools and platforms, while producing high-quality research in computer science and its applications to other disciplines. Below are highlights from our conversation:
Continue reading

Building the next-generation big data analytics stack

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Michael Franklin on the lasting legacy of AMPLab.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode I spoke with Michael Franklin, co-director of UC Berkeley’s AMPLab and chair of the Department of Computer Science at the University of Chicago. AMPLab is well-known in the data community for having originated Apache Spark, Alluxio (formerly Tachyon) and many other open source tools. Today marks the start of a two-day symposium commemorating the end of AMPLab, and we took the opportunity to reflect on its impressive accomplishments.

AMPLab is the latest in a series of UC Berkeley research labs each designed with clear goals, a multidisciplinary faculty, and a fixed timeline (for more details, see David Patterson’s interesting design document for research labs). Many of AMPLab’s principals were involved in its precursor, the RAD Lab. As Franklin describes in our podcast episode:

The insight that Dave Patterson and the other folks who founded the RAD Lab had was that modern systems were so complex that you needed serious machine learning—cutting-edge machine learning—to be able to do that [to basically allow the systems to manage themselves]. You couldn’t take a computer systems person, give them an intro to machine learning book, and hope to solve that problem. They actually built this team that included computer systems people sitting next to machine learning people. … Traditionally, these two groups had very little to do with each other. That was a five-year project. The way I like to say it is—they spent at least four of those years learning how to talk to each other.

Toward of the end of the RAD Lab, we had probably the best group in the world of combined systems and machine learning people, who actually could speak to each other. In fact, Spark grew out of that relationship, because there were machine learning people in the RAD Lab who were trying to run iterative algorithms on Hadoop and were just getting terrible performance.

… AMPLab in some sense was a flip of that relationship. If you considered RAD Lab as basically a setting where “machine learning people were consulting for the systems people”, in AMPLab, we did the opposite—machine learning people got help from the systems people in how to make these things scale. That’s one part of the story.

In the rest of this post, I’ll describe some of my interactions with the AMPLab team. These recollections are based on early meetups, retreats, and conferences.

Continue reading

Why businesses should pay attention to deep learning

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Christopher Nguyen on the early days of Apache Spark, deep learning for time-series and transactional data, innovation in China, and AI.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the O’Reilly Data Show, I spoke with Christopher Nguyen, CEO and co-founder of Arimo. Nguyen and Arimo were among the first adopters and proponents of Apache Spark, Alluxio, and other open source technologies. Most recently, Arimo’s suite of analytic products has relied on deep learning to address a range of business problems.

Continue reading