Time-turner: Strata San Jose 2017, day 2

There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 3 (maybe 5) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would look:

11 a.m.

11:50 a.m.

1:50 p.m.

2:40 p.m.

4:20 p.m.

Time-turner: Strata San Jose 2017, day 1

There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 3 (maybe 5) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would look:

11 a.m.

11:50 a.m.

1:50 p.m.

2:40 p.m.

4:20 p.m.

5:10 p.m.

Building the next-generation big data analytics stack

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Michael Franklin on the lasting legacy of AMPLab.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode I spoke with Michael Franklin, co-director of UC Berkeley’s AMPLab and chair of the Department of Computer Science at the University of Chicago. AMPLab is well-known in the data community for having originated Apache Spark, Alluxio (formerly Tachyon) and many other open source tools. Today marks the start of a two-day symposium commemorating the end of AMPLab, and we took the opportunity to reflect on its impressive accomplishments.

AMPLab is the latest in a series of UC Berkeley research labs each designed with clear goals, a multidisciplinary faculty, and a fixed timeline (for more details, see David Patterson’s interesting design document for research labs). Many of AMPLab’s principals were involved in its precursor, the RAD Lab. As Franklin describes in our podcast episode:

The insight that Dave Patterson and the other folks who founded the RAD Lab had was that modern systems were so complex that you needed serious machine learning—cutting-edge machine learning—to be able to do that [to basically allow the systems to manage themselves]. You couldn’t take a computer systems person, give them an intro to machine learning book, and hope to solve that problem. They actually built this team that included computer systems people sitting next to machine learning people. … Traditionally, these two groups had very little to do with each other. That was a five-year project. The way I like to say it is—they spent at least four of those years learning how to talk to each other.

Toward of the end of the RAD Lab, we had probably the best group in the world of combined systems and machine learning people, who actually could speak to each other. In fact, Spark grew out of that relationship, because there were machine learning people in the RAD Lab who were trying to run iterative algorithms on Hadoop and were just getting terrible performance.

… AMPLab in some sense was a flip of that relationship. If you considered RAD Lab as basically a setting where “machine learning people were consulting for the systems people”, in AMPLab, we did the opposite—machine learning people got help from the systems people in how to make these things scale. That’s one part of the story.

In the rest of this post, I’ll describe some of my interactions with the AMPLab team. These recollections are based on early meetups, retreats, and conferences.

Continue reading

Strata NYC 2016 is next week

Complete schedule is HERE.

Hardcore Data Science, London 2016

The first edition of Hardcore Data Science in Europe featured 12 outstanding sessions. My co-organizer, Angie Ma of ASI deserves much of the credit for recruiting many of the speakers. We had talks on machine learning techniques (deep / transfer / reinforcement / ensemble / semisupervised) applied to a variety of data sets (images, text, video, event data) and settings (advertising, media, ecommerce, health, environment, finance, etc.). A summary can be found below.