Building big data systems in academia and industry

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Mikio Braun on stream processing, academic research, and training. Mikio Braun is a machine learning researcher who also enjoys software engineering. We first met when he co-founded a real-time analytics company called streamdrill. Since then, I’ve always had great…

A real-time processing revival

[A version of this post appears on the O’Reilly Radar blog.] Things are moving fast in the stream processing world, and there’s renewed interest in stream processing and analytics. I base this on some data points (attendance at webcasts and conference sessions; a recent meetup) and on many conversations with technologists, startup founders, and investors. Certainly,…

Scikit-Learn 0.16

I’ll be hosting a webcast featuring two of the key contributors to what is arguably one of the most popular machine learning tools today: scikit-learn. News from Scikit-Learn 0.16 and Soon-To-Be Gems for the Next Release, presented by Olivier Grisel and Andreas Mueller. This webcast will review scikit-learn, a widely used open source machine learning…

Apache Spark 1.3, the new Dataframe API, and Spark performance

Over the course of a week, I’ll be hosting two good webcasts featuring Spark release manager Patrick Wendell and Spark committer Kay Ousterhout. Register now! Patrick Wendell: Spark 1.3 and Spark’s New Dataframe API (March 25th at 9 a.m. California time). In this webcast, Patrick Wendell from Databricks will be speaking about Spark’s new 1.3…

Let’s build open source tensor libraries for data science

[A version of this post appears on the O’Reilly Radar blog.] Tensor methods for machine learning are fast, accurate, and scalable, but we’ll need well-developed libraries. Data scientists frequently find themselves dealing with high-dimensional feature spaces. As an example, text mining usually involves vocabularies of 10,000+ different words. Many analytic problems involve linear algebra,…

Turning Ph.D.s into industrial data scientists and data engineers

[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: The ASI will offer a two-day intensive course, Practical Machine Learning, at Strata + Hadoop World in London in May. Back when I was considering leaving academia, the popular exit route was financial engineering. Many science and engineering Ph.D.s ended up in…

Topic Models: Past, Present, Future

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: David Blei, co-creator of one of the most popular tools in text mining and machine learning. I don’t remember when I first came across topic models, but I do remember being an early proponent of them in industry. I…

Time-turner: Strata San Jose 2015, day 1

[Our friends at Dato created an interesting content-based Strata session recommender. Check it out here.] There are so many good talks happening at the same time that it’s impossible not to miss some good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same…

Hardcore Data Science: 2015 California

Ben Recht and I hosted another great edition of Hardcore Data Science yesterday. From the very first talk, the room was full, the audience was attentive, and the energy in the room was high. It remained that way throughout the day. This time around, I spent more time documenting the day on Twitter – enjoy!