Building big data systems in academia and industry

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Mikio Braun on stream processing, academic research, and training. Mikio Braun is a machine learning researcher who also enjoys software engineering. We first met when he co-founded a real-time analytics company called streamdrill. Since then, I’ve always had great…

Redefining power distribution using big data

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Erich Nachbar on testing and deploying open source, distributed computing components. When I first hear of a new open source project that might help me solve a problem, the first thing I do is ask around to see if…

Turning Ph.D.s into industrial data scientists and data engineers

[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: The ASI will offer a two-day intensive course, Practical Machine Learning, at Strata + Hadoop World in London in May. Back when I was considering leaving academia, the popular exit route was financial engineering. Many science and engineering Ph.D.s ended up in…

Topic Models: Past, Present, Future

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: David Blei, co-creator of one of the most popular tools in text mining and machine learning. I don’t remember when I first came across topic models, but I do remember being an early proponent of them in industry. I…

Forecasting events, from disease outbreaks to sales to cancer research

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Kira Radinsky on predicting events using machine learning, NLP, and semantic analysis. Editor’s note: One of the more popular speakers at Strata + Hadoop World, Kira Radinsky was recently profiled in the new O’Reilly Radar report, Women in Data: …

The evolution of GraphLab

[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: Carlos Guestrin will be part of the team teaching Large-scale Machine Learning Day at Strata + Hadoop World in San Jose. Visit the Strata + Hadoop World website for more information on the program. I only really started playing around with GraphLab…

A brief look at data science’s past and future

[A version of this post appears on the O’Reilly Radar blog.] Back in 2008, when we were working on what became one of the first papers on big data technologies, one of our first visits was to LinkedIn’s new “data” team. Many of the members of that team went on to build interesting tools and…

Apache Spark’s journey from academia to industry

[A version of this post appears on the O’Reilly Radar blog.] Three projects from UC Berkeley’s AMPLab have been keenly adopted by industry: Apache Mesos, Apache Spark, and Tachyon. As an early user, I’ve enjoyed watching Spark go from an academic lab to the most active open source project in big data. In…

The science of moving dots: the O’Reilly Data Show Podcast

Rajiv Maheswaran talks about the tools and techniques required to analyze new kinds of sports data. [This post originally appeared on the O’Reilly Radar blog.] Editor’s note: you can subscribe to the O’Reilly Data Show Podcast through iTunes, SoundCloud, or our RSS feed. Many data scientists are comfortable working with structured operational data and…