Apache Spark’s journey from academia to industry

[A version of this post appears on the O’Reilly Radar blog.] Three projects from UC Berkeley’s AMPLab have been keenly adopted by industry: Apache Mesos, Apache Spark, and Tachyon. As an early user, it’s been fun to watch Spark go from an academic lab to the most active open source project in big data. InContinue reading “Apache Spark’s journey from academia to industry”

Building Apache Kafka from scratch

[A version of this post originally appeared on the O’Reilly Radar blog.] In this episode of the O’Reilly Data Show Podcast, Jay Kreps talks about data integration, event data, and the Internet of Things. At the heart of big data platforms are robust data flows that connect diverse data sources. Over the past few years,Continue reading “Building Apache Kafka from scratch”

The science of moving dots: the O’Reilly Data Show Podcast

Rajiv Maheswaran talks about the tools and techniques required to analyze new kinds of sports data [This post originally appeared on the O’Reilly Radar blog.] Editor’s note: you can subscribe to the O’Reilly Data Show Podcast through iTunes, SoundCloud or through our RSS feed. Many data scientists are comfortable working with structured operational data andContinue reading “The science of moving dots: the O’Reilly Data Show Podcast”