Data is only as valuable as the decisions it enables

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Ion Stoica on building intelligent and secure applications on live data. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In…

Structured streaming comes to Apache Spark 2.0

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Michael Armbrust on enabling users to perform streaming analytics, without having to reason about streaming.

Stream processing and messaging systems for the IoT age

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: M.C. Srivas on streaming, enterprise-grade systems, the Internet of Things, and data for social good.

Building systems for massive scale data applications

The O’Reilly Data Show Podcast: Tyler Akidau on the evolution of systems for bounded and unbounded data processing. [This piece was co-written by Shannon Cutt. A version of this post appears on the O’Reilly Radar.] Many…

Celebrating the real-time processing revival

[A version of this article appears on the O’Reilly Radar.] Register for Strata + Hadoop World NYC, which will take place September 29 to October 1, 2015. A few months ago, I noted the resurgence in interest in large-scale stream-processing tools and real-time applications. Interest remains strong, and if anything, I’ve noticed growth in the…

Building big data systems in academia and industry

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Mikio Braun on stream processing, academic research, and training. Mikio Braun is a machine learning researcher who also enjoys software engineering. We first met when he co-founded a real-time analytics company called streamdrill. Since then, I’ve always had great…

A real-time processing revival

[A version of this post appears on the O’Reilly Radar blog.] Things are moving fast in the stream processing world. There’s renewed interest in stream processing and analytics. I write this based on some data points (attendance at webcasts and conference sessions; a recent meetup), and many conversations with technologists, startup founders, and investors. Certainly, …

Building Apache Kafka from scratch

[A version of this post originally appeared on the O’Reilly Radar blog.] In this episode of the O’Reilly Data Show Podcast, Jay Kreps talks about data integration, event data, and the Internet of Things. At the heart of big data platforms are robust data flows that connect diverse data sources. Over the past few years, …

5 Fun Facts about HBase that you didn’t know

HBase has made inroads in companies across many industries and countries. [A version of this post appears on the O’Reilly Data blog.] With HBaseCon right around the corner, I wanted to take stock of one of the more popular components in the Hadoop ecosystem. Over the last few years, many more companies have come to…

Stream Processing and Mining just got more interesting

[A version of this post appears on the O’Reilly Strata blog.] Largely unknown outside data engineering circles, Apache Kafka is one of the more popular open source distributed computing projects. Many data engineers I speak with either already use it or are planning to do so. It is a distributed message broker used to store…