Building self-service tools to monitor high-volume time-series data

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Phil Liu on the evolution of metric monitoring tools and cloud computing. One of the main sources of real-time data processing tools is IT operations. In fact, a previous post I wrote on the re-emergence of real-time was to a…

Apache Spark: Powering applications on-premise and in the cloud

[A version of this post appears on the O’Reilly Radar.] As organizations shift their focus toward building analytic applications, many are relying on components from the Apache Spark ecosystem. I began pointing this out in advance of the first Spark Summit in 2013 and since then, Spark adoption has exploded. With Spark Summit SF right…

Israel conference on Big Data, Analytics and Machine Learning

The first Big Data, Analytics and Machine Learning (Israel Innovation) conference was a resounding success. Kudos to the organizers Danny Bickson, Assaf Araki, and Avner Algom. I was happy to help them invite speakers, publicize the event, and give the opening keynote. The conference was sold out and I heard a lot of enthusiastic feedback…

More tools for managing and reproducing complex data projects

A survey of the landscape shows the types of tools remain the same, but interfaces continue to improve. [A version of this post appears on the O’Reilly Radar.] As data projects become complex and as data teams grow in size, individuals and organizations need tools to efficiently manage data projects. A while back, I wrote…

Coming full circle with Bigtable and HBase

The O’Reilly Data Show Podcast: Michael Stack on HBase past, present, and future. [A version of this post appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show to explore the opportunities and techniques driving big data and data science. At least once a year, I sit down with Michael Stack, engineer at Cloudera, …

Building big data systems in academia and industry

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Mikio Braun on stream processing, academic research, and training. Mikio Braun is a machine learning researcher who also enjoys software engineering. We first met when he co-founded a real-time analytics company called streamdrill. Since then, I’ve always had great…

A real-time processing revival

[A version of this post appears on the O’Reilly Radar blog.] Things are moving fast in the stream processing world. There’s renewed interest in stream processing and analytics. I write this based on some data points (attendance in webcasts and conference sessions; a recent meetup), and many conversations with technologists, startup founders, and investors. Certainly, …

Redefining power distribution using big data

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Erich Nachbar on testing and deploying open source, distributed computing components. When I first hear of a new open source project that might help me solve a problem, the first thing I do is ask around to see if…