Building self-service tools to monitor high-volume time-series data
[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Phil Liu on the evolution of metric monitoring tools and cloud computing. One of the main sources of real-time data processing tools is IT operations. In fact, a previous post I wrote on the re-emergence of real-time was to a …
Category Archives: Data Engineer
Apache Spark: Powering applications on-premise and in the cloud
[A version of this post appears on the O’Reilly Radar.] As organizations shift their focus toward building analytic applications, many are relying on components from the Apache Spark ecosystem. I began pointing this out in advance of the first Spark Summit in 2013 and since then, Spark adoption has exploded. With Spark Summit SF right …
Israel conference on Big Data, Analytics and Machine Learning
The first Big Data, Analytics and Machine Learning (Israel Innovation) conference was a resounding success. Kudos to the organizers Danny Bickson, Assaf Araki, and Avner Algom. I was happy to help them invite speakers, publicize the event, and give the opening keynote. The conference was sold out and I heard a lot of enthusiastic feedback …
More tools for managing and reproducing complex data projects
A survey of the landscape shows the types of tools remain the same, but interfaces continue to improve. [A version of this post appears on the O’Reilly Radar.] As data projects become complex and as data teams grow in size, individuals and organizations need tools to efficiently manage data projects. A while back, I wrote …
The Spark Spot at Spark Summit East 2015 – NYC
I had a series of conversations with a few lead developers and users of Apache Spark at the recent Spark Summit NYC conference. You can view them on YouTube.
Coming full circle with Bigtable and HBase
The O’Reilly Data Show Podcast: Michael Stack on HBase past, present, and future. [A version of this post appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show to explore the opportunities and techniques driving big data and data science. At least once a year, I sit down with Michael Stack, engineer at Cloudera, …
Spark Summit East panel
Here’s a video of our well-received panel from a few weeks ago, featuring Abhishek Mehta (Tresata, Founder & CEO); George Mathew (Alteryx, President & COO); Patrick Wendell (Databricks, Co-founder); and Martin Van Ryswyk (DataStax, VP of Engineering).
Building big data systems in academia and industry
[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Mikio Braun on stream processing, academic research, and training. Mikio Braun is a machine learning researcher who also enjoys software engineering. We first met when he co-founded a real-time analytics company called streamdrill. Since then, I’ve always had great …
A real-time processing revival
[A version of this post appears on the O’Reilly Radar blog.] Things are moving fast in the stream processing world. There’s renewed interest in stream processing and analytics. I write this based on some data points (attendance in webcasts and conference sessions; a recent meetup), and many conversations with technologists, startup founders, and investors. Certainly, …
Redefining power distribution using big data
[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Erich Nachbar on testing and deploying open source, distributed computing components. When I first hear of a new open source project that might help me solve a problem, the first thing I do is ask around to see if …
