Building self-service tools to monitor high-volume time-series data

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Phil Liu on the evolution of metric monitoring tools and cloud computing. One of the main sources of real-time data processing tools is IT operations. In fact, a previous post I wrote on the re-emergence of real-time was to a…

Apache Spark: Powering applications on-premise and in the cloud

[A version of this post appears on the O’Reilly Radar.] As organizations shift their focus toward building analytic applications, many are relying on components from the Apache Spark ecosystem. I began pointing this out in advance of the first Spark Summit in 2013 and since then, Spark adoption has exploded. With Spark Summit SF right…

Data science makes an impact on Wall Street

[A version of this article appears on the O’Reilly Radar.] Having started my career in industry, working on problems in finance, I’ve always appreciated how challenging it is to build consistently profitable systems in this extremely competitive domain. When I served as a quant at a hedge fund in the late 1990s and early 2000s, I…

Israel conference on Big Data, Analytics and Machine Learning

The first Big Data, Analytics and Machine Learning (Israel Innovation) conference was a resounding success. Kudos to the organizers Danny Bickson, Assaf Araki, and Avner Algom. I was happy to help them invite speakers, publicize the event, and give the opening keynote. The conference was sold out and I heard a lot of enthusiastic feedback…

The tensor renaissance in data science

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning. After sitting in on UC Irvine Professor Anima Anandkumar’s presentation at Strata + Hadoop World 2015 in San Jose, I wrote a post urging the data community to build tensor decomposition libraries for…

More tools for managing and reproducing complex data projects

A survey of the landscape shows the types of tools remain the same, but interfaces continue to improve. [A version of this post appears on the O’Reilly Radar.] As data projects become complex and as data teams grow in size, individuals and organizations need tools to efficiently manage data projects. A while back, I wrote…

Coming full circle with Bigtable and HBase

The O’Reilly Data Show Podcast: Michael Stack on HBase past, present, and future. [A version of this post appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show to explore the opportunities and techniques driving big data and data science. At least once a year, I sit down with Michael Stack, engineer at Cloudera,…

Building big data systems in academia and industry

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Mikio Braun on stream processing, academic research, and training. Mikio Braun is a machine learning researcher who also enjoys software engineering. We first met when he co-founded a real-time analytics company called streamdrill. Since then, I’ve always had great…