Simplifying interactive, realtime, and advanced analytics

[A version of this post appears on the O’Reilly Strata blog and Forbes.] Here are a few observations based on conversations I had during the just-concluded Strata NYC conference. Interactive query analysis on Hadoop remains a hot area. A recent O’Reilly survey confirmed that SQL is an important skill for data scientists. A year after …

Stream Mining essentials

[A version of this post appears on the O’Reilly Strata blog.] A series of open source, distributed stream processing frameworks have become essential components in many big data technology stacks. Apache Storm remains the most popular, but promising new tools like Spark Streaming and Apache Samza are going to have their share of users. These …

Stream Processing and Mining just got more interesting

[A version of this post appears on the O’Reilly Strata blog.] Largely unknown outside data engineering circles, Apache Kafka is one of the more popular open source, distributed computing projects. Many data engineers I speak with either already use it or are planning to do so. It is a distributed message broker used to store …