How companies are using Spark

[A version of this post appears on the O’Reilly Strata blog.] When an interesting piece of big data technology gets introduced, early1 adopters tend to focus on technical features and capabilities. Applications get built as companies develop confidence that it’s reliable and that it really scales to large data volumes. That seems to be whereContinue reading “How companies are using Spark”

Data analysis tools target non-experts

[A version of this post appears on the O’Reilly Strata blog.] A new set of tools make it easier to do a variety of data analysis tasks. Some require no programming, while other tools make it easier to combine code, visuals, and text in the same workflow. They enable users who aren’t statisticians or dataContinue reading “Data analysis tools target non-experts”

Moving from Batch to Continuous Computing at Yahoo!

[A version of this post appeared on the O’Reilly Strata blog.] My favorite session at the recent Hadoop Summit was a keynote by Bruno Fernandez-Ruiz, Senior Fellow & VP Platforms at Yahoo! He gave a nice overview of their analytic and data processing stack, and shared some interesting factoids about the scale of their bigContinue reading “Moving from Batch to Continuous Computing at Yahoo!”

Python data tools just keep getting better

[A version of this post appeared on the O’Reilly Strata blog.] Here are a few observations inspired by conversations I had during the just concluded PyData conference1. The Python data community is well-organized: Besides conferences (PyData, SciPy, EuroSciPy), there is a new non-profit (NumFOCUS) dedicated to supporting scientific computing and data analytics projects. The listContinue reading “Python data tools just keep getting better”

Seven Reasons I like Spark

[This post originally appeared on the O’Reilly Radar .] A large portion of this week’s Amp Camp at UC Berkeley, is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big dataContinue reading “Seven Reasons I like Spark”