Data Science Tools: Fast, easy to use, and scalable

[A version of this post appears on the O’Reilly Strata blog.] Here are a few observations based on conversations I had during the just-concluded Strata Santa Clara conference. Spark is attracting attention: I’ve written numerous times about components of the Berkeley Data Analytics Stack (Spark, Shark, MLbase). Two Spark-related sessions at Strata were packed …

MLbase: Scalable Machine-learning made accessible

[Cross-posted on the O’Reilly Strata blog.] In the course of applying machine learning to large data sets, data scientists face a few pain points. They need to tune and compare several suitable algorithms – a process that may involve configuring a hodgepodge of tools that require different input files, programming languages, and interfaces. Some software …

Seven Reasons I like Spark

[This post originally appeared on the O’Reilly Radar.] A large portion of this week’s Amp Camp at UC Berkeley is devoted to an introduction to Spark – an open source, in-memory cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big data …