[A version of this post appears on the O’Reilly Strata blog.] Here are a few observations based on conversations I had during the just-concluded Strata Santa Clara conference. Spark is attracting attention. I’ve written numerous times about components of the Berkeley Data Analytics Stack (Spark, Shark, MLbase). Two Spark-related sessions at Strata were packed… Continue reading “Data Science Tools: Fast, easy to use, and scalable”
Tag Archives: strata
MLbase: Scalable Machine-learning made accessible
[Cross-posted on the O’Reilly Strata blog.] In the course of applying machine learning to large data sets, data scientists face a few pain points. They need to tune and compare several suitable algorithms – a process that may involve configuring a hodgepodge of tools that require different input files, programming languages, and interfaces. Some software… Continue reading “MLbase: Scalable Machine-learning made accessible”
Seven Reasons I like Spark
[This post originally appeared on the O’Reilly Radar.] A large portion of this week’s Amp Camp at UC Berkeley is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big data… Continue reading “Seven Reasons I like Spark”
