[A version of this post appears on the O’Reilly Strata blog.] Here are a few observations based on conversations I had during the just-concluded Strata Santa Clara conference. Spark is attracting attention: I’ve written numerous times about components of the Berkeley Data Analytics Stack (Spark, Shark, MLbase). Two Spark-related sessions at Strata were packed … Continue reading “Data Science Tools: Fast, easy to use, and scalable”
Author Archives: Ben Lorica
MLbase: Scalable Machine-learning made accessible
[Cross-posted on the O’Reilly Strata blog.] In the course of applying machine learning to large data sets, data scientists face a few pain points. They need to tune and compare several suitable algorithms – a process that may involve configuring a hodgepodge of tools, each requiring different input files, programming languages, and interfaces. Some software … Continue reading “MLbase: Scalable Machine-learning made accessible”
2012 Revenue of some Big Data companies
The chart below is from Wikibon’s estimates¹ of the 2012 revenue of some Big Data companies. Using d3, I drew a chart that shows 2012 revenue in millions, as well as the share of revenue derived from services, for a few select/startup companies: The Big 3 Hadoop Vendors (Cloudera/MapR/Hortonworks): Combined revenue was $102M, with $61.6M … Continue reading “2012 Revenue of some Big Data companies”
Mining Time-series with Trillions of Points: Dynamic Time Warping at scale
Take a similarity measure that’s already well known to researchers who work with time series, and devise an algorithm to compute it efficiently at scale. Suddenly, intractable problems become tractable, and Big Data mining applications that use the metric are within reach. Classifying, clustering, and searching through time series have important applications in many domains. In … Continue reading “Mining Time-series with Trillions of Points: Dynamic Time Warping at scale”
Seven Reasons I like Spark
[This post originally appeared on the O’Reilly Radar.] A large portion of this week’s Amp Camp at UC Berkeley is devoted to an introduction to Spark – an open-source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big data … Continue reading “Seven Reasons I like Spark”
