Bits from the Data Store

Semi-regular field notes from the world of data: Michael Jordan (“ask me anything”): The distinguished machine learning and Bayesian researcher from UC Berkeley’s AMPLab has an interesting perspective on machine learning and statistics. “… while I do think of neural networks as one important tool in the toolbox, I find myself surprisingly rarely going to …”

Bits from the Data Store

Semi-regular field notes from the world of data: Apache Spark development community: Josh Rosen of Databricks recently built a tool for browsing pull requests. I like that it lets you scan each of the major components (Spark SQL, Streaming, MLlib, etc.). Now that Spark has become one of the most active open-source projects in …

Bits from the Data Store

Semi-regular field notes from the world of data: Alibaba ♥ Spark: Next time someone asks you if Apache Spark scales, point them to this recent post by Chinese e-commerce juggernaut Alibaba. What particularly caught my eye is the company’s heavy usage of GraphX, Spark’s library for graph analytics. [Full disclosure: I’m an advisor to Databricks, …]

Bits from the Data Store

Semi-regular field notes from the world of data (gathered from Scifoo 2014): Filtergraph and the power of visual exploration: A web-based tool for exploring high-dimensional data sets, Filtergraph came out of the lab of astrophysicist Keivan Stassun. It has helped researchers make several interesting discoveries, including a paper (published in Nature) on a technique …

Bits from the Data Store

Semi-regular field notes from the world of data: At the recent GraphLab conference, I took a few people to a demo, tucked away in the community room, by Graphistry, a startup that lets users visually interact with and analyze massive amounts of data. In particular, their technology can handle and draw many more points than d3.js …

Bits from the Data Store

Semi-regular field notes from the world of data: I’m always on the lookout for interesting tools and ideas for reproducing and collaborating on long data workflows. Reproducibility and collaboration are topics that we’re following closely at Strata (both topics remain on the radar of many data scientists and data engineers I speak with). At the …