[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show podcast: Joe Hellerstein on data wrangling, distributed systems, and metadata services. In this episode of the O’Reilly Data Show, I spoke with one of the most popular speakers at Strata+Hadoop World: Joe Hellerstein, Professor of Computer Science at UC Berkeley and … Continue reading “Metadata services can lead to performance and organizational improvements”
Tag Archives: reproducibility
We need open and vendor-neutral metadata services
[A version of this article appears on the O’Reilly Radar.] Comprehensive metadata collection and analysis can pave the way for many interesting applications. As I spoke with friends leading up to Strata + Hadoop World NYC 2015, one topic continued to come up: metadata. It’s a topic that data engineers and data management researchers have … Continue reading “We need open and vendor-neutral metadata services”
Bits from the Data Store
Semi-regular field notes from the world of data (gathered from Scifoo 2014): Filtergraph and the power of visual exploration: A web-based tool for exploring high-dimensional data sets, Filtergraph came out of the lab of astrophysicist Keivan Stassun. It has helped researchers make several interesting discoveries, including a paper (that appeared in Nature) on a technique … Continue reading “Bits from the Data Store”
Bits from the Data Store
Semi-regular field notes from the world of data: I’m always on the lookout for interesting tools and ideas for reproducing and collaborating on long data workflows. Reproducibility and collaboration are topics that we’re following closely at Strata (both topics remain on the radar of many data scientists and data engineers I speak with). At the … Continue reading “Bits from the Data Store”