From search to distributed computing to large-scale information extraction

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. February 2016 marks the 10th anniversary of Hadoop, at a time when many IT organizations actively use Hadoop and/or one of the open source big data projects that originated after, and in some…

Improving options for unlocking your graph data

[A version of this post appears on the O’Reilly Strata blog.] The popular open source project GraphLab received a major boost early this week when a new company, composed of its founding developers, raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open source GraphLab to “push…

The re-emergence of Time-series

[A version of this post appeared on the O’Reilly Strata and Radar blogs.] My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and optimization, with occasional…

Mining Time-series with Trillions of Points: Dynamic Time Warping at scale

Take a similarity measure that’s already well-known to researchers who work with time-series, and devise an algorithm to compute it efficiently at scale. Suddenly, intractable problems become tractable, and Big Data mining applications that use the metric are within reach. Classifying, clustering, and searching time series have important applications in many domains. In…