Forecasting events, from disease outbreaks to sales to cancer research

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Kira Radinsky on predicting events using machine learning, NLP, and semantic analysis. Editor’s note: One of the more popular speakers at Strata + Hadoop World, Kira Radinsky was recently profiled in the new O’Reilly Radar report, Women in Data: …

Network structure and dynamics in online social systems

Understanding information cascades, viral content, and significant relationships. [A version of this post appears on the O’Reilly Radar blog.] I rarely work with social network data, but I’m familiar with the standard problems confronting data scientists who work in this area. These include questions pertaining to network structure, viral content, and the dynamics of information …

The evolution of GraphLab

[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: Carlos Guestrin will be part of the team teaching Large-scale Machine Learning Day at Strata + Hadoop World in San Jose. Visit the Strata + Hadoop World website for more information on the program. I only really started playing around with GraphLab …

Building and deploying large-scale machine learning pipelines

[A version of this post appears on the O’Reilly Radar blog.] There are many algorithms with implementations that scale to large data sets (this list includes matrix factorization, SVM, logistic regression, LASSO, and many others). In fact, machine learning experts are fond of pointing out: if you can pose your problem as a simple optimization …

A brief look at data science’s past and future

[A version of this post appears on the O’Reilly Radar blog.] Back in 2008, when we were working on what became one of the first papers on big data technologies, one of our first visits was to LinkedIn’s new “data” team. Many of the members of that team went on to build interesting tools and …

“Humans-in-the-loop” machine learning systems

Next week I’ll be hosting a webcast featuring Adam Marcus, one of the foremost experts on the topic of “humans-in-the-loop” machine learning systems. It’s a subject many data scientists have heard about, but very few have had the experience of building production systems that leverage humans: Crowdsourcing marketplaces like Elance-oDesk or CrowdFlower give us access …

Lessons from next-generation data wrangling tools

[A version of this post appears on the O’Reilly Radar blog.] One of the trends we’re following is the rise of applications that combine big data, algorithms, and efficient user interfaces. As I noted in an earlier post, our interest stems from both consumer apps as well as tools that democratize data analysis. It’s no …

Spark 1.2 and Beyond

Next week I’ll be hosting a webcast with Spark’s release manager – and Databricks co-founder – Patrick Wendell. (Full disclosure: I’m an advisor to Databricks.) In this webcast, Patrick Wendell from Databricks will be speaking about Spark’s new 1.2 release. Spark 1.2 brings performance and usability improvements in Spark’s core engine, a major new API …

Clustering bitcoin accounts using heuristics

[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: we’ll explore present and future applications of cryptocurrency and blockchain technologies at our upcoming Radar Summit: Bitcoin & the Blockchain on Jan. 27, 2015, in San Francisco. A few data scientists are starting to play around with cryptocurrency data, and as bitcoin …

Hardcore Data Science day: Strata+Hadoop World 2015

My co-organizer Ben Recht and I are proud to announce the return of Hardcore Data Science day to Strata+Hadoop World in California. We have outstanding speakers – 11 talks in total – and I expect the track to sell out (as it has done in the past). Deep Learning enthusiasts will enjoy sessions on its …