Creating large training data sets quickly

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Alex Ratner on why weak supervision is the key to unlocking dark data. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, … Continue reading “Creating large training data sets quickly”

Becoming a machine learning engineer

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Aurélien Géron on enabling companies to use machine learning in real-world products. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. Continue reading “Becoming a machine learning engineer”

Building a business that combines human experts and data science

The O’Reilly Data Show Podcast: Eric Colson on algorithms, human computation, and building data science teams. [A version of this post appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. In this episode of the O’Reilly Data Show, I spoke … Continue reading “Building a business that combines human experts and data science”

Bridging the divide: Business users and machine learning experts

[A version of this article appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. As tools for advanced analytics become more accessible, data scientists’ roles will evolve. Most media stories emphasize a need for expertise in algorithms and quantitative techniques … Continue reading “Bridging the divide: Business users and machine learning experts”

6 reasons why I like KeystoneML

[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Ben Recht on optimization, compressed sensing, and large-scale machine learning pipelines. As we put the finishing touches on what promises to be another outstanding Hardcore Data Science Day at Strata + Hadoop World in New York, I sat down with … Continue reading “6 reasons why I like KeystoneML”

The tensor renaissance in data science

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning. After sitting in on UC Irvine Professor Anima Anandkumar’s presentation at Strata + Hadoop World 2015 in San Jose, I wrote a post urging the data community to build tensor decomposition libraries for … Continue reading “The tensor renaissance in data science”

Let’s build open source tensor libraries for data science

[A version of this post appears on the O’Reilly Radar blog.] Tensor methods for machine learning are fast, accurate, and scalable, but we’ll need well-developed libraries. Data scientists frequently find themselves dealing with high-dimensional feature spaces. As an example, text mining usually involves vocabularies composed of 10,000+ different words. Many analytic problems involve linear algebra, … Continue reading “Let’s build open source tensor libraries for data science”

The evolution of GraphLab

[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: Carlos Guestrin will be part of the team teaching Large-scale Machine Learning Day at Strata + Hadoop World in San Jose. Visit the Strata + Hadoop World website for more information on the program. I only really started playing around with GraphLab … Continue reading “The evolution of GraphLab”

Building and deploying large-scale machine learning pipelines

[A version of this post appears on the O’Reilly Radar blog.] There are many algorithms with implementations that scale to large data sets (this list includes matrix factorization, SVM, logistic regression, LASSO, and many others). In fact, machine learning experts are fond of pointing out: if you can pose your problem as a simple optimization … Continue reading “Building and deploying large-scale machine learning pipelines”

Hardcore Data Science day: Strata+Hadoop World 2015

My co-organizer Ben Recht and I are proud to announce the return of Hardcore Data Science day to Strata+Hadoop World in California. We have outstanding speakers – 11 talks in total – and I expect the track to sell out (as it has done in the past). Deep Learning enthusiasts will enjoy sessions on its … Continue reading “Hardcore Data Science day: Strata+Hadoop World 2015”