machine learning products Archives

A framework for building and evaluating data products

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Pinterest data scientist Grace Huang on lessons learned in the course of machine learning product launches. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher,Continue reading “A framework for building and evaluating data products”

The evolution of GraphLab

[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: Carlos Guestrin will be part of the team teaching Large-scale Machine Learning Day at Strata + Hadoop World in San Jose. Visit the Strata + Hadoop World website for more information on the program. I only really started playing around with GraphLabContinue reading “The evolution of GraphLab”

What’s New in Scikit-learn 0.15

Python has emerged as one of the more popular languages for doing data science. The primary reason is the impressive array of tools (the “Pydata” stack) available for addressing many stages of data science pipelines. One of the most popular Pydata tools is scikit-learn, an easy-to-use and highly-efficient machine learning library. I’ve written about whyContinue reading “What’s New in Scikit-learn 0.15”

PredictionIO: an open source machine learning server

PredictionIO a startup that produces an open source machine learning server, has raised a seed round of $2.5M. The company’s engine allows developers to quickly integrate machine learning into products and services. The company’s machine learning server is open source, and is available on Amazon Web Services. As an open source package, the company hopesContinue reading “PredictionIO: an open source machine learning server”

Instrumenting collaboration tools used in data projects

Built-in audit trails can be useful for reproducing and debugging complex data analysis projects [A version of this post appears on the O’Reilly Data blog.] As I noted in a previous post, model building is just one component of the analytic lifecycle. Many analytic projects result in models that get deployed in production environments. Moreover,Continue reading “Instrumenting collaboration tools used in data projects”

Business analysts want access to advanced analytics

[A version of this post appears on the O’Reilly Data blog and Forbes.] I talk with many new companies who build tools for business analysts and other non-technical users. These new tools streamline and simplify important data tasks including interactive analysis (e.g., pivot tables and cohort analysis), interactive visual analysis (as popularized by Tableau andContinue reading “Business analysts want access to advanced analytics”

Six reasons why I recommend scikit-learn

[A version of this post appears on the O’Reilly Data blog.] I use a variety of tools for advanced analytics, most recently I’ve been using Spark (and MLlib), R, scikit-learn, and GraphLab. When I need to get something done quickly, I’ve been turning to scikit-learn for my first pass analysis. For access to high-quality, easy-to-use,Continue reading “Six reasons why I recommend scikit-learn”

Simplifying interactive, realtime, and advanced analytics

[A version of this post appears on the O’Reilly Strata blog and Forbes.] Here are a few observations based on conversations I had during the just concluded Strata NYC conference. Interactive query analysis on Hadoop remains a hot area A recent O’Reilly survey confirmed SQL is an important skill for data scientists. A year afterContinue reading “Simplifying interactive, realtime, and advanced analytics”

Gaining access to the best machine-learning methods

[A version of this post appears on the O’Reilly Strata blog and Forbes.] For companies in the early stages of grappling with big data, the analytic lifecycle (model building, deployment, maintenance) can be daunting. In earlier posts I highlighted some new tools that simplify aspects of the analytic lifecycle, including the early phases of modelContinue reading “Gaining access to the best machine-learning methods”