[A version of this post appears on the O’Reilly Data blog.] I use a variety of tools for advanced analytics, most recently I’ve been using Spark (and MLlib), R, scikit-learn, and GraphLab. When I need to get something done quickly, I’ve been turning to scikit-learn for my first pass analysis. For access to high-quality, easy-to-use,Continue reading “Six reasons why I recommend scikit-learn”
Tag Archives: machine learning
Simplifying interactive, realtime, and advanced analytics
[A version of this post appears on the O’Reilly Strata blog and Forbes.] Here are a few observations based on conversations I had during the just concluded Strata NYC conference. Interactive query analysis on Hadoop remains a hot area A recent O’Reilly survey confirmed SQL is an important skill for data scientists. A year afterContinue reading “Simplifying interactive, realtime, and advanced analytics”
Deep Learning oral traditions
[A version of this post appears on the O’Reilly Strata blog.] This past week I had the good fortune of attending two great talks1 on Deep Learning, given by Googlers Ilya Sutskever and Jeff Dean. Much of the excitement surrounding Deep Learning stems from impressive results in a variety of perception tasks, including speech recognitionContinue reading “Deep Learning oral traditions”
Stream Mining essentials
[A version of this post appears on the O’Reilly Strata blog.] A series of open source, distributed stream processing frameworks have become essential components in many big data technology stacks. Apache Storm remains the most popular, but promising new tools like Spark Streaming and Apache Samza are going to have their share of users. TheseContinue reading “Stream Mining essentials”
Semi-automatic method for grading a million homework assignments
[A version of this post appears on the O’Reilly Strata blog.] One of the hardest things about teaching a large class is grading exams and homework assignments. In my teaching days a “large class” was only in the few hundreds (still a challenge for the TAs and instructor). But in the age of MOOCs, classesContinue reading “Semi-automatic method for grading a million homework assignments”
Gaining access to the best machine-learning methods
[A version of this post appears on the O’Reilly Strata blog and Forbes.] For companies in the early stages of grappling with big data, the analytic lifecycle (model building, deployment, maintenance) can be daunting. In earlier posts I highlighted some new tools that simplify aspects of the analytic lifecycle, including the early phases of modelContinue reading “Gaining access to the best machine-learning methods”
Data analysis tools target non-experts
[A version of this post appears on the O’Reilly Strata blog.] A new set of tools make it easier to do a variety of data analysis tasks. Some require no programming, while other tools make it easier to combine code, visuals, and text in the same workflow. They enable users who aren’t statisticians or dataContinue reading “Data analysis tools target non-experts”
Surfacing anomalies and patterns in Machine Data
[A version of this post appears on the O’Reilly Strata blog.] I’ve been noticing that many interesting big data systems are coming out of IT operations. These are systems that go beyond the standard “capture/measure, display charts, and send alerts”. IT operations has long been a source of many interesting big data1 problems and IContinue reading “Surfacing anomalies and patterns in Machine Data”
Big Data and Advertising: In the trenches
[A version of this post appears on the O’Reilly Strata blog.] The $35B merger of Omnicom and Publicis put the convergence of Big Data and Advertising1 in the front pages of business publications. Adtech2 companies have long been at the forefront of many data technologies, strategies, and techniques. By now it’s well-known that many impressiveContinue reading “Big Data and Advertising: In the trenches”
Improving options for unlocking your graph data
[A version of this post appears on the O’Reilly Strata blog.] The popular open source project GraphLab received a major boost early this week when a new company comprised of its founding developers, raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open source GraphLab to “pushContinue reading “Improving options for unlocking your graph data”
