[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: John Akred on building data platforms and enterprise data strategies. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this… Continue reading “Using Agile development techniques for data science projects”
Tag Archives: data science
Topic Models: Past, Present, Future
[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: David Blei, co-creator of one of the most popular tools in text mining and machine learning. I don’t remember when I first came across topic models, but I do remember being an early proponent of them in industry. I… Continue reading “Topic Models: Past, Present, Future”
The evolution of GraphLab
[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: Carlos Guestrin will be part of the team teaching Large-scale Machine Learning Day at Strata + Hadoop World in San Jose. Visit the Strata + Hadoop World website for more information on the program. I only really started playing around with GraphLab… Continue reading “The evolution of GraphLab”
Lessons from next-generation data wrangling tools
[A version of this post appears on the O’Reilly Radar blog.] One of the trends we’re following is the rise of applications that combine big data, algorithms, and efficient user interfaces. As I noted in an earlier post, our interest stems from both consumer apps and tools that democratize data analysis. It’s no… Continue reading “Lessons from next-generation data wrangling tools”
Streamlining Feature Engineering
Researchers and startups are building tools that enable feature discovery
[A version of this post appears on the O’Reilly Data blog.] Why do data scientists spend so much time on data wrangling and data preparation? In many cases it’s because they want access to the best variables with which to build their models. These variables… Continue reading “Streamlining Feature Engineering”
Crowdsourcing Feature discovery
More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists
[A version of this post appears on the O’Reilly Data blog and Forbes.] Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons… Continue reading “Crowdsourcing Feature discovery”
Interface Languages and Feature Discovery
It’s easier to “discover” features with tools that have broad coverage of the data science workflow
[A version of this post appears on the O’Reilly Data blog and Forbes.] Here are a few more observations based on conversations I had during the just concluded Strata Santa Clara conference. Interface languages: Python, R, SQL (and Scala)… Continue reading “Interface Languages and Feature Discovery”
Bridging the gap between research and implementation
[A version of this post appears on the O’Reilly Data blog.] One of the most popular offerings at Strata Santa Clara was Hardcore Data Science day. Over the next few weeks we hope to profile some of the speakers who presented, and make the video of the talks available as a bundle. In the meantime… Continue reading “Bridging the gap between research and implementation”
IPython: A unified environment for interactive data analysis
[A version of this post appears on the O’Reilly data blog and Forbes.] As I noted in a recent post on reproducing data projects, notebooks have become popular tools for maintaining, sharing, and replicating long data science workflows. Much of that is due to the popularity of IPython¹. In development since 2001, IPython grew out… Continue reading “IPython: A unified environment for interactive data analysis”
Big Data systems are making a difference in the fight against cancer
[A version of this post appears on the O’Reilly Data blog and Forbes.] As open source, big data tools enter the early stages of maturation, data engineers and data scientists will have many opportunities to use them to “work on stuff that matters”. Along those lines, computational biology and medicine are areas where skilled data… Continue reading “Big Data systems are making a difference in the fight against cancer”