It’s easier to “discover” features with tools that have broad coverage of the data science workflow [A version of this post appears on the O’Reilly Data blog and Forbes.] Here are a few more observations based on conversations I had during the just-concluded Strata Santa Clara conference. Interface languages: Python, R, SQL (and Scala) … Continue reading “Interface Languages and Feature Discovery”
Tag Archives: strata
Extending GraphLab to tables
The popular graph analytics framework extends its coverage of the data science workflow [A version of this post appears on the O’Reilly Data blog and Forbes.] GraphLab’s SFrame, an interesting and somewhat under-the-radar tool, was unveiled at Strata Santa Clara. It is a disk-based, flat table representation that extends GraphLab to tabular data. With the … Continue reading “Extending GraphLab to tables”
Bridging the gap between research and implementation
[A version of this post appears on the O’Reilly Data blog.] One of the most popular offerings at Strata Santa Clara was Hardcore Data Science day. Over the next few weeks we hope to profile some of the speakers who presented, and make the video of the talks available as a bundle. In the meantime … Continue reading “Bridging the gap between research and implementation”
Graphs, Time-series, Dataviz, and Crowdsourcing at Strata Santa Clara 2014
There are many fantastic talks at Strata, and it can be overwhelming to navigate the schedule. I plan to list talks I’m hoping to catch in a series of “time-turner” posts (check this blog on Wed/Thu at 10 a.m.). But for now let me highlight talks from a few categories: Graphs and Network Analysis: Large-scale … Continue reading “Graphs, Time-series, Dataviz, and Crowdsourcing at Strata Santa Clara 2014”
Big Data solutions through the combination of tools
[A version of this post appears on the O’Reilly Data blog and Forbes.] As a user who tends to mix and match many different tools, not having to deal with configuring and assembling a suite of tools is a big win. So I’m really liking the recent trend towards more integrated and packaged solutions. A recent example … Continue reading “Big Data solutions through the combination of tools”
Business analysts want access to advanced analytics
[A version of this post appears on the O’Reilly Data blog and Forbes.] I talk with many new companies that build tools for business analysts and other non-technical users. These new tools streamline and simplify important data tasks including interactive analysis (e.g., pivot tables and cohort analysis), interactive visual analysis (as popularized by Tableau and … Continue reading “Business analysts want access to advanced analytics”
What I use for data visualization
[A version of this post appears on the O’Reilly Data blog.] Depending on the nature of the problem, data size, and deliverable, I still draw upon an array of tools for data visualization. As I survey the Design track at next month’s Strata conference, I see creators and power users of visualization tools that many … Continue reading “What I use for data visualization”
IPython: A unified environment for interactive data analysis
[A version of this post appears on the O’Reilly Data blog and Forbes.] As I noted in a recent post on reproducing data projects, notebooks have become popular tools for maintaining, sharing, and replicating long data science workflows. Much of that is due to the popularity of IPython. In development since 2001, IPython grew out … Continue reading “IPython: A unified environment for interactive data analysis”
Big Data systems are making a difference in the fight against cancer
[A version of this post appears on the O’Reilly Data blog and Forbes.] As open source big data tools enter the early stages of maturation, data engineers and data scientists will have many opportunities to use them to “work on stuff that matters”. Along those lines, computational biology and medicine are areas where skilled data … Continue reading “Big Data systems are making a difference in the fight against cancer”
Simplifying interactive, realtime, and advanced analytics
[A version of this post appears on the O’Reilly Strata blog and Forbes.] Here are a few observations based on conversations I had during the just-concluded Strata NYC conference. Interactive query analysis on Hadoop remains a hot area: A recent O’Reilly survey confirmed SQL is an important skill for data scientists. A year after … Continue reading “Simplifying interactive, realtime, and advanced analytics”
