Why data preparation frameworks rely on human-in-the-loop systems

[A version of this article appears on the O’Reilly Radar.] As I’ve written in previous posts, data preparation and data enrichment are exciting areas for entrepreneurs, investors, and researchers. Startups like Trifacta, Tamr, Paxata, Alteryx, and CrowdFlower continue to innovate and attract enterprise customers. I’ve also noticed that companies — that don’t specialize in these…

Streamlining Feature Engineering

Researchers and startups are building tools that enable feature discovery.

[A version of this post appears on the O’Reilly Data blog.] Why do data scientists spend so much time on data wrangling and data preparation? In many cases it’s because they want access to the best variables with which to build their models. These variables…

Data Wrangling gets a fresh look

[A version of this post appears on the O’Reilly Strata blog.] Data analysts have long lamented the amount of time they spend on data wrangling. Rightly so, as some estimates suggest they spend a majority of their time on it. The problem is compounded by the fact that these days, data scientists are encouraged to…