A compelling family of DSLs for Data Science

[A version of this post appears on the O’Reilly Data blog.] An important reason why pydata tools and Spark appeal to data scientists is that they both cover many data science tasks and workloads (Spark users can move seamlessly between batch and streaming). Being able to use the same programming style and syntax for workflows…

Reproducing Data Projects

[A version of this post appears on the O’Reilly Strata blog.] As I talk to people and companies building the next generation of tools for data scientists, collaboration and reproducibility keep popping up. Collaboration is baked into many of the newer tools I’ve seen (including ones that have yet to be released). Reproducibility is a…

Data Wrangling gets a fresh look

[A version of this post appears on the O’Reilly Strata blog.] Data analysts have long lamented the amount of time they spend on data wrangling. Rightfully so, as some estimates suggest they spend a majority of their time on it. The problem is compounded by the fact that these days, data scientists are encouraged to…

Data scientists tackle the analytic lifecycle

[A version of this post appears on the O’Reilly Strata blog.] What happens after data scientists build analytic models? Model deployment, monitoring, and maintenance are topics that haven’t received as much attention in the past, but I’ve been hearing more about these subjects from data scientists and software developers. I remember the days when it…

Simpler workflow tools enable the rapid deployment of models

[A version of this post appears on the O’Reilly Strata blog.] Data science often depends on data pipelines that involve acquiring, transforming, and loading data. (If you’re fortunate, most of the data you need is already in usable form.) Data needs to be assembled and wrangled before it can be visualized and analyzed. Many companies…