Experimentation and Optimization Tools for Data Science Teams

In the words of Norvig, Spector et al.: “Data science is the study of extracting value from data – value in the form of insights and questions.” In practice, industrial data science teams wear multiple hats and, depending on the company, are often responsible for reports (analytics and BI), models (including machine learning and AI), and experiments (designing and executing tests).

I recently poked around to see what researchers interested in data science have been focusing on, and I found pretty good alignment with what teams in industry are tackling. Data science is fertile ground for automation, and many early examples of automation target aspects of data science projects, from modeling (AutoML) to coding assistants that already generate decent SQL and pandas code. These automation tools are still in their early stages, though, and a few specific areas need more attention before they reach their potential.

[An analysis of academic & conference papers in data science surfaced these key areas. Data via Zeta Alpha.]

My analysis of recent academic and conference papers revealed a shortage of research in areas also neglected by entrepreneurs, specifically tools for experimentation and optimization. Historically, experimentation platforms have been bespoke solutions, primarily found within technology companies. With the advent of modern data platforms, however, it has become increasingly feasible to build solutions that democratize and systematize experimentation. As a result, there are now a few startups attempting to fill this important gap in data science tooling.
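Part of what an experimentation platform systematizes is the statistical readout of a test. Here is a minimal sketch, assuming a simple two-variant conversion test analyzed with a two-proportion z-test. The function name and the visitor counts are made up for illustration, and a real platform would also handle things like sample-size planning and repeated looks at the data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical counts, invented for illustration only.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"lift={lift:.4f}  z={z:.2f}  p-value={p_value:.4f}")
```

Keeping the calculation in one small function is the point: once it lives in a shared library rather than in individual notebooks, every team reads out tests the same way.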

Operations research (OR), a discipline focused on understanding systems and constructing and refining models to make informed decisions, is a crucial component of data science. OR has a wide range of applications, including resource allocation, scheduling, inventory management, logistics and supply chain optimization, and network management. To address optimization challenges, data science teams typically turn to proprietary solvers and simulation tools.
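To make the resource-allocation case concrete, here is a toy sketch: choosing which projects to fund under a fixed budget, framed as a 0/1 knapsack problem and solved with dynamic programming. The project names, costs, and values are hypothetical, and a real team would more likely hand a problem like this to a solver than write the algorithm by hand.

```python
def allocate_budget(projects, budget):
    """Pick the subset of projects with the highest total value within budget.

    projects: list of (name, cost, value) tuples with non-negative integer costs.
    Returns (best_total_value, list_of_chosen_names).
    """
    # best[b] holds the best (value, chosen names) achievable with budget b.
    best = [(0, [])] * (budget + 1)
    for name, cost, value in projects:
        # Walk budgets downward so each project is selected at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + value
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + [name])
    return best[budget]


# Hypothetical projects: (name, cost, value), invented for illustration only.
projects = [("ads", 4, 40), ("email", 3, 30), ("seo", 2, 25), ("events", 5, 50)]
total_value, chosen = allocate_budget(projects, budget=7)
print(total_value, chosen)
```

Even this small case shows why greedy intuition is not enough: the best plan depends on how costs combine to fill the budget, not on any single project's value.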

These existing tools work well, but what if you had access to an optimization tool that scales and can tackle more complex scenarios, one that fits with the tools that data science and ML teams use (Python) and takes advantage of modern techniques (RL) to improve the performance of general-purpose optimizers?

I’ve long wondered whether the distributed computing capabilities of the open source framework Ray, along with its ability to integrate reinforcement learning models, can enable data scientists to tackle increasingly complex optimization problems. I’ve played around with some of the search algorithms in Ray Tune, and I found some that are capable of solving some pretty interesting optimization problems (e.g. portfolio optimization in finance). Granted, I’ve only explored toy examples, and an enterprise-grade optimization solution would need more efficient optimization algorithms plus user-friendly interfaces and APIs that cater to non-experts. But even with toy examples you get a sense that a flexible distributed computing framework like Ray might be a great substrate for next-generation optimization solutions. The bottom line is that optimization tools seem due for a refresh, regardless of the framework you use.
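For a flavor of what such a toy example looks like, here is a minimal sketch of portfolio optimization by random search. It deliberately uses plain Python rather than any Ray API, so nothing here should be read as how Tune works. The expected returns, covariance matrix, and risk-aversion setting are all invented for illustration.

```python
import random

# Hypothetical inputs for three assets, invented for illustration only.
EXPECTED_RETURNS = [0.08, 0.12, 0.05]
COVARIANCE = [
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.004],
    [0.002, 0.004, 0.010],
]
RISK_AVERSION = 3.0  # assumed trade-off between return and variance


def score(weights):
    """Mean-variance objective: expected return minus a variance penalty."""
    n = len(weights)
    expected = sum(w * r for w, r in zip(weights, EXPECTED_RETURNS))
    variance = sum(
        weights[i] * COVARIANCE[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return expected - RISK_AVERSION * variance


def random_weights(rng, n):
    """Sample long-only weights that sum to 1 (normalized exponential draws)."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def random_search(num_trials=5000, seed=0):
    """Evaluate independent random portfolios and keep the best one."""
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(num_trials):
        weights = random_weights(rng, len(EXPECTED_RETURNS))
        trial_score = score(weights)
        if trial_score > best_score:
            best_weights, best_score = weights, trial_score
    return best_weights, best_score


best_weights, best_score = random_search()
print([round(w, 3) for w in best_weights], round(best_score, 4))
```

Each trial is independent of the others, which is what makes this style of search easy to spread across many workers, and why a distributed framework is an appealing substrate for it.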

Entrepreneurs may find opportunities by focusing on solutions for experimentation and optimization. The market for such solutions should expand as demand grows for technology that optimizes business processes and supports decision intelligence. Startups that deliver solutions meeting these needs may see significant growth in the coming years.
