Here's what we need to do to fix AutoML

Seven suggestions to enhance the effectiveness of AutoML solutions.

By Assaf Araki and Ben Lorica.

A recent McKinsey survey found that more than 56% of respondents have adopted AI in at least one function, up from 50% in 2020. As AI adoption increases, the survey examined the factors and practices that differentiate the best AI organizations. The authors found that organizations that adopted advanced MLOps practices (tools and processes for building, deploying, and managing machine learning) were more likely to have high-performing AI programs and initiatives.

Other studies note that a shortage of talent across relevant job functions is another major challenge for organizations that want to improve their AI programs. Fortunately, there has been substantial investment in tools that automate and democratize AI and machine learning. “AutoML”, a combination of “automation” and “ML”, describes tools and techniques for automating the time-consuming and iterative process of building machine learning pipelines.

With AutoML, domain experts can create machine learning applications without much statistical or machine learning knowledge, thereby accelerating product development while simultaneously reducing the need for data scientists and machine learning engineers. But while existing AutoML tools have focused primarily on model generation and model building, the McKinsey survey notes that successful AI programs cover the entire machine learning lifecycle.

**Figure 1**: AutoML researchers have traditionally focused on tools for identifying the best performing model.

While AutoML remains a relatively new and very small market, it has enormous potential. A recent report estimates that the AutoML market will grow from $270 million in 2019 to $14.5 billion in revenue by 2030, advancing at a compound annual growth rate of 43.7% over the forecast period (2020-2030).
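As a quick sanity check on those figures, the growth rate implied by the two revenue numbers can be computed directly. This is a minimal sketch: the function name is ours, and treating 2019 to 2030 as 11 compounding periods is an assumption about how the report counts years. Under that assumption the result lands within a fraction of a percentage point of the quoted 43.7%.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by growing from
    start_value to end_value over `years` compounding periods."""
    return (end_value / start_value) ** (1 / years) - 1


# Revenue figures quoted in the article.
start_2019 = 270e6   # $270 million
end_2030 = 14.5e9    # $14.5 billion

# Assumption: 11 compounding periods between 2019 and 2030.
cagr = implied_cagr(start_2019, end_2030, years=11)
print(f"Implied CAGR: {cagr:.1%}")
```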

This article suggests areas where AutoML tools should focus in order to maximize their impact. We envision AutoML tools that not only automate model building but also incorporate model testing and validation, feature engineering, and aspects of data preparation (including data quality checks, data alignment, and even missing data imputation).
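To make the data preparation steps concrete, here is a minimal sketch of a data quality check followed by missing data imputation. Everything in it is illustrative: the helper names, the toy records, and the 50% missingness threshold are assumptions, and mean imputation is only one of many possible strategies.

```python
from statistics import mean


def missing_rate(rows, column):
    """Data quality check: fraction of rows where `column` is None."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_mean(rows, column):
    """Return copies of the rows with None in `column` replaced by
    the mean of the observed values. The input rows are not modified."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Hypothetical toy dataset with one missing income value.
rows = [
    {"age": 34, "income": 52000.0},
    {"age": 41, "income": None},
    {"age": 29, "income": 48000.0},
    {"age": 50, "income": 61000.0},
]

rate = missing_rate(rows, "income")
# Assumed policy: impute only if at most half of the values are missing.
clean = impute_mean(rows, "income") if rate <= 0.5 else rows
print(f"missing rate: {rate:.0%}, imputed value: {clean[1]['income']:.2f}")
```

An AutoML tool that owns this step would also need to report what it changed, which ties into the transparency concerns discussed later in the article.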

AutoML Landscape

**Figure 2**: Representative sample of AutoML solutions by data type and user. Some solutions (Landing, Explorium, Noogata, Inference.io) focus on specific domains (e.g., Sales, Manufacturing) and/or specific ML problems (e.g., LTV, churn prediction).

Depending on the problem, AutoML can be challenging due to the large search space of candidate models, model architectures, and parameters. A reduced search space simplifies the problem and increases the chance of finding a better solution. Here are some attributes to keep in mind:

Specificity – General-purpose AutoML solutions, or AutoML solutions that target specific domains or specific problems.
Data type – AutoML solutions optimized for structured data, unstructured data (text, visual data, audio), or both.
Programming expertise of target user – Solutions accessible to analysts with minimal coding expertise, to data scientists, or to developers and engineers.
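
The point about search space can be illustrated with simple counting. The sketch below assumes a plain grid search, where the number of candidate configurations is the product of the number of options per dimension; the dimensions and values are hypothetical and not drawn from any particular AutoML tool.

```python
from math import prod


def search_space_size(space):
    """Number of distinct configurations in a grid:
    the product of the option counts across all dimensions."""
    return prod(len(options) for options in space.values())


# Hypothetical general-purpose search space.
general_space = {
    "model_family": ["linear", "tree_ensemble", "neural_net", "svm"],
    "learning_rate": [0.001, 0.01, 0.1, 0.3],
    "depth_or_layers": [2, 4, 8, 16],
    "regularization": [0.0, 0.01, 0.1, 1.0],
}

# A problem-specific tool might fix the model family and narrow one range.
focused_space = {
    **general_space,
    "model_family": ["tree_ensemble"],
    "learning_rate": [0.01, 0.1],
}

print(search_space_size(general_space), search_space_size(focused_space))
```

Narrowing two of the four dimensions cuts the number of candidates by a factor of eight here, which is the kind of reduction a domain-specific or problem-specific solution can exploit.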

**Figure 3**: Current AutoML solutions include no-code tools, and those that target specific data types, domains (“manufacturing”), and problems (“forecasting”).

AutoML Opportunities

AutoML solutions are still fairly new and usage is small but growing. We’ve compiled ideas for how AutoML tools can attract more users while simultaneously increasing their impact. Addressing the limitations listed below will go a long way towards enhancing the effectiveness of AutoML solutions.

1. Make AutoML even more accessible: Although existing tools often come with a declarative interface or a slick UI, they can still be daunting and require data science and machine learning expertise to interpret outputs. There is an enormous pool of potential users (developers and analysts) with minimal backgrounds in machine learning, as shown in Figure 5.

2. Add context: Ideally, AutoML tools are able to handle optimization problems with multiple objectives (e.g., optimize model accuracy given some additional constraints like cost, model size, or latency). But context also matters: successful AI and software projects start with defining and determining business objectives and success criteria. In fact, the McKinsey survey notes that high-performing AI programs “use design thinking when developing AI tools”. At its core, design thinking encourages teams to focus on the people and applications they’re creating for, which results in better products and services. As we note in Figure 3, some solution providers are already beginning to target specific verticals and domains, as well as specific problems (e.g., fraud, forecasting).

3. Current tools are limited in scope: At a minimum, AutoML tools should aspire to automate more than just model generation (see Figures 1 and 4). More generally, many real-world AI applications rely on a sequence of models that feed off one another (e.g., a regression model that feeds into a classifier, which in turn feeds into a forecasting model). We envision a future when AutoML tools are able to automate and orchestrate several models into one end-to-end AI application.

**Figure 4**: Expanding the footprint of AutoML tools to span aspects of data preparation (including data quality checks and missing data imputation), feature engineering, model building, and model validation and testing.

4. AutoML tools need to include versioning and experiment tracking: Machine learning is a highly experimental and creative process. Continual experimentation produces a large number of versions, but few will reach production. Users of AutoML tools (data scientists, domain experts, and non-programmers) will want to keep track of their work with tools for experiment tracking, dataset versioning, and model management. AutoML tools will need to integrate with solutions like MLflow, DeepCAVE, or other similar systems.

5. Develop tools to improve transparency: Machine learning models can be opaque, so much so that there are researchers and developers focused on tools to improve model explainability and transparency. AutoML requires similar investments in transparency, particularly as it begins to automate things that are more opaque than mere models (such as machine learning pipelines).

6. AutoML tools should offer control to users who want more of it: Full automation tools are great as long as users are happy with the results, but more knowledgeable users (data scientists or ML engineers) will periodically want to dive in and tweak solutions produced by AutoML tools. To make customizing models and pipelines easy, AutoML solutions must provide hints plus clear guidance and documentation.

7. Resulting models should be usable: The goal is usually to produce ML pipelines that can be deployed and maintained. AutoML candidate models often grow very complex, and converting them into something practical enough to deploy and debug can be a challenge. This is particularly important since ML models operate in dynamically changing environments and need constant monitoring and retraining.
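
The second suggestion above mentions optimizing accuracy subject to constraints such as cost, model size, or latency. One simple way to read that, sketched below, is to filter candidate models by the constraints and then pick the most accurate survivor. The `Candidate` type, the candidate names, and all numbers are hypothetical, and real tools may use more elaborate multi-objective methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float     # validation accuracy, higher is better
    latency_ms: float   # inference latency per request
    size_mb: float      # serialized model size


def select_model(candidates, max_latency_ms, max_size_mb):
    """Return the most accurate candidate that satisfies both
    constraints, or None if no candidate is feasible."""
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms and c.size_mb <= max_size_mb
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Hypothetical output of an AutoML search.
candidates = [
    Candidate("large_ensemble", accuracy=0.94, latency_ms=120.0, size_mb=900.0),
    Candidate("gradient_boosting", accuracy=0.91, latency_ms=15.0, size_mb=40.0),
    Candidate("linear_model", accuracy=0.85, latency_ms=1.0, size_mb=0.5),
]

best = select_model(candidates, max_latency_ms=20.0, max_size_mb=100.0)
print(best.name if best else "no feasible model")
```

In this toy example the most accurate candidate is rejected because it violates the latency and size limits. That is the kind of business context that a search driven purely by accuracy would miss.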

Call for Startups

The automation of machine learning workflows has barely begun and the market opportunity for further automation and democratization of machine learning is immense. If you are a developer or founder and would like to exchange notes, shoot us an email at assaf.araki@intel.com or at contact@gradientflow.com.

**Figure 5**: Data based on *current job titles used on LinkedIn profiles* (data from Diffbot). Beyond data scientists and machine learning experts, next-generation AutoML tools have the potential to reach a wide range of users. **Analysts** outnumber **data scientists** by at least ten times, and they outnumber **ML/DL engineers** by sixty times. **Developers and Software Engineers** outnumber **data scientists** by at least sixteen times, and they outnumber **ML/DL engineers** by over a hundred times.

Assaf Araki is an investment manager at Intel Capital. His contributions to this post are his personal opinion and do not represent the opinion of the Intel Corporation. Intel Capital is an investor in Landing, Matroid, Anodot, Einblick, and DataRobot. #IamIntel

Ben Lorica is principal at Gradient Flow. He is an advisor to Databricks, Matroid, Anodot, and other startups.

Related content: Other posts by Assaf Araki and Ben Lorica.