With interest in MLOps surging, companies are bound to reassess their tools, as well as the composition of their data and ML teams.
About five years ago we published a post that highlighted the emergence of a role focused on making data science work in production. We were prompted by job postings (in the SF Bay Area) that used the title “machine learning engineer” to describe individuals skilled at making data products and machine learning models work in production.
To understand the current state of the “machine learning engineer” role, I turned to Diffbot, the largest knowledge graph of the world wide web (CEO Mike Tung was a recent guest on the Data Exchange podcast). Diffbot regularly crawls the entire web and their knowledge graph is used by leading companies across many domains and applications. Having spent time using and understanding their knowledge graph, my conclusion is that Diffbot’s data is a good source for estimates needed to understand the state of the “machine learning engineer” role.
To start I wanted to see if the title “machine learning engineer” (MLE) has taken hold among companies and individuals:
Yes, it has. Over 10,000 people associate the MLE title with their current role, and over 20,000 have used the MLE title at some point in their career (in their current or previous roles).
Machine learning engineer is a relatively new position, and as a result MLEs tend to have had shorter stints with their current employers:
- About 30% of MLEs have been with their current employer for only 0-2 years [the comparable shares for other positions are: data scientists (21%), data engineers (23%), and cloud engineers (20%)].
- About 16% of MLEs have been with their current employer for at least 4 years [the comparable shares for other positions are: data scientists (26%), data engineers (29%), and cloud engineers (30%)].
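To make the comparison above easier to read, the quoted shares can be placed side by side. The short Python sketch below only restates the percentages from the two bullets and computes the percentage-point gap between each role and MLEs. The dictionary layout and the function name `gap_vs_mle` are illustrative choices, not part of Diffbot's tooling.

```python
# Tenure shares (in percent) as quoted in the two bullets above.
tenure_shares = {
    "machine learning engineer": {"0-2 years": 30, "4+ years": 16},
    "data scientist": {"0-2 years": 21, "4+ years": 26},
    "data engineer": {"0-2 years": 23, "4+ years": 29},
    "cloud engineer": {"0-2 years": 20, "4+ years": 30},
}


def gap_vs_mle(shares, bucket, baseline="machine learning engineer"):
    """Return each role's share minus the MLE share, in percentage points."""
    base = shares[baseline][bucket]
    return {
        role: values[bucket] - base
        for role, values in shares.items()
        if role != baseline
    }


if __name__ == "__main__":
    for bucket in ("0-2 years", "4+ years"):
        print(bucket, gap_vs_mle(tenure_shares, bucket))
```

Negative values in the 0-2 year bucket and positive values in the 4+ year bucket both point the same way: MLEs skew toward shorter tenures than the other three roles.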
Who employs the 10,000+ individuals who use MLE to describe their current role? To answer this question I used the industry[1] taxonomy that comes with the Diffbot Knowledge Graph. Many MLEs currently work in software or technology companies.
Compared with job roles that have been around longer, such as data scientist or data engineer, the list of key MLE employers is still dominated by the technology sector.
Where are MLEs most likely to be currently located? Compared to more mature job titles – data scientist or data engineer – a higher percentage of MLEs are in California:
The MLE role is still not as common outside of key technology regions. In fact, of the top 10 cities where MLEs are currently located, seven are in the SF Bay Area, and these seven cities account for about one in six (18%) of all current MLEs:
Characterizing Supply and Demand
What skills do current MLEs use to describe themselves? In the chart below we list the top 20 skills listed by individuals currently associated with a specific job title. MLEs do seem to emphasize skills associated[2] with productionizing machine learning models (e.g., software development; Linux and the Linux kernel; CMS; cloud; Java, C, and C++).
Let’s take a look at the demand side of the job market. The graphic below lists key phrases found in job postings for machine learning engineers. Aside from terms associated with machine learning, employers do use phrases that hint at deeper technical skills needed to productionize ML, including items that motivated our original post five years ago.
- Notable key phrases:
“machine learning pipelines”; “machine learning applications”; “machine learning systems”;
“create scalable solutions”; “software engineers”; “applied machine learning”;
“massive data sets”; “software development”; “data engineering”; “production”
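For readers who want to try a similar tally on their own collection of job postings, here is a minimal Python sketch that counts how many postings mention each of the notable phrases above. It is not the method used to produce the graphic in this post. The function name `count_phrases` and the two sample strings are hypothetical placeholders, not real postings.

```python
import re
from collections import Counter

# Notable key phrases taken from the list above.
KEY_PHRASES = [
    "machine learning pipelines", "machine learning applications",
    "machine learning systems", "create scalable solutions",
    "software engineers", "applied machine learning",
    "massive data sets", "software development",
    "data engineering", "production",
]


def count_phrases(postings, phrases=KEY_PHRASES):
    """Count the number of postings in which each phrase appears.

    Matching is case-insensitive and uses word boundaries, so
    "production" does not match "productionize".
    """
    counts = Counter()
    for text in postings:
        # Lowercase and collapse whitespace so line breaks do not split phrases.
        normalized = re.sub(r"\s+", " ", text.lower())
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample text, for illustration only.
    sample_postings = [
        "Build machine learning pipelines and deploy models to production.",
        "Work with software engineers on applied machine learning for massive data sets.",
    ]
    for phrase, n in count_phrases(sample_postings).most_common():
        print(f"{n}  {phrase}")
```

Counting postings rather than raw occurrences keeps a single repetitive posting from dominating the tally; either choice is reasonable depending on the question being asked.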
For now, the machine learning engineer job title/role is still more likely to be found in the technology sector. I do think that interest in individuals with MLE or similar profiles will increase and spread into other industry sectors. Judging by the number of startups and tools in the MLOps space, there is growing interest in tools that help transition ML and AI initiatives into actual working applications and products. With demand for MLOps talent surging, companies are reassessing their tools and infrastructure, as well as the composition and structure of their data and ML teams.
Acknowledgements: Thanks to Diffbot CEO Mike Tung for his assistance with the Diffbot Query Language.
Related content:
- Applications of Knowledge Graphs: Mike Tung on the Data Exchange podcast.
- “What is Graph Intelligence?”
- “One Simple Graphic: Interest in MLOps is surging”
- “Model Monitoring Enables Robust Machine Learning Applications”
- “Top Places to Work for Data Scientists” and “Top Places to Work for Data Engineers”
[1] I played around with the data and found the Diffbot company taxonomy more illuminating than using industry taxonomies like SIC or NAICS.
[2] Another notable comparison point: git, a skill claimed by 10% of current MLEs, but only 5% of current data scientists.