Ratio of Data Scientists to Data Engineers

As companies get more proficient in using data and AI to drive decision making and operations, team members with disparate backgrounds – analysts, product managers, decision makers – begin using data on a regular basis. But when they’re first starting out, the requisite data may not be in place, and data processing and analysis tend to be left to data scientists and data engineers. At that early stage, a data platform that is accessible to less technical users may still be under development.

A **fun** topic of discussion among leaders of data teams is the ratio between the number of data scientists and data engineers. There is no ideal answer. It really depends on the tools and infrastructure you have in place, the maturity and availability of use cases for data and AI, and exactly how you define specific roles and titles. Usage of the title “data scientist” varies widely. Some companies have data scientists who are essentially business analysts capable of running ad hoc queries (SQL) and advanced analytics (using some GUI-based tool). At the other extreme are companies that employ data scientists who routinely write production code and deploy data pipelines and machine learning models. To add to the confusion, some companies even use the same job title – “data scientist” – for different sets of employees who closely resemble the two very different examples I just laid out!

With that said, the ratio of data scientists to data engineers may still be a useful indicator to gauge the level of engagement and maturity of a data team. As a data team grows and their tools improve, data engineers are able to “support” more data scientists, and those data scientists are empowered to do more on their own. In the chart[1] below, smaller data teams (45 members or fewer) have on average about the same number of data scientists and data engineers. As a data team grows in size – a likely sign that a company has invested in better tools and processes – the ratio shifts to about two data scientists per data engineer.

Figure 1: Ratio of data scientists to data engineers (interquartile range: median; band represents the 1st and 3rd quartiles). Limited to teams with at least 10 data scientists and 10 data engineers. Data from Diffbot.

The chart above is based on 330+ companies with data teams that have at least ten data scientists and ten data engineers. My data source (the Diffbot Knowledge Graph) also assigns companies to industries (Diffbot can assign a company to multiple relevant sectors/industries). In the chart below I looked at the ratio of data scientists to data engineers in sectors that had at least ten companies (sourced from my original list of 330+ companies with data teams of non-trivial size).

Figure 2: Ratio of data scientists to data engineers (interquartile range: median; band represents the 1st and 3rd quartiles). Limited to teams with at least 10 data scientists and 10 data engineers. Includes only sectors with at least 10 companies that meet this data team size requirement. Data from Diffbot.

Here’s how some large data teams are approaching this topic:
  • software companies: Over 90 companies in our sample; Microsoft had the highest ratio (over 6:1 data scientists to data engineers).
  • consumer service companies: Over 60 companies in our sample; Uber had the highest ratio (over 8:1 data scientists to data engineers). Uber’s Engineering team has pushed out numerous popular open source projects in data and machine learning.
  • enterprise software companies: Over 20 companies in our sample; SAP had the highest ratio (over 14:1 data scientists to data engineers).
  • SaaS companies: 10 companies in our sample; Shopify had the highest ratio (close to 7:1 data scientists to data engineers). As per my recent conversation with Azeem Ahmed (Director of Engineering), they are making great progress on modernizing their machine learning platform.
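
For readers who want to run a similar analysis on their own data, here is a minimal Python sketch of the sector-level summary described above: keep companies with at least ten data scientists and ten data engineers, compute each company's ratio, and report the quartiles for sectors with enough companies. The records, the function name `sector_ratio_summary`, and the choice of quartile method are all assumptions made for illustration; this is not the code or the Diffbot data behind the figures.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical records: (company, data_scientists, data_engineers, sectors).
# The numbers are made up for illustration; they are not Diffbot data.
COMPANIES = [
    ("A", 40, 20, ["software"]),
    ("B", 30, 30, ["software", "saas"]),
    ("C", 60, 10, ["software"]),
    ("D", 12, 12, ["saas"]),
    ("E", 9, 15, ["software"]),  # dropped: fewer than 10 data scientists
]


def sector_ratio_summary(companies, min_role_size=10, min_companies=10):
    """Return {sector: (q1, median, q3)} of the data scientist to data engineer ratio."""
    ratios = defaultdict(list)
    for _name, n_ds, n_de, sectors in companies:
        # Keep only teams with enough people in both roles.
        if n_ds < min_role_size or n_de < min_role_size:
            continue
        # A company can be assigned to several sectors, so it counts in each.
        for sector in sectors:
            ratios[sector].append(n_ds / n_de)

    summary = {}
    for sector, values in ratios.items():
        # Skip sectors with too few companies (quantiles needs at least two points).
        if len(values) < max(min_companies, 2):
            continue
        # Assumption: "inclusive" quartiles; the original method is not stated.
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summary[sector] = (q1, median, q3)
    return summary
```

With the toy records above, `sector_ratio_summary(COMPANIES, min_companies=3)` keeps only the `software` sector and returns quartiles of 1.5, 2.0 and 4.0. The default of ten companies per sector mirrors the cutoff used for Figure 2.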

Companies with well-regarded data teams (highlighted in Figure 2) have six or more data scientists per data engineer. While this seems like a reasonable ratio to shoot for, the answer still comes down to the tools and infrastructure you have in place, and to demand (as measured by the maturity and availability of data and AI use cases within your organization). With the emergence of low-code and no-code tools for data science and machine learning tasks, I expect this ratio to rise. Data engineers and platform teams will need to support an ever-growing number of technical and non-technical users.


[1] A major caveat is that our data only includes people who use some variation of these job titles (in their profile) to describe their current position → “data scientist” or “data engineer”. For instance, some data engineers may prefer to use other job titles (e.g., “developer”, “platform engineer”, “software engineer”).

[2] Featured Image → Ratio of [data scientists] to [data engineers]: select technology companies
