[A version of this appears on the O’Reilly Radar.]
A new role focused on creating data products and making data science work in production.
by Ben Lorica and Mike Loukides
We’ve been talking about data science and data scientists for a decade now. While there’s always been some debate over what “data scientist” means, we’ve reached the point where many universities, online academies, and bootcamps offer data science programs: master’s degrees, certifications, you name it. The world was a simpler place when we only had statistics. But simplicity isn’t always healthy, and the diversity of data science programs demonstrates nothing if not the demand for data scientists.
As the field of data science has developed, any number of poorly distinguished specialties have emerged. Companies use the terms “data scientist” and “data science team” to describe a variety of roles, including:
- individuals who carry out ad hoc analysis and reporting (including BI and business analytics)
- people who are responsible for statistical analysis and modeling, which, in many cases, involves formal experiments and tests
- machine learning modelers who increasingly develop prototypes using notebooks
And that listing doesn’t include the people DJ Patil and Jeff Hammerbacher were thinking of when they coined the term “data scientist”: the people who are building products from data. These data scientists are most similar to the machine learning modelers, except that they’re building something: they’re product-centric, rather than researchers. They typically work across large portions of data products. Whatever the role, data scientists aren’t just statisticians; they frequently have doctorates in the sciences, with a lot of practical experience working with data at scale. They are almost always strong programmers, not just specialists in R or some other statistical package. They understand data ingestion, data cleaning, prototyping, bringing prototypes to production, product design, setting up and managing data infrastructure, and much more. In practice, they turn out to be the archetypal Silicon Valley “unicorns”: rare and very hard to hire.
What’s important isn’t that we have well-defined specialties; in a thriving field, there will always be huge gray areas. What made “data science” so powerful was the realization that there was more to data than actuarial statistics, business intelligence, and data warehousing. Breaking down the silos that separated data people from the rest of the organization—software development, marketing, management, HR—is what made data science distinct. Its core concept was that data was applicable to everything. The data scientist’s mandate was to gather, and put to use, all the data. No department went untouched.
Continue reading “What are machine learning engineers?”