[A version of this post appears on the O’Reilly Radar.]
The O’Reilly Data Show Podcast: Harish Doddi on accelerating the path from prototype to production.
In this episode of the Data Show, I spoke with Harish Doddi, co-founder and CEO of Datatron, a startup focused on helping companies deploy and manage machine learning models. As companies move from machine learning prototypes to products and services, tools and best practices for productionizing and managing models are only beginning to emerge. Today’s data science and data engineering teams work with a variety of machine learning libraries, data ingestion tools, and data storage technologies. In certain application domains, risk and compliance requirements make the ability to reproduce machine learning workflows essential for passing audits. And as data science and data engineering teams continue to grow, tools need to enable and facilitate collaboration.
Because Doddi specializes in helping teams turn machine learning prototypes into production-ready services, I wanted to hear what he has learned while working with organizations that aspire to “become machine learning companies.”
Here are some highlights from our conversation:
A central platform for building, deploying, and managing machine learning models
In one of the companies where I worked, we had built infrastructure related to Spark. We were a heavy Spark shop. So we built everything around Spark and other components. But later, when that organization grew, a lot of people came from a TensorFlow background. That suddenly created a little bit of frustration in the team because everybody wanted to move to TensorFlow. But we had invested a lot of time, effort and energy in building the infrastructure for Spark.
… We suddenly had hidden technical debt that needed to be addressed. … Let’s say right now you have two models running in production and you know that in the next two or three years you are going to deploy 20 to 30 models. You need to start thinking about this ahead of time.
Continue reading “Simplifying machine learning lifecycle management”