Managing risk in machine learning models

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain.

In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer at Immuta, and Steven Touw, co-founder and CTO of Immuta. Burt recently co-authored a white paper on managing risk in machine learning models, and I wanted to sit down with them to discuss some of the proposals they put forward to organizations that are deploying machine learning.

Some high-profile examples of models gone awry have raised awareness among companies of the need for better risk management tools and processes. There is now growing interest in ethics among data scientists, specifically in tools for monitoring bias in machine learning models. In a previous post, I listed some of the key considerations organizations should keep in mind as they move models to production, but the report co-authored by Burt goes much further and recommends lines of defense, including a description of the key roles that are needed.

Here are some highlights from our conversation:

Privacy and compliance meet data science

Andrew Burt: I would say the big takeaway from our paper is that lawyers and compliance and privacy folks live in one world and data scientists live in another with competing objectives. And that can no longer be the case. They need to talk to each other. They need to have a shared process and some shared terminology so that everybody can communicate.

Understanding automation

[A version of this post appears on the O’Reilly Radar.]

An overview and framework, including tools that can be used to enable automation.

In this post, I share slides and notes from a talk Roger Chen and I gave in May 2018 at the Artificial Intelligence Conference in New York. Most companies are beginning to explore how to use machine learning and AI, and we wanted to give an overview and framework for how to think about these technologies and their roles in automation. Along the way, we describe the machine learning and AI tools that can be used to enable automation.

Let me begin by citing a recent survey we conducted: among other things, we found that a majority (54%) of respondents consider deep learning an important part of their future projects. Deep learning is a specific machine learning technique, and its success in a variety of domains has led to renewed interest in AI.

Much of the current media coverage about AI revolves around deep learning. The reality is that many AI systems will combine a variety of machine learning methods and techniques. For example, recent prominent AI systems, those that excelled at Go and poker, used deep learning alongside other methods. In the case of AlphaGo, Monte Carlo Tree Search played a role, whereas DeepStack’s poker-playing system combines neural networks with counterfactual regret minimization and heuristic search.

The real value of data requires a holistic view of the end-to-end data pipeline

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Ashok Srivastava on the emergence of machine learning and AI for enterprise applications.

In this episode of the Data Show, I spoke with Ashok Srivastava, senior vice president and chief data officer at Intuit. He has a strong science and engineering background, combined with years of applying machine learning and data science in industry. Prior to joining Intuit, he led the teams responsible for data and artificial intelligence products at Verizon. I wanted his perspective on a range of issues, including the role of the chief data officer, ethics in machine learning, and the emergence of AI technologies for enterprise products and applications.

Here are some highlights from our conversation:

Chief data officer

A chief data officer, in my opinion, is a person who thinks about the end-to-end process of obtaining data, data governance, and transforming that data for a useful purpose. His or her purview is relatively large. I view my purview at Intuit to be exactly that, thinking about the entire data pipeline, proper stewardship, proper governance principles, and proper application of data. I think that as the public learns more about the opportunities that can come from data, there’s a lot of excitement about the potential value that can be unlocked from it from the consumer standpoint, and also many businesses and scientific organizations are excited about the same thing. I think the CDO plays a role as a catalyst in making those things happen with the right principles applied.

I would say if you look back into history a little bit, you’ll find the need for the chief data officer started to come into play when people saw a huge amount of data coming in at high speeds with high variety and variability—but then also the opportunity to marry that data with real algorithms that can have a transformational property to them. While it’s true that CIOs, CTOs, and people who are in lines of business can and should think about this, it’s a complex enough process that I think it merits having a person and an organization think about that end-to-end pipeline.

Ethics

We’re actually right now in the process of launching a unified training program in data science that includes ethics as well as many other technical topics. I should say that I joined Intuit only about six months ago. They already had training programs happening worldwide in the area of data science and acquainting people with the principles necessary to use data properly as well as the technical aspects of doing it.