Assessing progress in automation technologies

[A version of this post appears on the O’Reilly Radar.]

When it comes to automation of existing tasks and workflows, you need not adopt an “all or nothing” attitude.

In this post, I share slides and notes from a keynote Roger Chen and I gave at the Artificial Intelligence conference in London in October 2018. We presented an overview of the state of automation technologies: we tried to highlight the state of the key building block technologies and we described how these tools might evolve in the near future.

To assess the state of adoption of machine learning (ML) and AI, we recently conducted a survey that garnered more than 11,000 respondents. As I pointed out in previous posts, we learned that many companies are still in the early stages of deploying machine learning.

Companies cite “lack of data” and “lack of skilled people” as the main factors holding back adoption. In many instances, “lack of data” is literally the state of affairs: companies have yet to collect and store the data needed to train the ML models they desire. The “skills gap” is real and persistent. Developers have taken heed of this growth in demand. In our own online learning platform, we are seeing strong growth in usage of content across AI topics, including 77% growth in consumption of content pertaining to deep learning:

Tools for generating deep neural networks with efficient network architectures

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Alex Wong on building human-in-the-loop automation solutions for enterprise machine learning.

In this episode of the Data Show, I spoke with Alex Wong, associate professor at the University of Waterloo, and co-founder of DarwinAI, a startup that uses AI to address foundational challenges with deep learning in the enterprise. As the use of machine learning and analytics becomes more widespread, we’re beginning to see tools that enable data scientists and data engineers to scale, tackle many more problems, and maintain more systems. This includes automation tools for the many stages involved in data science, including data preparation, feature engineering, model selection, and hyperparameter tuning, as well as tools for data engineering and data operations.

Wong and his collaborators are building solutions for enterprises, including tools for generating efficient neural networks and for the performance analysis of networks deployed to edge devices.

Here are some highlights from our conversation:

Using AI to democratize deep learning

Having worked in machine learning and deep learning for more than a decade, both in academia as well as industry, it really became very evident to me that there’s a significant barrier to widespread adoption. One of the main things is that it is very difficult to design, build, and explain deep neural networks. I especially wanted to meet operational requirements. The process just involves way too much guesswork, trial and error, so it’s hard to build systems that work in real-world industrial systems.

Building tools for enterprise data science

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Vitaly Gordon on the rise of automation tools in data science.

In this episode of the Data Show, I spoke with Vitaly Gordon, VP of data science and engineering at Salesforce. As the use of machine learning becomes more widespread, we need tools that will allow data scientists to scale so they can tackle many more problems and help many more people. We need automation tools for the many stages involved in data science, including data preparation, feature engineering, model selection and hyperparameter tuning, as well as monitoring.

I wanted the perspective of someone who is already faced with having to support many models in production. The proliferation of models is still a theoretical consideration for many data science teams, but Gordon and his colleagues at Salesforce already support hundreds of thousands of customers who need custom models built on custom data. They recently took their learnings public and open sourced TransmogrifAI, a library for automated machine learning for structured data, which sits on top of Apache Spark.

Here are some highlights from our conversation:

Managing risk in machine learning

[A version of this post appears on the O’Reilly Radar.]

Considerations for a world where ML models are becoming mission critical.

In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in New York last September. As the data community begins to deploy more machine learning (ML) models, I wanted to review some important considerations.

Let’s begin by looking at the state of adoption. We recently conducted a survey that garnered more than 11,000 respondents; our main goal was to ascertain how enterprises were using machine learning (ML). One of the things we learned was that many companies are still in the early stages of deploying ML.

As for what is holding companies back, a survey we conducted earlier this year found that companies cited a lack of skilled people (a “skills gap”) as the main challenge to adoption.

Interest on the part of companies means the demand side for “machine learning talent” is healthy. Developers have taken notice and are beginning to learn about ML. In our own online training platform (which has more than 2.1 million users), we’re finding strong interest in machine learning topics. Below are the top search topics on our training platform:

Lessons learned while helping enterprises adopt machine learning

[A version of this post appears on the O’Reilly Radar blog.]

The O’Reilly Data Show Podcast: Francesca Lazzeri and Jaya Mathew on digital transformation, culture and organization, and the team data science process.

In this episode of the Data Show, I spoke with Francesca Lazzeri, an AI and machine learning scientist at Microsoft, and her colleague Jaya Mathew, a senior data scientist at Microsoft. We conducted a couple of surveys this year (“How Companies Are Putting AI to Work Through Deep Learning” and “The State of Machine Learning Adoption in the Enterprise”), and we found that while many companies are still in the early stages of machine learning adoption, there’s considerable interest in moving forward with projects in the near future. Lazzeri and Mathew spend a considerable amount of time interacting with companies that are beginning to use machine learning, and their experience spans many different industries and applications. I wanted to learn some of the processes and tools they use when they assist companies in beginning their machine learning journeys.

Here are some highlights from our conversation:

Team data science process

Francesca Lazzeri: The Data Science Process is a framework that we try to apply in our projects. Everything begins with a business problem, so external customers come to us with a business problem or a process they want to optimize. We work with them to translate these into realistic questions, into what we call data science questions. And then we move to the data portion: what are the different relevant data sources, is the data internal or external? After that, you try to define the data pipeline. We start with the core part of the data science process—that is, data cleaning—and proceed to feature engineering, model building, and model deployment and management.

Machine learning on encrypted data

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Alon Kaufman on the interplay between machine learning, encryption, and security.

In this episode of the Data Show, I spoke with Alon Kaufman, CEO and co-founder of Duality Technologies, a startup building tools that will allow companies to apply analytics and machine learning to encrypted data. In a recent talk, I described the importance of data, various methods for estimating the value of data, and emerging tools for incentivizing data sharing across organizations. As I noted, the main motivation for improving data liquidity is the growing importance of machine learning. We’re all familiar with the importance of data security and privacy, but probably not as many people are aware of the emerging set of tools at the intersection of machine learning and security. Kaufman and his stellar roster of co-founders are doing some of the most interesting work in this area.

Here are some highlights from our conversation:

Running machine learning models on encrypted data

Four or five years ago, techniques for running machine learning models on data while it’s encrypted were being discussed in the academic world. We did a few trials of this and although the results were fascinating, it still wasn’t practical.

… There have been big breakthroughs that have led to it becoming feasible. A few years ago, it was more theoretical. Now it’s becoming feasible. This is the right time to build a company. Not only because of the technology feasibility but definitely because of the need in the market.


How social science research can inform the design of AI systems

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Jacob Ward on the interplay between psychology, decision-making, and AI systems.

In this episode of the Data Show, I spoke with Jacob Ward, a Berggruen Fellow at Stanford University. Ward has an extensive background in journalism, mainly covering topics in science and technology, at National Geographic, Al Jazeera, Discovery Channel, BBC, Popular Science, and many other outlets. Most recently, he’s become interested in the interplay between research in psychology, decision-making, and AI systems. He’s in the process of writing a book on these topics, and was gracious enough to give an informal preview by way of this podcast conversation.

Here are some highlights from our conversation:

Psychology and AI

I began to realize there was a disconnect between what is a totally revolutionary set of innovations coming through in psychology right now that are really just beginning to scratch the surface of how human beings make decisions; at the same time, we are beginning to automate human decision-making in a really fundamental way. I had a number of different people say, ‘Wow, what you’re describing in psychology really reminds me of this piece of AI that I’m building right now,’ to change how expectant mothers see their doctors or change how we hire somebody for a job or whatever it is.

Transparency and designing systems that are fair

I was talking to somebody the other day who was trying to build a loan company that was using machine learning to present loans to people. He and his company did everything they possibly could to not redline the people they were loaning to. They were trying very hard not to make unfair loans that would give preference to white people over people of color.

They went to extraordinary lengths to make that happen. They cut addresses out of the process. They did all of this stuff to try to basically neutralize the process, and the machine learning model still would pick white people at a disproportionate rate over everybody else. They can’t explain why. They don’t know why that is. There’s some variable that’s mapping to race that they just don’t know about.

But that sort of opacity—this is somebody explaining it to me who just happened to have been inside the company, but it’s not as if that’s on display for everybody to check out. These kinds of closed systems are picking up patterns we can’t explain, and that their creators can’t explain. They are also making really, really important decisions based on them. I think it is going to be very important to change how we inspect these systems before we begin trusting them.
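One simple way to start inspecting a system like the one described above is to compare approval rates across groups after the fact. The following is a minimal sketch under stated assumptions, not the method used by the company in the anecdote: the audit log is synthetic, the group labels and function names are hypothetical, and the 0.8 threshold borrows the "four-fifths" rule of thumb from U.S. employment guidelines, which may not be the right standard for lending. It also assumes the protected attribute is retained for auditing even though it is withheld from the model's inputs.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        if was_approved:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest group approval rate to the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so no group was favored
    return min(rates.values()) / highest


# Synthetic audit log (hypothetical data, for illustration only).
audit_log = (
    [("group_a", True)] * 60 + [("group_a", False)] * 40
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)

rates = selection_rates(audit_log)
ratio = disparate_impact_ratio(rates)

# Assumed threshold: the four-fifths rule of thumb.
flagged = ratio < 0.8
print(rates, ratio, flagged)
```

A check like this only reveals that outcomes differ across groups. It does not identify which input variable is acting as a proxy, which is the harder problem the passage above describes.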
