Assessing progress in automation technologies

[A version of this post appears on the O’Reilly Radar.]

When it comes to automation of existing tasks and workflows, you need not adopt an “all or nothing” attitude.

In this post, I share slides and notes from a keynote Roger Chen and I gave at the Artificial Intelligence Conference in London in October 2018. We presented an overview of the state of automation technologies, highlighting the key building-block technologies and describing how these tools might evolve in the near future.

To assess the state of adoption of machine learning (ML) and AI, we recently conducted a survey that garnered more than 11,000 respondents. As I pointed out in previous posts, we learned that many companies are still in the early stages of deploying machine learning.

Companies cite “lack of data” and “lack of skilled people” as the main factors holding back adoption. In many instances, “lack of data” is literally the state of affairs: companies have yet to collect and store the data needed to train the ML models they desire. The “skills gap” is real and persistent, and developers have taken note of the growing demand for ML skills. On our own online learning platform, we are seeing strong growth in usage of content across AI topics, including 77% growth in consumption of content pertaining to deep learning:
Continue reading “Assessing progress in automation technologies”

Managing risk in machine learning

[A version of this post appears on the O’Reilly Radar.]

Considerations for a world where ML models are becoming mission critical.

In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in New York last September. As the data community begins to deploy more machine learning (ML) models, I wanted to review some important considerations.

Let’s begin by looking at the state of adoption. We recently conducted a survey, which garnered more than 11,000 respondents, with the main goal of ascertaining how enterprises were using machine learning. One of the things we learned was that many companies are still in the early stages of deploying ML.

As for what is holding companies back, a survey we conducted earlier this year found that companies cite a lack of skilled people (a “skills gap”) as the main challenge to adoption.

Interest on the part of companies means the demand side for “machine learning talent” is healthy. Developers have taken notice and are beginning to learn about ML. In our own online training platform (which has more than 2.1 million users), we’re finding strong interest in machine learning topics. Below are the top search topics on our training platform:
Continue reading “Managing risk in machine learning”

Data collection and data markets in the age of privacy and machine learning

[A version of this post appears on the O’Reilly Radar.]

While models and algorithms garner most of the media coverage, this is a great time to be thinking about building tools for data.

In this post I share slides and notes from a keynote I gave at the Strata Data Conference in London at the end of May. My goal was to remind the data community about the many interesting opportunities and challenges in data itself. Much of the focus of recent press coverage has been on algorithms and models, specifically the expanding utility of deep learning. Because large deep learning architectures are quite data hungry, the importance of data has grown even more. In this short talk, I describe some interesting trends in how data is valued, collected, and shared.

Economic value of data

It’s no secret that companies place a lot of value on data and the data pipelines that produce key features. In the early phases of adopting machine learning (ML), companies focus on making sure they have a sufficient amount of labeled (training) data for the applications they want to tackle. They then investigate additional data sources that they can use to augment their existing data. In fact, among many practitioners, data remains more valuable than models (many talk openly about what models they use, but are reticent to discuss the features they feed into those models).

But if data is precious, how do we go about estimating its value? For those of us who build machine learning models, we can estimate the value of data by examining the cost of acquiring training data:
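As a back-of-the-envelope illustration (our own, not taken from the talk), the acquisition cost of labeled data can be approximated as the number of examples times the labels collected per example times the price per label. The `LabelingJob` class and every number below are hypothetical; real projects would also account for tooling, review, and data collection costs.

```python
from dataclasses import dataclass


@dataclass
class LabelingJob:
    """Hypothetical parameters for a labeling project (illustrative only)."""
    num_examples: int        # items that need labels
    labels_per_example: int  # redundant labels per item, e.g. for quality control
    cost_per_label: float    # price paid for each individual label, in dollars

    def total_cost(self) -> float:
        """Rough estimate of what it would cost to (re)acquire this labeled data."""
        return self.num_examples * self.labels_per_example * self.cost_per_label


# Hypothetical job: 100,000 items, each labeled by 3 annotators at $0.05 per label.
job = LabelingJob(num_examples=100_000, labels_per_example=3, cost_per_label=0.05)
print(f"${job.total_cost():,.2f}")  # prints: $15,000.00
```

Treating acquisition cost as a proxy gives a lower bound on value: if a dataset would cost this much to rebuild, losing or mishandling it costs at least that much.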
Continue reading “Data collection and data markets in the age of privacy and machine learning”

Understanding automation

[A version of this post appears on the O’Reilly Radar.]

An overview and framework, including tools that can be used to enable automation.

In this post, I share slides and notes from a talk Roger Chen and I gave in May 2018 at the Artificial Intelligence Conference in New York. Most companies are beginning to explore how to use machine learning and AI, and we wanted to give an overview and framework for how to think about these technologies and their roles in automation. Along the way, we describe the machine learning and AI tools that can be used to enable automation.

Let me begin by citing a recent survey we conducted: among other things, we found that a majority of respondents (54%) consider deep learning an important part of their future projects. Deep learning is a specific machine learning technique, and its success in a variety of domains has fueled the renewed interest in AI.

Much of the current media coverage about AI revolves around deep learning. In reality, many AI systems will combine several machine learning methods and techniques. For example, recent prominent AI systems, such as those that excelled at Go and poker, used deep learning alongside other methods. In the case of AlphaGo, Monte Carlo Tree Search played a role, whereas DeepStack’s poker-playing system combines neural networks with counterfactual regret minimization and heuristic search.
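Counterfactual regret minimization (CFR) is built on regret matching, which CFR applies at every information set of an extensive-form game. As a simplified, hypothetical sketch (not DeepStack’s implementation), the snippet below runs only the regret-matching update for one player in rock-paper-scissors against a fixed, biased opponent. The opponent strategy and iteration count are assumptions chosen for illustration.

```python
ACTIONS = ["rock", "paper", "scissors"]

# Payoff to our player: PAYOFF[our_action][opponent_action].
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(cum_regret)] * len(cum_regret)


def train(opponent, iterations=1000):
    n = len(ACTIONS)
    cum_regret = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategy = regret_matching_strategy(cum_regret)
        for a in range(n):
            strategy_sum[a] += strategy[a]
        # Expected payoff of each pure action against the opponent's mixed strategy.
        action_utils = [sum(PAYOFF[a][b] * opponent[b] for b in range(n)) for a in range(n)]
        # Expected payoff of our current mixed strategy.
        expected = sum(strategy[a] * action_utils[a] for a in range(n))
        # Regret: how much better each pure action would have done.
        for a in range(n):
            cum_regret[a] += action_utils[a] - expected
    # The average strategy (not the last one) is what carries the guarantee.
    return [s / iterations for s in strategy_sum]


# Hypothetical opponent who over-plays rock.
avg = train(opponent=[0.5, 0.25, 0.25])
best = ACTIONS[avg.index(max(avg))]
print(best, round(max(avg), 3))  # prints: paper 0.999
```

Against a fixed opponent, regret matching drives the average strategy toward a best response (here, paper). In self-play over a full game tree, CFR’s averaged strategies instead approach an equilibrium.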

Continue reading “Understanding automation”

How to build analytic products in an age when data privacy has become critical

[A version of this post appears on the O’Reilly Radar.]

Privacy-preserving analytics is not only possible; with GDPR about to come online, incorporating privacy into your data products will become necessary.

In this post, I share slides and notes from a talk I gave in March 2018 at the Strata Data Conference in California, offering suggestions for how companies may want to build analytic products in an age when data privacy has become critical. A lot has changed since I gave this presentation: numerous articles have been written about Facebook’s privacy policies, its CEO testified twice before the U.S. Congress, and I deactivated my mostly dormant Facebook account. The result is an even greater awareness of data privacy, and people are acknowledging that the problems go beyond a few companies or a few people.

Let me start by listing a few observations regarding data privacy:

Which brings me to the main topic of this presentation: how do we build analytic services and products in an age when data privacy has emerged as an important issue? Architecting and building data platforms is central to what many of us do. We have long recognized that data security and data privacy are required features for our data platforms, but how do we “lock down” analytics?

Once we have data securely in place, we proceed to utilize it in two main ways: (1) to make better decisions (BI) and (2) to enable some form of automation (ML). It turns out there are some new tools for building analytic products that preserve privacy. Let me give a quick overview of a few things you may want to try today.
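This excerpt does not name the specific tools. As one illustration of privacy-preserving analytics, here is a minimal sketch of differential privacy’s Laplace mechanism applied to a counting query. The function names and records are hypothetical, and this is our own example rather than code from the talk.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Differentially private count using the Laplace mechanism (sketch).

    A counting query has sensitivity 1: adding or removing one person's
    record changes the true count by at most 1, so Laplace noise with
    scale 1/epsilon yields epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: how many people are older than 40?
records = [{"age": a} for a in (23, 35, 41, 52, 67, 29, 44)]
rng = random.Random(42)
noisy = private_count(records, lambda r: r["age"] > 40, epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

Smaller values of `epsilon` add more noise and give stronger privacy; repeated queries consume privacy budget, which a real system would need to track.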
Continue reading “How to build analytic products in an age when data privacy has become critical”

Responsible deployment of machine learning

[A version of this post appears on the O’Reilly Radar.]

We need to build machine learning tools to augment our machine learning engineers.

In this post, I share slides and notes from a talk I gave in December 2017 at the Strata Data Conference in Singapore, offering suggestions to companies that are actively deploying products infused with machine learning capabilities. Over the past few years, the data community has focused on infrastructure and platforms for data collection, including robust pipelines and highly scalable storage systems for analytics. According to a recent LinkedIn report, the top two emerging jobs are “machine learning engineer” and “data scientist.” Companies are starting to hire staff to put their data infrastructures to work, and machine learning is going to become more prevalent in the years to come.


As more companies start using machine learning in products, tools, and business processes, let’s take a quick tour of model building, model deployment, and model management. It turns out that once a model is built, deploying and managing it in production requires engineering skills. In fact, earlier this year we noted that companies have created a new job role, machine learning (or deep learning) engineer, for people tasked with productionizing machine learning models.

Modern machine learning libraries and tools like notebooks have made model building simpler. New data scientists need to make sure they understand the business problem and optimize their models for it. In a diverse region like Southeast Asia, models need to be localized, as conditions and contexts differ across ASEAN countries.
Continue reading “Responsible deployment of machine learning”

The state of AI adoption

[A version of this post appears on the O’Reilly Radar.]

An overview of adoption, and suggestions to companies interested in AI technologies.

Artificial intelligence (AI) has attracted a lot of media coverage recently, and companies are rushing to figure out how AI technologies will impact them. Much of the coverage is devoted to research breakthroughs or new product offerings. But how are companies integrating AI into their underlying businesses? In this post, we share slides and notes from a talk we gave this past September at the AI Conference in San Francisco, offering an overview of the state of adoption and some suggestions to companies interested in implementing AI technologies.


Slide courtesy of Ben Lorica. Data source: Google Trends

Much of the renewed interest in AI can be attributed to deep learning. Breakthroughs in deep learning (particularly as applied to computer vision and speech) have excited people about the possibilities of modern AI applications. The result is that companies are beginning to examine applications of deep learning to data they are familiar with, while considering data types (such as images, audio, video) of which they have yet to take advantage.
Continue reading “The state of AI adoption”