How to train and deploy deep learning at scale

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Ameet Talwalkar on large-scale machine learning.

In this episode of the Data Show, I spoke with Ameet Talwalkar, assistant professor of machine learning at CMU and co-founder of Determined AI. He was an early and key contributor to Spark MLlib and a member of AMPLab. Most recently, he helped conceive and organize the first edition of SysML, a new academic conference at the intersection of systems and machine learning (ML).

We discussed using and deploying deep learning at scale. This is an empirical era for machine learning, and, as I noted in an earlier article, as successful as deep learning has been, our level of understanding of why it works so well is still lacking. In practice, machine learning engineers need to explore and experiment using different architectures and hyperparameters before they settle on a model that works for their specific use case. Training a single model usually involves big (labeled) data and big models; as such, exploring the space of possible model architectures and parameters can take days, weeks, or even months. Talwalkar has spent the last few years grappling with this problem as an academic researcher and as an entrepreneur. In this episode, he describes some of his related work on hyperparameter tuning, systems, and more.

Here are some highlights from our conversation:

Deep learning

I would say that you hear a lot about the modeling problems associated with deep learning. How do I frame my problem as a machine learning problem? How do I pick my architecture? How do I debug things when they go wrong? … What we’ve seen in practice is that, maybe somewhat surprisingly, the biggest challenges ML engineers face are actually due to the lack of tools and software for deep learning. These are hybrid systems/ML problems, very similar to the sorts of research that came out of the AMPLab.

… Things like TensorFlow and Keras, and a lot of the other platforms you mentioned, are a great step forward. They’re really good at abstracting away the low-level details of a particular learning architecture. In five lines, you can describe what your architecture looks like and specify which algorithms you want to use for training.
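To make the “five lines” point concrete, here is a minimal sketch using TensorFlow’s bundled Keras API; the input shape, layer sizes, and optimizer are illustrative assumptions, not anything discussed in the episode:

```python
# A minimal sketch of describing an architecture and a training algorithm
# in a handful of lines with tf.keras. Shapes and sizes are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                      # assumed input size
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",                        # the training algorithm
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)                # data loading not shown
```

The brevity is the point: the framework handles the low-level details, while everything around this snippet (data pipelines, tuning, deployment) is where the systems challenges live.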

There are a lot of other systems challenges associated with actually going end to end, from data to a deployed model. Existing software solutions don’t really tackle a big set of these challenges. For example, regardless of the software you’re using, it takes days to weeks to train a deep learning model. There are real open challenges in how best to use parallel and distributed computing, both to train a particular model and to tune the hyperparameters of different models.
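Random search is one of the simplest ways to explore hyperparameters, and because every trial is independent it parallelizes naturally across machines or GPUs. The sketch below is illustrative only; the search space and the placeholder training function are assumptions:

```python
# Minimal random hyperparameter search. Each trial is independent, so trials
# can be farmed out to separate workers; the training function here is a
# placeholder for a real (and possibly distributed) training job.
import random

def train_and_score(learning_rate, batch_size):
    """Stand-in for a full training run; returns a validation score."""
    return random.random()  # replace with real validation accuracy

search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [32, 64, 128, 256],
}

best_score, best_config = -1.0, None
for _ in range(20):  # the number of trials is the search budget
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_score(**config)
    if score > best_score:
        best_score, best_config = score, config

print("best score:", best_score, "with config:", best_config)
```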
Continue reading “How to train and deploy deep learning at scale”

Using machine learning to monitor and optimize chatbots

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Ofer Ronen on the current state of chatbots.

In this episode of the Data Show, I spoke with Ofer Ronen, GM of Chatbase, a startup housed within Google’s Area 120. With tools for building chatbots becoming accessible, conversational interfaces are becoming more prevalent. As Ronen highlights in our conversation, chatbots are already enabling companies to automate many routine tasks (mainly in customer interaction). We are still in the early days of chatbots, but if current trends persist, we’ll see bots deployed more widely and taking on more complex tasks and interactions. Gartner recently predicted that by 2021, companies will spend more on bots and chatbots than on mobile app development.

Like any other software application, as bots get deployed in real-world applications, companies will need tools to monitor their performance. For a single, simple chatbot, one can imagine developers manually monitoring log files for errors and problems. Things get harder as you scale to more bots and as the bots grow more complex. As in the case of other machine learning applications, when companies start deploying many more chatbots, automated tools for monitoring and diagnostics become essential.
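For that single, simple chatbot, the manual approach might look like the sketch below: scan a log of conversations and track how often the bot falls back to a “not handled” response. The log path, JSON-lines format, and field names are assumptions for illustration:

```python
# A minimal sketch of manual chatbot monitoring: compute how often the bot
# falls back to a "not handled" response. The log format is an assumption.
import json

def fallback_rate(log_path="chatbot_log.jsonl"):
    total = fallbacks = 0
    with open(log_path) as f:
        for line in f:
            event = json.loads(line)
            total += 1
            if event.get("matched_intent") == "fallback":
                fallbacks += 1
    return fallbacks / total if total else 0.0

# print(f"fallback rate: {fallback_rate():.1%}")
```

This kind of script works for one bot; it is exactly the approach that stops scaling once there are dozens of bots with thousands of intents, which is where automated analytics services come in.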

The good news is relevant tools are beginning to emerge. In this episode, Ronen describes a tool he helped build: Chatbase is a chatbot analytics and optimization service that leverages machine learning research and technologies developed at Google. In essence, Chatbase lets companies focus on building and deploying the best possible chatbots.

Here are some highlights from our conversation:

Democratization of tools for bot developers

It’s been hard to get natural language processing to work well and to recognize all the different ways people might say the same thing. There’s been an explosion of tools that leverage machine learning and natural language processing (NLP) engines to make sense of all that’s being asked of bots. With this increased capacity and capability to process data, there are now better third-party tools that any company can use to build a decent bot out of the box.

… I see three levels of bot builders out there. There’s the non-technical kind, where marketing or sales might create a prototype using a user interface like Chatfuel, which requires no programming, to build a basic experience. Or they might create some sort of decision-tree bot that is not flexible, but is good for basic lead-gen experiences. These bots often can’t handle type-ins; they’re often button-based. So, that’s one level, the non-technical folks.
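A decision-tree bot of the kind described here can be little more than a nested table of buttons, which is also why it breaks down on free-form type-ins. A minimal, purely illustrative sketch:

```python
# Minimal sketch of a button-based, decision-tree bot: each state offers a
# fixed set of buttons, and free-form type-ins simply are not handled.
TREE = {
    "start":   {"prompt": "What do you need?",
                "buttons": {"Pricing": "pricing", "Demo": "demo"}},
    "pricing": {"prompt": "Plans start at $10/month.", "buttons": {}},
    "demo":    {"prompt": "Leave your email and we'll reach out.", "buttons": {}},
}

def step(state, user_click):
    next_state = TREE[state]["buttons"].get(user_click)
    if next_state is None:  # anything typed or off-script falls through
        return state, "Sorry, please use one of the buttons."
    return next_state, TREE[next_state]["prompt"]

state, reply = step("start", "Pricing")
print(reply)  # -> Plans start at $10/month.
```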
Continue reading “Using machine learning to monitor and optimize chatbots”

Unleashing the potential of reinforcement learning

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Danny Lange on how reinforcement learning can accelerate software development and how it can be democratized.

In this episode of the Data Show, I spoke with Danny Lange, VP of AI and machine learning at Unity Technologies. Lange previously led data and machine learning teams at Microsoft, Amazon, and Uber, where his teams were responsible for building data science tools used by other developers and analysts within those companies. When I first heard that he was moving to Unity, I was curious as to why he decided to join a company whose core product targets game developers.

As you’ll glean from our conversation, Unity is at the forefront of some of the most exciting, practical applications of deep learning (DL) and reinforcement learning (RL). Realistic scenery and imagery are critical for modern games. GANs and related semi-supervised techniques can ease content creation by enabling artists to produce realistic images much more quickly. In a previous post, Lange described how reinforcement learning opens up the possibility of training/learning rather than programming in game development.

Lange explains why simulation environments are going to be important tools for AI developers. We are still in the early days of machine intelligence, and I am looking forward to more tools that can democratize AI research (including future releases by Lange and his team at Unity).

Here are some highlights from our conversation:

Why reinforcement learning is so exciting

I’m a huge fan of reinforcement learning. I think it has incredible potential, not just in game development but in a lot of other areas, too. … What we are doing at Unity is basically making reinforcement learning available to the masses. We have shipped open source software on GitHub called Unity ML Agents, which includes the basic frameworks for people to experiment with reinforcement learning. Reinforcement learning is really about creating a machine learning-driven feedback loop. Recall the example I previously wrote about, of the chicken crossing the road; yes, it gets hit thousands and thousands of times by these cars, but every time it gets hit, it learns that’s a bad thing. And every time it manages to pick up a gift package on the way over the road, it learns that’s a good thing.

Over time, it gains superhuman capabilities in crossing this road, and that is fantastic because there’s not a single line of code going into that. It’s pure simulation, and through reinforcement learning it learns a method to cross the road, and you can take that into many different aspects of games. There are many different methods you can train. You can add two chickens: can they collaborate to do something together? We are looking at what we call multi-agent systems, where two or more of these reinforcement learning-trained agents act together to achieve a goal.
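The feedback loop Lange describes maps neatly onto tabular Q-learning: a negative reward for getting hit, a positive reward for the gift package, and a policy that emerges from simulation rather than hand-written rules. The toy sketch below illustrates that idea; it is not Unity ML Agents code, and the one-dimensional road, rewards, and hyperparameters are all assumptions:

```python
# Toy Q-learning sketch of the "chicken crossing the road" feedback loop.
# The environment, rewards, and hyperparameters are illustrative assumptions.
import random

N_LANES = 5            # positions 0..4; position 4 is the far side (the gift)
ACTIONS = [0, 1]       # 0 = wait, 1 = step forward
Q = {(s, a): 0.0 for s in range(N_LANES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def env_step(pos, action):
    """Return (next_pos, reward, done). Stepping into a car is penalized."""
    if action == 0:
        return pos, -0.1, False            # waiting costs a little time
    if random.random() < 0.1:              # a car happens to occupy the next lane
        return 0, -10.0, True              # hit: bad outcome, episode ends
    pos += 1
    if pos == N_LANES - 1:
        return pos, 10.0, True             # picked up the gift package
    return pos, 0.0, False

for episode in range(5000):                # learn purely from simulation
    pos, done = 0, False
    while not done:
        if random.random() < epsilon:      # explore occasionally
            a = random.choice(ACTIONS)
        else:                              # otherwise act greedily
            a = max(ACTIONS, key=lambda x: Q[(pos, x)])
        nxt, r, done = env_step(pos, a)
        best_next = 0.0 if done else max(Q[(nxt, x)] for x in ACTIONS)
        Q[(pos, a)] += alpha * (r + gamma * best_next - Q[(pos, a)])
        pos = nxt

# The learned policy: which action to take at each position on the road.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_LANES)})
```

No crossing strategy is coded anywhere; the policy falls out of many simulated episodes of getting hit and getting rewarded, which is the core of the argument for simulation environments as AI tools.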
Continue reading “Unleashing the potential of reinforcement learning”

Graphs as the front end for machine learning

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Leo Meyerovich on building large-scale, interactive applications that enable visual investigations.

In this episode of the Data Show, I spoke with Leo Meyerovich, co-founder and CEO of Graphistry. Graphs have always been part of the big data revolution (think of the large graphs generated by the early social media startups). In recent months, I’ve come across companies releasing and using new tools for creating, storing, and (most importantly) analyzing large graphs. There are many problems and use cases that lend themselves naturally to graphs, and recent advances in hardware and software building blocks have made large-scale analytics possible.

Starting with his work as a graduate student at UC Berkeley, Meyerovich has pioneered the combination of hardware and software acceleration to create truly interactive environments for visualizing large amounts of data. Graphistry has built a suite of tools that enables analysts to wade through large data sets and investigate business and security incidents. The company is currently focused on the security domain, where, as it turns out, security analysts are already quite familiar with graph representations of data.

Here are some highlights from our conversation:

Graphs as the front end for machine learning

They’re really flexible. First of all, there’s a pure analytic reason: there are certain types of queries that one can do efficiently with a graph database. If you need to do a bunch of joins, graphs are really great at that. … Companies want to get into stuff like 360-degree views of things; they want to understand correlations to actually explain what’s going on at a more intelligent level.

… I think that’s where graphs really start to shine. Because companies deal with pretty heterogeneous data, and a graph ends up being a really easy way to deal with that. A lot of questions are basically, “What’s nearby?”—almost like your nearest neighbor type of stuff; the graph becomes, both at the query level and at the visual level, very interpretable. I now have a hypothesis about graphs as being the front end and the UI for machine learning, but that might be a topic for another day.
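The “what’s nearby?” questions Meyerovich mentions are neighborhood queries, which graph libraries answer with a single traversal rather than a chain of joins. A small illustrative sketch using networkx, with made-up, security-flavored entities:

```python
# A minimal sketch of "what's nearby?" as a graph neighborhood query.
# The entities and edges are made-up examples for illustration.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "laptop-42"),
    ("laptop-42", "10.0.0.7"),
    ("10.0.0.7", "malware-hash-abc"),
    ("bob", "laptop-17"),
])

# Everything within two hops of a suspicious IP: the relational equivalent
# of several joins, expressed as a single traversal.
nearby = nx.single_source_shortest_path_length(G, "10.0.0.7", cutoff=2)
print(sorted(nearby))  # ['10.0.0.7', 'alice', 'laptop-42', 'malware-hash-abc']
```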

Continue reading “Graphs as the front end for machine learning”

Machine learning needs machine teaching

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Mark Hammond on applications of reinforcement learning to manufacturing and industrial automation.

In this episode of the Data Show, I spoke with Mark Hammond, founder and CEO of Bonsai, a startup at the forefront of developing AI systems in industrial settings. While many articles have been written about developments in computer vision, speech recognition, and autonomous vehicles, I’m particularly excited about near-term applications of AI to manufacturing, robotics, and industrial automation. In a recent post, I outlined practical applications of reinforcement learning (RL)—a type of machine learning now being used in AI systems. In particular, I described how companies like Bonsai are applying RL to manufacturing and industrial automation. As researchers explore new approaches for solving RL problems, I expect many of the first applications to be in industrial automation.

Here are some highlights from our conversation:

Machine learning and machine teaching

Everyone is so focused on making better and faster learning algorithms; what do we do once we have them? Let’s just suppose you now have an algorithm that can learn as well as or better than humans. How do we use it, how do we apply it in a predictable, scalable, repeatable way toward the objectives we want to achieve?

… I thought about that for a while, and it’s one of those things where the answer is obvious in hindsight, but until you sit down and really chew on it, it doesn’t jump out at you. And it’s that, by design, if you’re building a learning system, then in order to program it, you have to teach it. Machine teaching and machine learning are necessary complements to one another; you need both. And for the most part, what comprises machine teaching these days consists of giant labeled data sets.

… You need machine teaching and machine learning. It dawned on me that this was the core abstraction that was going to make it possible for us to start applying all of this stuff more broadly across all the myriad use cases that we see in the real world without having to turn all of the people who are looking to use it into experts in machine learning and data science. It’s what enabled me to realize what Bonsai’s mission is: to enable your subject matter experts (a chemical engineer or a mechanical engineer, someone who is very, very well versed in whatever their domain is but not necessarily in machine learning or data science) to take that expertise and use it as the foundation for describing what to teach and then automating the underlying pieces for how you can actually effectively learn that.

Continue reading “Machine learning needs machine teaching”

How machine learning can be used to write more secure computer programs

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Fabian Yamaguchi on the potential of using large-scale analytics on graph representations of code.

In this episode of the Data Show, I spoke with Fabian Yamaguchi, chief scientist at ShiftLeft. His 2015 Ph.D. dissertation sketched out how the combination of static analysis, graph mining, and machine learning can be used to develop tools that augment security analysts. In a recent post, I argued for machine learning tools to augment the teams responsible for deploying and managing models in production (machine learning engineers). These are part of a general trend of using machine learning to develop and manage the software systems of tomorrow. Yamaguchi’s work is step one in this direction: using machine learning to reduce the number of security vulnerabilities in complex software products.
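To give a flavor of the graph-based approach (this is not ShiftLeft’s actual tooling), here is a toy sketch that models data flow between a few statements as a directed graph and flags paths from user input to an unsafe call that never pass through a validation step; the node names and the rule are illustrative assumptions:

```python
# Toy sketch of mining a code graph: flag paths from user input to an unsafe
# call (e.g., strcpy) that never pass through a validation step. The node
# names and the rule are illustrative assumptions, not ShiftLeft's analysis.
import networkx as nx

flow = nx.DiGraph()  # directed edges model data flow between statements
flow.add_edges_from([
    ("read_user_input", "format_message"),
    ("format_message", "strcpy_call"),        # unsafe sink, no check upstream
    ("read_user_input", "validate_length"),
    ("validate_length", "safe_copy_call"),
])

SOURCES, SINKS, SANITIZERS = {"read_user_input"}, {"strcpy_call"}, {"validate_length"}

for source in SOURCES:
    for sink in SINKS:
        for path in nx.all_simple_paths(flow, source, sink):
            if not SANITIZERS & set(path):
                print("potential vulnerability:", " -> ".join(path))
```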

Here are some highlights from our conversation:
Continue reading “How machine learning can be used to write more secure computer programs”

Bringing AI into the enterprise

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Kris Hammond on business applications of AI technologies and educating future AI specialists.

In this episode of the Data Show, I spoke with Kristian Hammond, chief scientist of Narrative Science and professor of EECS at Northwestern University. He has been at the forefront of helping companies understand the power, limitations, and disruptive potential of AI technologies and tools. In a previous post on machine learning, I listed types of use cases (a taxonomy) for machine learning that could just as well apply to enterprise applications of AI. But how do you identify good use cases to begin with?

A good place to start for most companies is to look for AI technologies that can help automate routine tasks, particularly low-skill tasks that occupy the time of highly skilled workers. An initial list of candidate tasks can be gathered by applying the following series of simple questions (a rough scoring sketch follows the list):

  • Is the task data-driven?
  • Do you have the data to support the automation of the task?
  • Do you really need the scale that automation can provide?
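Purely as an illustration, a team could turn these questions into a crude screening score and rank candidate tasks; the tasks and yes/no answers below are made up:

```python
# Illustrative only: rank candidate tasks by how many of the three screening
# questions they satisfy. The tasks and answers below are made up.
candidates = {
    "triage support tickets":     {"data_driven": True,  "data_available": True,  "needs_scale": True},
    "tag product images":         {"data_driven": True,  "data_available": True,  "needs_scale": False},
    "negotiate vendor contracts": {"data_driven": False, "data_available": False, "needs_scale": False},
}

ranked = sorted(candidates.items(), key=lambda item: sum(item[1].values()), reverse=True)
for task, answers in ranked:
    print(f"{sum(answers.values())}/3  {task}")
```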

Continue reading “Bringing AI into the enterprise”