Use deep learning on data you already have

[A version of this post appears on the O’Reilly Radar.]

Putting deep learning into practice with new tools, frameworks, and future developments.

Deep learning has made tremendous advances in the past year. Though managers are aware of what’s been happening in the research world, we’re still in the early days of putting that research into practice. While the resurgence in interest stems from applications in computer vision and speech, more companies can actually use deep learning on data they already have, including structured data, text, and time-series data.

All of this interest in deep learning has led to more tools and frameworks, including some that target non-experts already using other forms of machine learning (ML). Many devices will benefit from these technologies, so expect streaming applications to be infused with intelligence. Finally, there are many interesting research initiatives that point to future neural networks, with different characteristics and enhanced model-building capabilities.

Back to machine learning

If you think of deep learning as yet another machine learning method, then the essential ingredients should be familiar. Software infrastructure to deploy and maintain models remains paramount. A widely cited paper from Google uses the concept of technical debt to posit that “only a small fraction of real-world ML systems is composed of ML code.” This means that while underlying algorithms are important, they tend to be a small component within a complex production system. As the authors point out, machine learning systems also need to address ML-specific entanglement and dependency issues involving data, features, hyperparameters, models, and model settings (they refer to this as the CACE principle: Changing Anything Changes Everything).

Deep learning has also typically required specialized hardware (often GPUs) for training models. For companies that already use SaaS tools, many of the leading cloud platforms and managed services offer deep learning software and hardware solutions. Newer tools such as BigDL target companies that prefer software that integrates seamlessly with popular components like Apache Spark, so they can leverage their existing big data clusters, model serving, and monitoring platforms.

You’ll also still need (labeled) data; in fact, you’ll need more. Deep learning specialists liken it to a rocket ship that needs a big engine (a model) and a lot of fuel (data) in order to go anywhere interesting. (In many cases, data already resides in clusters, so it makes sense that many companies are looking for solutions that run alongside their existing tools.) Producing clean, labeled data requires a combination of data analysts with domain knowledge and infrastructure engineers who can design and maintain robust data processing platforms. In a recent conversation, an expert I spoke with joked that, given all of the improvements in software infrastructure and machine learning models, “soon, all companies will need to hire are analysts who can create good data sets.” Joking aside, the situation is a bit more nuanced. For example, many companies are beginning to develop and deploy human-in-the-loop systems, sometimes referred to as “human-assisted AI” or “active learning systems,” that augment the work done by domain experts and data scientists.
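Human-in-the-loop systems of this kind are often built around active learning: the model flags the examples it is least certain about, and domain experts label those first. The sketch below shows one common selection rule, entropy-based uncertainty sampling, in plain Python. It is an illustration under assumptions, not a description of any particular product: `predict_proba` stands in for whatever trained model you have, and the record IDs and probabilities are made up.

```python
import math
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(
    unlabeled: List[str],
    predict_proba: Callable[[str], Sequence[float]],
    budget: int,
) -> List[str]:
    """Return the `budget` unlabeled examples with the most uncertain predictions."""
    scored = [(entropy(predict_proba(x)), x) for x in unlabeled]
    # Highest entropy first: these are the cases where a human label helps most.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:budget]]


# Hypothetical model outputs for four unlabeled records (two classes each).
toy_probs = {
    "record_001": [0.98, 0.02],  # model is confident
    "record_002": [0.55, 0.45],
    "record_003": [0.70, 0.30],
    "record_004": [0.50, 0.50],  # model is maximally unsure
}

to_label = select_for_labeling(list(toy_probs), toy_probs.__getitem__, budget=2)
print(to_label)  # ['record_004', 'record_002']
```

In a full loop, the experts' labels for the selected records are added to the training set, the model is retrained, and selection repeats on the remaining pool.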

More so than other machine learning techniques, devising and modifying deep learning models requires experience and expertise. Fortunately, many of the popular frameworks ship with example models that produce decent results for problems across a variety of data types and domains. At least initially, packaged solutions or managed services from leading cloud providers obviate the need for in-house expertise, and I suspect many companies will be able to get by with few true deep learning experts. Rather than competing for scarce deep learning specialists, a more sensible option is to hire data scientists with strong software engineering skills who can help deploy machine learning models to production and who understand the nuances of evaluating models.

Another common question concerns the nature of deep learning models: are the generated predictions due to correlation, or are we able to unearth some causal relationship? Deep learning architectures are notoriously difficult for non-experts (and even experts) to understand and explain. Popular models contain millions of parameters. Exactly why they excel at pattern recognition is an active research area (a recent paper found that many successful deep learning architectures excel at “sheer memorization”). Nevertheless, many companies will deploy deep learning if models significantly improve important underlying business metrics. Some applications and domains require models that are explainable, and, fortunately, there are efforts underway to make machine learning models easier to understand. Companies should also engage in a broader discussion about the trustworthiness of algorithms (Tim O’Reilly has a great checklist).
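The post does not name specific explainability techniques. As one simple, model-agnostic example (not one the post itself names), permutation importance treats a trained model as a black box and measures how much a metric degrades when a single input feature is shuffled. This is a minimal pure-Python sketch, assuming a classifier exposed as a function from a feature row to a label; `black_box` and the data are synthetic stand-ins.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    model: Model, X: Sequence[Row], y: Sequence[int], n_repeats: int = 10, seed: int = 0
) -> List[float]:
    """Average accuracy drop when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# A hypothetical "black box" that, unknown to the analyst, only uses feature 0.
def black_box(row: Row) -> int:
    return int(row[0] > 0.5)


X = [[i / 100, (i * 37 % 100) / 100] for i in range(100)]
y = [black_box(row) for row in X]
# Feature 0 scores well above zero; feature 1 scores exactly 0.0.
print(permutation_importance(black_box, X, y))
```

Because it only needs predictions, the same procedure applies to a deep network as to any other model, though it explains which inputs matter, not why.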

Looking toward narrow AI systems, much of the recent excitement involves systems that combine deep learning with additional techniques (reinforcement learning, probabilistic computing) and components (memory, knowledge, reasoning, and planning). I believe many of these AI systems will be too complex for a typical enterprise to build, and opportunities abound for companies that can build targeted solutions. For the many enterprises still grappling with deploying machine learning, a more reasonable starting point is to use deep learning alongside other algorithms. Combine your models with a bandit algorithm, and you can claim to be on your way toward reinforcement learning.
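The bandit idea can be made concrete with an epsilon-greedy policy, one of the simplest bandit algorithms: most requests go to whichever model has the best observed reward so far, and a small fraction (epsilon) are routed at random so the other models keep getting evaluated. This is a sketch under assumptions: the model names, the 0/1 reward (for example, a click or conversion), and the success rates are all hypothetical.

```python
import random
from typing import Dict, List


class EpsilonGreedy:
    """Epsilon-greedy bandit in which each "arm" is a candidate model."""

    def __init__(self, arms: List[str], epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {arm: 0 for arm in arms}
        self.values: Dict[str, float] = {arm: 0.0 for arm in arms}  # mean reward

    def select(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))  # explore
        return max(self.values, key=self.values.get)  # exploit best so far

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean: avoids storing every past reward.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Simulated traffic: hypothetical success rates for two models being compared.
true_rates = {"gbm_baseline": 0.2, "deep_model": 0.6}
bandit = EpsilonGreedy(list(true_rates), epsilon=0.1, seed=42)
env = random.Random(7)
for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if env.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
print(bandit.counts)  # deep_model should receive most of the traffic
```

Epsilon-greedy is chosen here only for brevity; schemes such as UCB or Thompson sampling trade off exploration more carefully.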

IoT and edge intelligence

While we often think of deep learning as useful for text, images, and speech, I’ve come across a few companies that are using deep learning to analyze time-series and event data. Coincidentally, some of the more exciting examples of AI involve systems and devices that generate large amounts of such data (for example, sensors in self-driving cars capture a lot more data than people realize). The volumes are large enough that analytic techniques will be needed to filter, compress, and summarize data sets before they are uploaded to a large-scale (cloud) platform, where models can be trained against aggregated data. The good news is that compressed versions of models (deep learning architectures with fewer parameters) can be deployed back to the devices.
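To make the filter, compress, and summarize step concrete, here is a minimal sketch of what might run on a device before upload: drop readings outside a plausible range, then reduce each fixed-size window of a time series to a handful of statistics. The valid range, window size, and chosen statistics are illustrative assumptions; real pipelines might use other summaries or learned compression.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class WindowSummary:
    start: int    # index (among valid readings) of the window's first reading
    count: int
    mean: float
    minimum: float
    maximum: float
    stdev: float  # population standard deviation


def filter_and_summarize(
    readings: List[float], window: int, lo: float, hi: float
) -> List[WindowSummary]:
    """Drop out-of-range readings, then reduce each window to a few statistics."""
    if window <= 0:
        raise ValueError("window must be positive")
    valid = [r for r in readings if lo <= r <= hi]  # filter sensor glitches
    summaries = []
    for start in range(0, len(valid), window):
        chunk = valid[start:start + window]
        summaries.append(
            WindowSummary(
                start=start,
                count=len(chunk),
                mean=statistics.fmean(chunk),
                minimum=min(chunk),
                maximum=max(chunk),
                stdev=statistics.pstdev(chunk),
            )
        )
    return summaries


# Hypothetical temperature readings (°C); 999.0 is a glitch outside the valid range.
readings = [20.0, 20.5, 21.0, 999.0, 21.5, 22.0]
for s in filter_and_summarize(readings, window=2, lo=-40.0, hi=85.0):
    print(s)
```

Six raw readings become three summary records here; with realistic sampling rates and windows of thousands of readings, the reduction is far larger. Which statistics to keep depends on what the models trained on the aggregated data actually need.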

A future scenario in which multitudes of intelligent devices interact with each other (time to brush up on P2P systems) reminds me of some important considerations that surfaced during recent conversations with the founders of the RISE Lab. Over time, streaming systems will need to incorporate online machine learning not just for model training, but also for data processing and collection. Security and secure execution will encourage more data sharing, improve “personalization,” and unlock the value of many more data sources.
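Online learning here means updating a model incrementally as each event arrives rather than retraining in batch. A minimal sketch, assuming a simple logistic regression updated by stochastic gradient descent (a stand-in; the post does not specify a model): each event triggers one gradient step and is then discarded, which is what lets this style of learning run inside a streaming system. It covers only the model-training use mentioned above, not online learning for data processing or collection, and the stream is synthetic.

```python
import math
from typing import List


class OnlineLogisticRegression:
    """Logistic regression trained one event at a time with SGD on log loss."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: List[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid (avoids overflow in math.exp for large |z|).
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def partial_fit(self, x: List[float], y: int) -> None:
        # Gradient of log loss w.r.t. z is (p - y): take one step, keep no history.
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


# Simulated stream: one feature in [-1, 1), label is 1 when the feature is positive.
model = OnlineLogisticRegression(n_features=1)
for i in range(2000):
    x0 = (i * 7919 % 200) / 100 - 1
    model.partial_fit([x0], int(x0 > 0))

print(round(model.predict_proba([0.8]), 3), round(model.predict_proba([-0.8]), 3))
```

Memory use stays constant no matter how long the stream runs, which suits devices and stream processors; the trade-off is sensitivity to the learning rate and to the order in which events arrive.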

Research directions: Machines that think like people

Research in deep learning and AI proceeds at a rapid pace, and it’s challenging to keep up with developments. A recent survey paper lists the core ingredients of human intelligence and uses them as a framework to organize recent research in deep learning. I found the accompanying taxonomy useful for understanding the variety of research initiatives underway. In essence, humans “… learn from less data and generalize in richer and more flexible ways,” and AI systems that incorporate deep learning should add similar capabilities. In the process, the authors list some capabilities that may start appearing in future AI products:

  • Rapid model building via compositionality (being able to combine a set of primitives is at the core of productivity) and learning-to-learn (accelerating the learning of new tasks via transfer or multi-task learning).
  • Systems that have some ability to build causal models and are able to learn from many fewer examples (and thus lead to AI products that are easier to explain and understand).
  • Drawing inspiration from how quickly and efficiently children learn, researchers are investigating the importance of incorporating some startup knowledge, in particular intuitive theories of physics and psychology, to accelerate and enrich learning.
  • Researchers are working on deep learning models with access to “working memories.”

Many current systems based on deep learning require big compute, big data, and big models. While researchers are seeking to build tools that are less dependent on large-scale pattern recognition, companies wanting to use deep learning as a machine learning technique can get started using tools that integrate with their existing big data platforms.
