The key to building deep learning solutions for large enterprises

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Adam Gibson on the importance of ROI, integration, and the JVM.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4J (DL4J). Gibson has spent the last few years developing the DL4J library and community, while simultaneously building deep learning solutions and products for large enterprises.

Here are some highlights:

Continue reading

Use deep learning on data you already have

[A version of this post appears on the O’Reilly Radar.]

Putting deep learning into practice with new tools, frameworks, and future developments.

Deep learning has made tremendous advances in the past year. Though managers are aware of what’s been happening in the research world, we’re still in the early days of putting that research into practice. While the resurgence in interest stems from applications in computer vision and speech, more companies can actually use deep learning on data they already have—including structured data, text, and times-series data.

All of this interest in deep learning has led to more tools and frameworks, including some that target non-experts already using other forms of machine learning (ML). Many devices will benefit from these technologies, so expect streaming applications to be infused with intelligence. Finally, there are many interesting research initiatives that point to future neural networks, with different characteristics and enhanced model-building capabilities.

Back to machine learning

If you think of deep learning as yet another machine learning method, then the essential ingredients should be familiar. Software infrastructure to deploy and maintain models remains paramount. A widely cited paper from Google uses the concept of technical debt to posit that “only a small fraction of real-world ML systems is composed of ML code.”  This means that while underlying algorithms are important, they tend to be a small component within a complex production system. As the authors point out, machine learning systems also need to address ML-specific entanglement and dependency issues involving data, features, hyperparameters, models, and model settings (they refer to this as the CACE principle: Changing Anything Changes Everything).
Continue reading

How big compute is powering the deep learning rocketship

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Greg Diamos on building computer systems for deep learning and AI.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

Specialists describe deep learning as akin to a rocketship that needs a really big engine (a model) and a lot of fuel (the data) in order to go anywhere interesting. To get a better understanding of the issues involved in building compute systems for deep learning, I spoke with one of the foremost experts on this subject: Greg Diamos, senior researcher at Baidu. Diamos has long worked to combine advances in software and hardware to make computers run faster. In recent years, he has focused on scaling deep learning to help advance the state-of-the-art in areas like speech recognition.

A big model, combined with big data, necessitates big compute—and at least at the bleeding edge of AI, researchers have gravitated toward high-performance computing (HPC) or supercomputer-like systems. Most practitioners use systems with multiple GPUs (ASICs or FPGAs) and software libraries that make it easy to run fast deep learning models on top of them.

In keeping with the convenience versus performance tradeoff discussions that play out in many enterprises, there are other efforts that fall more in the big data, rather than HPC, camp. In upcoming posts, I’ll highlight groups of engineers and data scientists who are starting to use these techniques and are creating software to run them on existing software and hardware infrastructure common in the big data community.

Continue reading