Deep learning that's easy to implement and easy to scale

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Anima Anandkumar on MXNet, tensor computations and deep learning, and techniques for scaling algorithms.

In this episode of the Data Show, I spoke with Anima Anandkumar, a leading machine learning researcher who is currently a principal research scientist at Amazon. I took the opportunity to get an update on the latest developments in the use of tensors in machine learning. Most of our conversation centered on MXNet, an open source, efficient, scalable deep learning framework. I’ve been a fan of MXNet dating back to when it was a research project out of CMU and UW, and I wanted to hear Anandkumar’s perspective on its recent progress as a framework for enterprises and practicing data scientists.

Here are some highlights from our conversation:

MXNet: An efficient, fast, and easy-to-use framework for deep learning

MXNet ships with many popular deep learning architectures that have been predefined, and optimized to a great degree. If you look at benchmarks, and I’ll be showing them at Strata, you get 90% efficiency on multiple GPUs, multiple instances. These scale up much better than the other packages. The idea is if you are enabling deep learning on the cloud, efficiency becomes a very important criterion and will result in huge cost savings to the customer.
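
A small sketch of the arithmetic behind a figure like "90% efficiency", assuming efficiency here means measured speedup divided by the ideal linear speedup. The function name and the throughput numbers are hypothetical and do not come from any MXNet benchmark.

```python
def scaling_efficiency(single_device_throughput, multi_device_throughput, num_devices):
    """Return measured speedup divided by the ideal (linear) speedup."""
    if num_devices < 1 or single_device_throughput <= 0:
        raise ValueError("need at least one device and a positive baseline throughput")
    speedup = multi_device_throughput / single_device_throughput
    return speedup / num_devices


# Hypothetical numbers for illustration only (not taken from any benchmark).
one_gpu = 100.0         # images per second on 1 GPU
sixteen_gpus = 1440.0   # images per second on 16 GPUs

efficiency = scaling_efficiency(one_gpu, sixteen_gpus, 16)
print(f"{efficiency:.0%}")  # prints "90%"
```

Under this reading, 16 GPUs at 90% efficiency do the work of about 14.4 perfectly scaled GPUs, which is where the cost argument comes from.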

In addition, MXNet is much easier to program in terms of giving users more flexibility. There is a range of different front-end languages the user can employ and still get the same performance. … For instance, in addition to Python, you can code in R, or even JavaScript if you want to run this in the browser.

… At the same time, there is also the mixed programming paradigm, which means you can have both declarative and imperative programming. The idea is you need declarative programming if you want to do optimizations because you need the computation graph to figure out how and where to do the optimizations. On the other hand, imperative programming is easier to write, easier to debug, and easier for the programmer to think through sequentially. Because both options are available, the user can decide what best suits their needs: which parts of the program require optimization and which parts can stay imperative.
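
The difference between the two styles can be sketched in plain Python. This is a toy illustration, not MXNet's API: `Node`, `variable`, `describe`, and `evaluate` are hypothetical names introduced only for this example.

```python
# Imperative style: every statement runs immediately and yields a value.
def imperative(a, b):
    c = a * b   # computed right now
    return c + a


# Declarative style: first build a graph that describes the computation.
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.name = name

    def __mul__(self, other):
        return Node("mul", (self, other))

    def __add__(self, other):
        return Node("add", (self, other))


def variable(name):
    """Create a named placeholder; no value is attached yet."""
    return Node("var", name=name)


def describe(node):
    """Walk the graph without running it (a framework could optimize here)."""
    if node.op == "var":
        return node.name
    symbol = {"mul": "*", "add": "+"}[node.op]
    left, right = node.inputs
    return f"({describe(left)} {symbol} {describe(right)})"


def evaluate(node, bindings):
    """Run the graph once concrete values are bound to the variables."""
    if node.op == "var":
        return bindings[node.name]
    left, right = (evaluate(child, bindings) for child in node.inputs)
    return left * right if node.op == "mul" else left + right


a, b = variable("a"), variable("b")
graph = a * b + a                            # nothing is computed yet
print(describe(graph))                       # ((a * b) + a)
print(evaluate(graph, {"a": 3, "b": 4}))     # 15
print(imperative(3, 4))                      # 15
```

Because the whole expression exists as a graph before any value is bound, it can be inspected and rewritten ahead of execution. The imperative version gives that up in exchange for code that can be stepped through line by line.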

In the benchmarks that I’ll show, it’s not just about multiple GPUs on the same machine, but also multiple different instances. MXNet has parameter servers in the back end, which allows it to seamlessly distribute across either multiple GPUs or multiple machines.
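
A minimal, single-process sketch of the parameter server idea, assuming synchronous data-parallel training: workers pull the current weights, compute gradients on their own data shard, and push them back to be averaged. The class and function names are hypothetical and this is not MXNet's implementation. In a real system the workers run in parallel on separate GPUs or machines.

```python
class ParameterServer:
    """Holds the shared weights and applies averaged gradient updates."""

    def __init__(self, weights, learning_rate):
        self.weights = list(weights)
        self.learning_rate = learning_rate

    def pull(self):
        """Return a copy of the current weights for a worker to use."""
        return list(self.weights)

    def push(self, gradients):
        """Average the gradients from all workers and take one SGD step."""
        count = len(gradients)
        for i in range(len(self.weights)):
            average = sum(g[i] for g in gradients) / count
            self.weights[i] -= self.learning_rate * average


def worker_gradient(weights, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    w = weights[0]
    return [sum(2 * (w * x - y) * x for x, y in shard) / len(shard)]


# Toy data generated from y = 3x, split across two hypothetical workers.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]

server = ParameterServer([0.0], learning_rate=0.02)
for _ in range(200):
    weights = server.pull()
    gradients = [worker_gradient(weights, shard) for shard in shards]
    server.push(gradients)

print(round(server.weights[0], 6))  # 3.0
```

The workers never talk to each other, only to the server, so in this sketch the same loop applies whether the shards sit on GPUs in one machine or on different machines.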

Tensor computations, deep learning, and hardware

On one hand, if you think about the tensor operations, what we call tensor contractions are extensions of matrix products. And if you look into deep learning computations, they involve tensor contractions. It becomes very important, then, to ask if you can go beyond the usual matrix computations and efficiently parallelize across different hardware architectures. For instance, if you think about BLAS operations, BLAS Level 1 covers scalar and vector operations. BLAS Level 2 covers matrix-vector operations. If you go to BLAS Level 3, you are looking at matrix-matrix operations. By going to higher-level BLAS, you’re able to block operations together and get better efficiency. If you go to tensors, which are extensions of matrices, you need higher-level BLAS operations.
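
To make "tensor contractions are extensions of matrix products" concrete, here is a pure-Python sketch. The particular contraction, C[m][n][p] = sum over k of A[m][k] * T[k][n][p], is chosen only for illustration, and the function names are hypothetical. The second version shows that this contraction is the same as one ordinary matrix product per slice of the tensor.

```python
def matmul(a, b):
    """C[i][j] = sum_k A[i][k] * B[k][j], a matrix-matrix (Level 3 style) product."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def contract(a, t):
    """C[m][n][p] = sum_k A[m][k] * T[k][n][p], computed directly."""
    M, K, N, P = len(a), len(t), len(t[0]), len(t[0][0])
    return [[[sum(a[m][k] * t[k][n][p] for k in range(K)) for p in range(P)]
             for n in range(N)]
            for m in range(M)]


def contract_by_slices(a, t):
    """The same contraction as one matrix product per slice T[:, :, p]."""
    K, N, P = len(t), len(t[0]), len(t[0][0])
    products = []
    for p in range(P):
        t_p = [[t[k][n][p] for n in range(N)] for k in range(K)]
        products.append(matmul(a, t_p))          # products[p][m][n]
    return [[[products[p][m][n] for p in range(P)] for n in range(N)]
            for m in range(len(a))]


a = [[1, 2], [3, 4]]                             # shape (M=2, K=2)
t = [[[1, 0], [0, 1]], [[2, 1], [1, 2]]]         # shape (K=2, N=2, P=2)

print(contract(a, t))                            # [[[5, 2], [2, 5]], [[11, 4], [4, 11]]]
print(contract(a, t) == contract_by_slices(a, t))  # True
```

The slice-by-slice version copies each slice out before multiplying. Avoiding that kind of copying and looping is the motivation for primitives that operate on the whole tensor at once.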

In a recent paper, we defined such extensions to BLAS, which have been added to cuBLAS 8.0. To me, this is an exciting research area: how can we enable hardware optimizations for various tensor operations and how would that improve efficiency of deep learning and other machine learning algorithms?
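
The source does not name the routine that was added. The sketch below assumes it is a strided batched matrix multiply (cuBLAS 8.0 introduced a `gemmStridedBatched` family), and shows the idea in pure Python on flat row-major buffers. The Python function and its parameters are hypothetical and simplified. The real cuBLAS routine uses column-major storage and takes additional arguments.

```python
def gemm_strided_batched(a, b, m, n, k, stride_a, stride_b, batch_count):
    """For each batch p, compute C_p = A_p @ B_p on flat row-major buffers.

    A_p starts at offset p * stride_a in `a` and B_p at p * stride_b in `b`.
    A stride of 0 reuses the same matrix for every batch.
    """
    c = [0.0] * (batch_count * m * n)
    for p in range(batch_count):
        a_off, b_off, c_off = p * stride_a, p * stride_b, p * m * n
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for x in range(k):
                    acc += a[a_off + i * k + x] * b[b_off + x * n + j]
                c[c_off + i * n + j] = acc
    return c


# One shared 2x2 matrix A (stride 0) times two 2x2 matrices stored back to back.
a = [1, 2, 3, 4]
b = [1, 0, 0, 1,    # batch 0: identity
     2, 1, 1, 2]    # batch 1
c = gemm_strided_batched(a, b, m=2, n=2, k=2, stride_a=0, stride_b=4, batch_count=2)
print(c)  # [1.0, 2.0, 3.0, 4.0, 4.0, 5.0, 10.0, 11.0]
```

Because each batch is addressed by an offset into an existing buffer, a tensor stored contiguously can be contracted slice by slice in a single call without copying the slices out first.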

Academia and industry

The opportunity here at AWS as a principal scientist has been very timely and exciting. I’ve been given a lot of freedom to explore and to push ahead and to make these algorithms available on the AWS cloud for everybody to use, and we’ll be pushing ahead with many more such capabilities. At the same time, we’re also, in a way, doing research here: asking how we can think about new algorithms, how we benchmark them with large-scale experiments, and how we talk about them at various conferences and other peer-reviewed venues. So, it’s definitely a mix of research and development here that excites me, and at the same time, I continue to advise students and continue to push the research agenda. Amazon is enabling me to do that and supporting me in that, so I see this as a joint partnership. I expect this to continue. I’ll be joining Caltech as an endowed chair, and I’m looking forward to more such engagements between industry and academia.

Related resources:

A tensor renaissance in data science
Let’s build open source tensor libraries for data science
How big compute is powering the deep learning rocket ship
The Deep Learning Video Collection (Strata + Hadoop World 2016)
