[A version of this post appears on the O’Reilly Radar.]
The O’Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning.
After sitting in on UC Irvine Professor Anima Anandkumar’s Strata + Hadoop World 2015 in San Jose presentation, I wrote a post urging the data community to build tensor decomposition libraries for data science. The feedback I’ve gotten from readers has been extremely positive. During the latest episode of the O’Reilly Data Show Podcast, I sat down with Anandkumar to talk about tensor decomposition, machine learning, and the data science program at UC Irvine.
Modeling higher-order relationships
The natural question is: why use tensors when (large) matrices can already be challenging to work with? Proponents are quick to point out that tensors can model more complex relationships. Anandkumar explains:
Tensors are higher order generalizations of matrices. While matrices are two-dimensional arrays consisting of rows and columns, tensors are now multi-dimensional arrays. … For instance, you can picture tensors as a three-dimensional cube. In fact, I have here on my desk a Rubik’s Cube, and sometimes I use it to get a better understanding when I think about tensors. … One of the biggest use of tensors is for representing higher order relationships. … If you want to only represent pair-wise relationships, say co-occurrence of every pair of words in a set of documents, then a matrix suffices. On the other hand, if you want to learn the probability of a range of triplets of words, then we need a tensor to record such relationships. These kinds of higher order relationships are not only important for text, but also, say, for social network analysis. You want to learn not only about who is immediate friends with whom, but, say, who is friends of friends of friends of someone, and so on. Tensors, as a whole, can represent much richer data structures than matrices.
Recent progress in tensor decomposition and computation
I first encountered tensors in math and physics courses, and it was only in the last few years that I’ve been hearing about them in machine learning circles. In the past, tensor computations were deemed to be too computationally expensive for most practical applications. Anandkumar points out that better hardware and recent breakthroughs have ushered applications to machine learning:
I think the first use of tensors was way back in the 1940s in a psychometrics journal. Since then, there’s been diverse work on tensors in physics, numerical analysis, signal processing, and theoretical computer science. However, in my opinion, one of the reasons I think tensors perhaps fell a little out of fashion was the fact that processing of tensors is expensive. It’s much more expensive than matrices. In fact, most of the processing complexities grow exponentially in the order of the tensor. That was one of the main reasons that when the computers were not yet very powerful, tensors could not be handled efficiently. However, I think now we are seeing a renaissance of tensors because we have an explosion in computational capabilities, and tensor operations are highly parallelizable — they can be run on the cloud. They are perfectly suited for running on GPUs since they are simple operations that can be parallelized to many cores.
… When thinking about tensors from a more theoretical computer science viewpoint, many of the tensor problems are NP-hard. That was another reason tensors were seen as exotic objects that were hard to analyze compared to matrices. That’s why people restricted to matrices to be able to prove a lot of nice properties. On the other hand, what our research and some of our collaborators and other researchers in this field have shown is that there are a lot of tensor-related problems and machine learning that are not hard. We do not encounter the worst-case hard tensors for machine learning applications. This, I would say, is the main breakthrough that makes analysis as well as manipulation of tensors tractable for many applications.
Feature learning and deep neural networks
In the course of working on analytic projects, data scientists learn to appreciate the importance of feature engineering in machine learning pipelines. The recent trend is to use techniques, like deep learning, that can automatically learn good features in applications that span many domains. Anandkumar highlights recent contributions of tensor methods in feature learning:
The latest set of results we have been looking at is the use of tensors for feature learning as a general concept. The idea of feature learning is to look at transformations of the input data that can be classified more accurately using simpler classifiers. This is now an emerging area in machine learning that has seen a lot of interest, and our latest analysis is to ask how can tensors be employed for such feature learning. What we established is you can learn recursively better features by employing tensor decompositions repeatedly, mimicking deep learning that’s being seen.
… We presented one of these early works at the deep learning workshop and got a lot of interest and useful feedback. We certainly want to unify different kinds of techniques. For instance, another application we’ve been looking at is to use these hierarchical, graphical models and also deep learning framework features that are extracted from deep learning … to have a better detection of multiple objects in the same image. Most of deep learning currently has focused on benchmark data sets where there’s mostly one image in one object, whereas we are now looking at if there are a lot of objects in an image: how can we efficiently learn this by also using the fact that objects tend to co-occur in images? Can we learn about these co-occurrences and at the same time exploit the features that are extracted from deep learning to do overall much better classification of multiple objects?
See Anima Anandkumar’s presentations from Strata + Hadoop World 2015 in San Jose in the complete video compilation.