Why AI and machine learning researchers are beginning to embrace PyTorch

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Soumith Chintala on building a worthy successor to Torch and deep learning within Facebook.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I spoke with Soumith Chintala, AI research engineer at Facebook. Among his many research projects, Chintala was part of the team behind DCGAN (Deep Convolutional Generative Adversarial Networks), a widely cited paper that introduced a set of neural network architectures for unsupervised learning. Our conversation centered on PyTorch, the successor to the popular Torch scientific computing framework. PyTorch is a relatively new deep learning framework that is fast becoming popular among researchers. Like Chainer, PyTorch supports dynamic computation graphs, a feature that makes it attractive to researchers and engineers who work with text and time-series data.
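To make "dynamic" concrete: in PyTorch the graph is defined by running ordinary Python code, so control flow can differ for every input. The sketch below is a toy model with made-up dimensions, assuming a standard PyTorch install; the forward pass loops a variable number of steps per sequence, and autograd simply differentiates through whatever code actually ran.

```python
import torch
import torch.nn as nn

class DynamicRNN(nn.Module):
    """Toy recurrent model: the forward pass loops once per time step,
    so the computation graph differs for every input length."""
    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        self.cell = nn.RNNCell(input_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, seq):  # seq: (seq_len, input_size); seq_len varies
        h = torch.zeros(1, self.cell.hidden_size)
        for x in seq:                      # plain Python control flow
            h = self.cell(x.unsqueeze(0), h)
        return self.out(h)

model = DynamicRNN()
for length in (3, 7, 5):                   # variable-length inputs
    model.zero_grad()
    loss = model(torch.randn(length, 8)).sum()
    loss.backward()                        # autograd traces each run anew
```

There is no separate compile step: each call builds its own graph, which is why this style suits variable-length text and time-series work.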

Here are some highlights from our conversation:

The origins of PyTorch

TensorFlow addressed one part of the problem, which is quality control and packaging. It offered a Theano-style programming model, so it was a very low-level deep learning framework. … There are a multitude of front ends that are trying to cope with the fact that TensorFlow is a very low-level framework—there’s TF-Slim, there’s Keras. I think there’s like 10 or 15, and just from Google there’s probably like four or five of those.
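For readers who haven't used these front ends, here is a minimal sketch of what the higher-level layer looks like in Keras (shown with the TensorFlow-bundled package; the layer sizes and random data are made up for illustration). The point is that the front end hides the low-level graph construction Chintala refers to.

```python
import numpy as np
from tensorflow import keras

# A tiny model defined through the high-level Keras front end; the
# underlying TensorFlow graph construction is hidden from the user.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

# Random data, just to exercise the API surface.
x = np.random.rand(128, 16).astype("float32")
y = np.random.rand(128, 1).astype("float32")
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```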
Continue reading “Why AI and machine learning researchers are beginning to embrace PyTorch”

Building a next-generation platform for deep learning

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Naveen Rao on emerging hardware and software infrastructure for AI.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I speak with Naveen Rao, VP and GM of the Artificial Intelligence Products Group at Intel. In an earlier episode, we learned that scaling current deep learning models requires innovations in both software and hardware. Through his startup Nervana (since acquired by Intel), Rao has been at the forefront of building a next-generation platform for deep learning and AI.

I wanted to get his thoughts on what the future infrastructure for machine learning would look like. At least for now, we’re seeing a variety of approaches, and many companies are using heterogeneous processors (even specialized ones) and proprietary interconnects for deep learning. Nvidia and Intel Nervana are set to release processors that excel at both training and inference, but as Rao pointed out, at large scale there are many considerations—including utilization, power consumption, and convenience—that come into play.

Here is a partial list of the items we discussed:

  • Deep learning in comparison to other machine learning algorithms
  • Key features and the current status of Intel Nervana’s Lake Crest technology
  • Deep learning frameworks and related software tools, including Nervana Graph
  • Building next-generation hardware and software components for deep learning
  • An overview of the major AI initiatives within Intel (including the establishment of a new AI Research Lab that Rao is leading)


Data science and deep learning in retail

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Jeremy Stanley on hiring and leading machine learning engineers to build world-class data products.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I spoke with Jeremy Stanley, VP of data science at Instacart, a popular grocery delivery service that is expanding rapidly. As Stanley describes it, Instacart operates a four-sided marketplace comprising retail stores, products within those stores, shoppers assigned to the stores, and customers who order from Instacart. The objective is to get fresh groceries from popular retailers delivered to customers in a timely fashion. Instacart’s goals land the company in the center of the many opportunities and challenges involved in building high-impact data products.

Retail produces some of the most interesting case studies involving the use of big data and machine learning. This observation holds true for companies worldwide: I’m seeing data products in retail in the U.S. and Europe, and some of the most exciting developments are happening in Asia. We covered the intersection of retail and logistics at a recent Strata Data conference, where we showcased the use of data and machine learning in transportation and logistics.

Here are some highlights from my conversation with Jeremy Stanley:
Continue reading “Data science and deep learning in retail”

Scaling machine learning

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Reza Zadeh on deep learning, hardware/software interfaces, and why computer vision is so exciting.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I spoke with Reza Zadeh, adjunct professor at Stanford University, co-organizer of ScaledML, and co-founder of Matroid, a startup focused on commercial applications of deep learning and computer vision. Zadeh is also the co-author of the forthcoming book TensorFlow for Deep Learning (now in early release). Our conversation took place on the eve of the recent ScaledML conference, and much of it focused on practical, real-world strategies for scaling machine learning. In particular, we spoke about the rise of deep learning, hardware/software interfaces for machine learning, and the many commercial applications of computer vision.

Prior to starting Matroid, Zadeh was immersed in the Apache Spark community as a core member of the MLlib team. As such, he has firsthand experience trying to scale algorithms from within the big data ecosystem. Most recently, he’s been building computer vision applications with TensorFlow and other tools. While most of the open source big data tools of the past decade were written in JVM languages, many emerging AI tools and applications are not. Having spent time in both the big data and AI communities, I was interested to hear Zadeh’s take on the topic.

Here are some highlights from our conversation:
Continue reading “Scaling machine learning”

Deep learning that’s easy to implement and easy to scale

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Anima Anandkumar on MXNet, tensor computations and deep learning, and techniques for scaling algorithms.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I spoke with Anima Anandkumar, a leading machine learning researcher who is currently a principal research scientist at Amazon. I took the opportunity to get an update on the latest developments in the use of tensors in machine learning. Most of our conversation centered on MXNet—an open source, efficient, scalable deep learning framework. I’ve been a fan of MXNet dating back to when it was a research project out of CMU and UW, and I wanted to hear Anandkumar’s perspective on its recent progress as a framework for enterprises and practicing data scientists.
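As a small taste of the framework, here is a minimal training step written against MXNet's imperative Gluon interface. This is a sketch with made-up shapes and random data; Gluon was introduced around the time of this conversation, so older MXNet code looks different.

```python
from mxnet import nd, autograd, gluon

# Define a small network with the Gluon API.
net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(16, activation="relu"))
net.add(gluon.nn.Dense(1))
net.initialize()

loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})

x = nd.random.uniform(shape=(32, 8))   # toy batch: 32 samples, 8 features
y = nd.random.uniform(shape=(32, 1))

with autograd.record():                # record operations for differentiation
    loss = loss_fn(net(x), y)
loss.backward()                        # compute gradients
trainer.step(batch_size=32)            # apply one SGD update
```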

Here are some highlights from our conversation:
Continue reading “Deep learning that’s easy to implement and easy to scale”

Deep learning for Apache Spark

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Jason Dai on BigDL, a library for deep learning on existing data frameworks.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I spoke with Jason Dai, CTO of big data technologies at Intel, and co-chair of Strata + Hadoop World Beijing. Dai and his team are prolific and longstanding contributors to the Apache Spark project. Their early contributions to Spark tended to be on the systems side and included Netty-based shuffle, a fair scheduler, and the “yarn-client” mode. Recently, they have been contributing tools for advanced analytics. In partnership with major cloud providers in China, they’ve written implementations of algorithmic building blocks and machine learning models that let Apache Spark users scale to extremely high-dimensional models and large data sets. They achieve scalability by taking advantage of things like data sparsity and Intel’s MKL software. Along the way, they’ve gained valuable experience and insight into how companies deploy machine learning models in real-world applications.

When I predicted that 2017 would be the year the big data and data science communities would begin exploring techniques like deep learning in earnest, I was relying on conversations with many members of those communities. I also knew that Dai and his team were at work on a distributed deep learning library for Apache Spark. This evolution from basic infrastructure to machine learning applications, and now to applications backed by deep learning models, is to be expected.
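To give a flavor of that library (BigDL), the sketch below follows its early Python API: a model is defined with BigDL layers and trained directly on a Spark RDD of samples. Module paths and signatures are best-effort approximations for BigDL 0.x and may differ across releases; the data is random and purely illustrative.

```python
import numpy as np
from pyspark import SparkContext
from bigdl.util.common import init_engine, create_spark_conf, Sample
from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

sc = SparkContext(conf=create_spark_conf())
init_engine()

# Toy data: an RDD of feature/label pairs wrapped as BigDL Samples
# (labels are 1-based for ClassNLLCriterion).
train_rdd = sc.parallelize(range(1000)).map(
    lambda i: Sample.from_ndarray(np.random.rand(8).astype("float32"),
                                  np.array([float(i % 2 + 1)])))

# A small feed-forward classifier defined with BigDL layers.
model = (Sequential().add(Linear(8, 16)).add(ReLU())
                     .add(Linear(16, 2)).add(LogSoftMax()))

optimizer = Optimizer(model=model,
                      training_rdd=train_rdd,
                      criterion=ClassNLLCriterion(),
                      optim_method=SGD(learningrate=0.01),
                      end_trigger=MaxEpoch(5),
                      batch_size=32)
trained = optimizer.optimize()   # runs distributed training on the cluster
```

The appeal for teams already running Spark is that training happens where the data lives, on existing clusters, rather than on separate GPU infrastructure.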

Once you have a platform and a team that can deploy machine learning models, it’s natural to begin exploring deep learning. As I’ve highlighted in recent episodes of this podcast (here and here), companies are beginning to apply deep learning to time-series data, event data, text, and images. Many of these same companies have already invested in big data technologies (many of which are open source) and employ data scientists and data engineers who are comfortable with these tools.
Continue reading “Deep learning for Apache Spark”

The key to building deep learning solutions for large enterprises

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Adam Gibson on the importance of ROI, integration, and the JVM.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4j (DL4J). Gibson has spent the last few years developing the DL4J library and community, while simultaneously building deep learning solutions and products for large enterprises.

Here are some highlights:

Continue reading “The key to building deep learning solutions for large enterprises”