This edition has 485 words, which will take you about 3 minutes to read.
“Moses was technically the first person to download files to his tablet from the cloud.” – @ADDiane
Data Exchange podcast
- Automation in Data Management and Data Labeling Hyun Kim is co-founder and CEO of Superb AI, a startup building tools to help companies manage data across the entire machine learning application lifecycle. This includes tools to label, store, and monitor data assets that power computer vision applications. We also discussed emerging trends in machine learning and AI including synthetic data, reinforcement learning, and self-supervised learning.
- Questioning the Efficacy of Neural Recommendation Systems I spoke with Paolo Cremonesi, Professor of Computer Science, and Maurizio Ferrari Dacrema, a postdoc at Politecnico di Milano, where they are both part of the RecSys research group. We discussed two survey papers they recently completed on the use of deep learning in recommendation systems, as well as broader trends in recommender systems.

Data & Machine Learning tools and infrastructure
- From Cloud Computing to Sky Computing This new vision paper from Ion Stoica and Scott Shenker describes a more commoditized version of cloud computing. Also see our related post on “The Emergence of Multi-cloud Native Applications and Platforms”
- Apache Pulsar gains a major ally DataStax launches Astra Streaming, an event streaming and messaging platform built on top of Pulsar
- Latest release notes for John Snow Labs NLU This is an open source library aimed at developers interested in using state-of-the-art text mining directly on any dataframe, with a single line of Python code.
- The case for XGBoost for tabular data In a recent edition of this newsletter, I hinted that I was getting encouraging results from early experiments with the open source version of TabNet, a deep neural model for tabular data. But let’s not toss out other modeling methods just yet! This new paper from Intel notes that, at least in their experiments, XGBoost outperformed TabNet and other neural models. So which technique are users supposed to use? The right strategy is to experiment and combine models as you see fit, based on the specifics of your dataset and application requirements.
- Ray Distributed Library Patterns A great article on what it means to be “integrated with Ray”, and how to build libraries on top of Ray.
2021 NLP Survey
The 2021 NLP Industry Survey is now open and we need your help. The survey takes only about 5 minutes to fill out and in exchange we’ll send you a copy of the survey results + a FREE pass to the 2021 NLP Summit (a virtual conference happening in October).
Recommendations
- The Top Trends in Tech A must-read new report and interactive website from the McKinsey Technology Council
- Self-supervision from the bottom up Alyosha Efros explains why he’s excited about the potential of self-supervised learning methods in computer vision.
- Neural Network Verification Given the challenges facing ML teams in areas like safety, robustness, and consistency, this is a must-read (in-progress) book on how to apply ideas from formal methods to verifying properties of neural networks.
- The SaaS CTO Security Checklist A nice list, but as with anything pertaining to managing risk, we need a section dedicated to organizational best practices.
- We Are AI A five-module NYU course on the basics of AI that comes with comic books
Closing Short: Big Tech in China.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.