This edition has 368 words, which will take you about 2 minutes to read.
“There’s a Fog of War, but there’s also a Fog of Peace.” – Eric Grosse
Data Exchange podcast
- The Mathematics of Data Integration and Data Quality: Ryan Wisnesky is the CTO and co-founder of Conexus, a startup that builds ideas from category theory into novel tools for data integration, data management, and knowledge management.
- Measuring Corporate and University Research Impact in AI and Machine Learning: We speak with Simon Rodriguez of the Center for Security and Emerging Technology (CSET) at Georgetown University.
Free Report
We recently conducted a survey to understand how leading Healthcare, Biotech, and Pharmaceutical companies are building AI and Machine Learning products and services. The survey drew close to 400 respondents from 49 countries. Grab your report on the survey results below:
Data & Machine Learning Tools and Infrastructure
- Toward Confidential Cloud Computing: This ACM paper describes the state of tools for extending “hardware-enforced cryptographic protection” to data products and services. (I recommend you read the PDF version.)
- Evidently: An open source library for analyzing machine learning models during development, validation, or production monitoring.
- OpenMMLab: Open source deep learning models for computer vision from the Chinese unicorn SenseTime.
- Apache Airflow 2020 User Survey: Airflow is one of the more popular tools among data engineers and is used to build, schedule, and monitor workflows. Celery is the most popular option for executing Airflow, with Kubernetes placing second.
- Everyone with a data pipeline has data quality issues: Notes from interviews with data engineering teams at mid-size to large companies.
- Data Quality at Airbnb: The authors cover architecture, tools, organizational structure, and best practices. A follow-up post details how they design and build data pipelines.
Funding Updates
- OctoML closes $28M Series B funding round
- Enterprise AI software company Noogata raises a $12M seed round

Recommendations
- Five Common Mistakes of New Engineering Managers
- Data Science in Julia for Hackers: This forthcoming book is being written in Pluto notebooks.
- Designing, Visualizing and Understanding Deep Neural Networks: Complete set of videos from Sergey Levine’s course at UC Berkeley.
- Privacy-preserving tools leave private data unprotected: Companies use privacy-protecting GANs (PP-GANs) to scrub identifying details from images of individuals. NYU researchers recently found that PP-GANs can be subverted and can leak sensitive information.
- Executing a distributed shuffle without a MapReduce system … in just a few lines of Python using Ray!
- Making Data Lakehouse real yet effective
Closing Short: Learn why Ikaria is one of only five Blue Zones in the world. Blue Zones have a high percentage of people who live past the age of 90.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe: