This edition has 408 words which will take you about 2 minutes to read.
“It is always easier to destroy a complex system than to selectively alter it.” – Roby James.
Data Exchange podcast
- Improving the robustness of natural language applications In recent years adversarial attacks against computer vision models have been covered in numerous media articles. Similar attacks have surfaced for NLP models and there have been a series of research projects dedicated to generating adversarial examples and defending against these adversaries. Jack Morris is co-creator of TextAttack, an open source framework for adversarial attacks, data augmentation, and adversarial training in NLP.
- End-to-end deep learning models for speech applications Yishay Carmiel is an AI Leader at Avaya, a large company focused on digital communications. In this episode we discussed recent developments in speech technologies, specifically the rise of end-to-end speech models for ASR (automatic speech recognition).
Machine Learning tools and infrastructure
- Navigate the road to Responsible AI In this new post, I use three recent surveys and an ethnographic study to assemble a primer summarizing how companies are approaching Responsible AI today and what they aspire to do in the future.
- SDV Synthetic data generation has been in the news because of things like deepfakes and the need for training data in computer vision applications. But what about structured and semi-structured data? This new library from MIT takes single-table, multi-table and time-series datasets, and is able to generate synthetic data with the same format and statistical distribution. Developers can use this for testing the robustness of all sorts of applications – not just machine learning models, but software applications that consume data.
- The challenge of testing data pipelines
- The quest for high-quality data I’m resurfacing this 2019 overview I wrote with Ihab Ilyas because the topic of data quality has recently been popping up on my radar. There are now a few startups, as well as open source projects (Apache Griffin, Deequ) that address aspects of this super important area.
Free Virtual Conferences
- Understanding and Avoiding AI Liabilities in Practice Failures are inevitable in complex systems like AI. In this December 15th webinar, Andrew Burt, Managing Partner at BNH.ai, explains how you can mitigate liabilities arising from AI.
Work and Hiring
- Social media managers are not okay
- Data engineering job interviews are all over the place
- How to identify and use your company’s technical phone screen superforecasters
Recommendations
- Ethical Frameworks for AI Aren’t Enough
- The Billion Dollar Dataset Next Door
- The open-source differential privacy library from Google Here’s a good overview of differential privacy (DF) via a recent interview with two of the people central to open sourcing Google’s DF library.
- Active Driving Assistance Systems: Test Results and Design Recommendations A November summary from Consumer Reports.
If you enjoyed this newsletter please support our work by encouraging your friends and colleagues to subscribe: