This edition has 482 words, which will take you about 3 minutes to read.
“They beat me up unjustly, but since they did the same thing to everyone else, it was not unfair.” – Sidney Morgenbesser
Data Exchange podcast
- Securing machine learning applications: Ram Shankar, a Berkman Klein Center affiliate, is a researcher and engineer who works at the intersection of machine learning and security. This episode focuses on the current state of tools and techniques for building secure and trustworthy machine learning applications.
- Testing Natural Language Models: NLP model building usually proceeds in three steps. Split your data into training and validation sets, build a model on the training subset, and test its efficacy on the validation set. Marco Ribeiro, a Senior Researcher at Microsoft Research, describes how ideas from software engineering can inject more rigor into the NLP model development process.
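One software-engineering idea that carries over to NLP is the behavioral "invariance" test: a change that should not matter must not change the model's output. The sketch below is an illustration of that idea, not the method described in the episode. It checks that swapping a person's name in an otherwise identical sentence leaves the prediction unchanged; `toy_sentiment` and `invariance_failures` are hypothetical names, and the keyword classifier merely stands in for a real model.

```python
# Minimal invariance test for an NLP classifier (illustrative sketch).
# ASSUMPTION: toy_sentiment is a hypothetical stand-in for a real model.

def toy_sentiment(text: str) -> str:
    """Hypothetical keyword-based sentiment classifier."""
    positive = {"great", "good", "love", "excellent"}
    negative = {"bad", "terrible", "hate", "awful"}
    words = text.lower().replace(".", "").split()
    # +1 per positive keyword, -1 per negative keyword
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def invariance_failures(model, template, fillers):
    """Return (filler, prediction) pairs that disagree with the first filler's prediction.

    Swapping a person's name should not change sentiment, so any
    disagreement counts as a test failure.
    """
    predictions = [(f, model(template.format(name=f))) for f in fillers]
    baseline = predictions[0][1]
    return [(f, p) for f, p in predictions if p != baseline]


if __name__ == "__main__":
    template = "{name} had a great time at the conference."
    names = ["Alice", "Bob", "Priya", "Wei"]
    failures = invariance_failures(toy_sentiment, template, names)
    print("failures:", failures)  # an empty list means the test passed
```

Unlike a single accuracy number on a validation set, a failing invariance test points at a specific capability the model lacks.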
[Image: Art Deco Plane Clock by Dean Wampler, used with permission]
Machine Learning tools and infrastructure
- Hyperparameter Search with Hugging Face Transformers: This recent post describes a simple yet powerful integration between the 3.1 release of Hugging Face Transformers and Ray Tune. The combination should prove popular with NLP users who need to fine-tune language models.
- Adapting on the Fly to Test Time Distribution Shift: A new post from Berkeley AI Research introduces machine learning tools that assume the distribution of the training data will not exactly match what a model sees once it is deployed. This is a welcome development, as concept drift is a challenge for all predictive models.
- The Deep Learning Tool We Wish We Had In Grad School
- Time-series databases: Two updates. (1) TimescaleDB is now a distributed, multi-node, petabyte-scale relational database. (2) InfluxDB IOx, an in-memory columnar store built with Rust and Apache Arrow, is a forthcoming optional storage backend for InfluxDB.
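The basic loop behind hyperparameter search can be sketched without either library. The example below runs plain random search over a learning rate (sampled log-uniformly) and a number of epochs. It does not use the Transformers or Ray Tune APIs, and `objective` is a hypothetical toy loss standing in for an actual fine-tuning and evaluation run.

```python
import math
import random


def objective(learning_rate: float, num_epochs: int) -> float:
    """Hypothetical validation loss; a real run would fine-tune and evaluate a model."""
    # Toy loss surface with its minimum at learning_rate=3e-5, num_epochs=3.
    return (math.log10(learning_rate) - math.log10(3e-5)) ** 2 + 0.1 * (num_epochs - 3) ** 2


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration: log-uniform learning rate, uniform integer epochs."""
    return {
        "learning_rate": 10 ** rng.uniform(-6, -3),
        "num_epochs": rng.randint(1, 5),
    }


def random_search(n_trials: int, seed: int = 0):
    """Evaluate n_trials random configurations and return the best (loss, config)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    best = None
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(**config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


if __name__ == "__main__":
    loss, config = random_search(n_trials=50)
    print(f"best loss={loss:.4f} config={config}")
```

Libraries such as Ray Tune build on a loop like this with parallel trials, early-stopping schedulers, and more sample-efficient search algorithms.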
Free Virtual Conferences
- Securing AI and machine learning applications: In this December 15th webinar, Yishay Carmiel will explain how new AI security tools affect machine learning development and deployment.
Work and Hiring
- Job interest is not the best predictor of job satisfaction: A meta-analysis found that you should also weigh other factors, including “the organization you work for, your supervisor, colleagues and pay”.
- You’re never too old to be a founder: A profile of Snowflake and its founders.
- Big data engineer roadmap: A set of graphics listing essential tools and topics for a variety of roles.
[Baháʼí gardens in Haifa, Israel. Image by Ben Lorica]
Recommendations
- Fentanyl, Inc.: A terrifying and important book about the world of synthetic drugs. Fentanyl abuse is a serious problem in San Francisco, and according to this book, it has also reached epidemic levels in other parts of the US.
- The Mike Speiser Incubation Playbook: This article explains the unique investment strategy of the VC who was the founding investor in Snowflake, Pure Storage, and many other successful startups.
- Diagnosing Gender Bias in Image Recognition Systems: Algorithmic bias remains a problem in computer vision. This study uncovered gender bias in popular tools, including Google Cloud Vision, Microsoft Azure Computer Vision, and Amazon Rekognition.
- SQL analytics and the evolution of the lakehouse
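A simple way to probe an image-labeling service for the kind of gender bias described above is to compare how often each label is assigned to images of different groups. This is a minimal sketch, not the study's methodology. The predictions and the `label_a`/`label_b` names are hypothetical placeholders, not output from any of the services named above.

```python
from collections import Counter


def label_rates(predictions):
    """Compute, per group, the fraction of images that received each label.

    predictions: list of (group, labels) pairs, one per image.
    Returns {group: {label: rate}}.
    """
    images_per_group = Counter()
    label_counts = {}
    for group, labels in predictions:
        images_per_group[group] += 1
        # set() so a label repeated on one image is counted once
        label_counts.setdefault(group, Counter()).update(set(labels))
    return {
        group: {label: n / images_per_group[group] for label, n in counts.items()}
        for group, counts in label_counts.items()
    }


def rate_gap(rates, label, group_a, group_b):
    """Difference in how often `label` is assigned to group_a versus group_b."""
    return rates[group_a].get(label, 0.0) - rates[group_b].get(label, 0.0)


if __name__ == "__main__":
    # Hypothetical service output: (annotated group, labels returned for one image)
    predictions = [
        ("women", ["label_a", "label_b"]),
        ("women", ["label_a"]),
        ("men", ["label_b"]),
        ("men", ["label_a", "label_b"]),
    ]
    rates = label_rates(predictions)
    print(rate_gap(rates, "label_a", "women", "men"))  # 0.5
```

A large gap flags a label for closer review; on its own it does not prove bias, since the images in each group may genuinely differ.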
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.