“Done is better than perfect.” – Sheryl Sandberg
Data Exchange podcast
- The State of Data Journalism: A conversation with Tara Kelly, Data Editor at DataJournalism.com (DJC), an organization created by the European Journalism Centre. DJC provides journalists and media groups with free resources, materials, online video courses, and community forums.
- Why Graph Databases and Graph Analytics are hot again: Our friend Paco Nathan has been doing a lot of work with graphs, and as a result he has had to immerse himself in the world of graph data management technologies. This conversation focuses on what's new in graph databases, their use cases, graph analytics, and graph neural networks.
FREE Virtual conference
I'm once again the co-chair of the NLP Summit, and we have another outstanding lineup for you this year. Speakers come from leading organizations including Hugging Face, Stanford NLP and Stanza, Spark NLP, the NSF, Microsoft Research, EleutherAI, and AI21 Labs, creators of what they bill as the largest language model ever made available to developers.
Data & Machine Learning tools and infrastructure
- Data Validation Tool: In our soon-to-be-released Data Engineering Survey, respondents cited data quality and data validation among the key challenges facing their data teams. This newly open-sourced Python tool from Google provides an automated, repeatable way to validate data across different environments.
- The Data Lakehouse :: FAQ. A data management paradigm that we first introduced last year is quietly and steadily gaining traction.
- Darts: An open-source Python library for easily manipulating and forecasting time series. Among other things, Darts lowers the barrier to using deep learning models for forecasting and lets you train on many (thousands or more) possibly multidimensional time series.
- Whale: Scaling Deep Learning Model Training to the Trillions
- River: An open-source Python library for online machine learning.
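For readers new to online machine learning: the model updates itself one example at a time and never needs the whole dataset in memory. The stdlib-only sketch below is a hypothetical illustration, not River's implementation; it fits a linear regression with one stochastic gradient descent step per example. The class, the synthetic `stream` generator, and the learning rate are assumptions made for the example; only the `learn_one`/`predict_one` method names loosely follow River's convention.

```python
import random


class OnlineLinearRegression:
    """Linear regression trained one example at a time with SGD.

    Hypothetical illustration only; not River's code.
    """

    def __init__(self, lr=0.05):
        self.lr = lr        # learning rate for each SGD step (assumed value)
        self.weights = {}   # feature name -> weight, created on first sight
        self.bias = 0.0

    def predict_one(self, x):
        # Dot product over the features present in this single example.
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # One gradient step on the squared error of this example;
        # the example can be discarded afterwards.
        error = self.predict_one(x) - y
        self.bias -= self.lr * error
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v


def stream(n, seed=0):
    # Hypothetical synthetic stream: y = 3*a - 2*b + 1 plus small noise.
    rng = random.Random(seed)
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        yield {"a": a, "b": b}, 3 * a - 2 * b + 1 + rng.gauss(0, 0.01)


model = OnlineLinearRegression()
abs_errors = []
for x, y in stream(5000):
    # Predict first, then learn from the same example (progressive validation).
    abs_errors.append(abs(model.predict_one(x) - y))
    model.learn_one(x, y)

print({k: round(w, 2) for k, w in model.weights.items()}, round(model.bias, 2))
```

Predicting on each example before learning from it yields a running error estimate without a separate held-out set, an evaluation style that suits data arriving as a stream.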

Recommendations
- Why achieving true progress in natural language understanding will be harder than we think: Yoav Goldberg's recent must-read thread on language models and NLU.
- Program Synthesis with Large Language Models: This new paper from Google Research investigates whether large language models can synthesize code in a general-purpose programming language. As the authors put it, "it is worth emphasizing that we are a long way from models that can synthesize complex applications without human supervision".
- An attempt at demystifying graph deep learning
- Prettymaps: A Python library for drawing gorgeous customized maps from OpenStreetMap data.
- Machine Learning – A First Course for Engineers and Scientists: A FREE preliminary version of an upcoming book based on a course at Uppsala University.
- The Complete FAANG Preparation: A study guide for software engineering job interviews at US big tech companies.
Closing short: Solo Band.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe: