This edition has 450 words which will take you about 2 minutes to read.
“There’s always a way if you’re not in a hurry.” – Paul Theroux
Data Exchange podcast
- Towards a next-generation data orchestrator Chris White is the CTO of Prefect, a startup building tools to help companies build, monitor, and manage dataflows. Prefect originated from lessons Chris and his co-founder learned while they were at Capital One, where they were early users and contributors to related projects like Apache Airflow.
- Building a flexible, intuitive, and fast forecasting library Reza Hosseini and Albert Chen of LinkedIn are part of the team behind one of my favorite new open source tools: Greykite, a flexible and fast library for time-series forecasting.

Data & Machine Learning tools and infrastructure
- The Road to Intelligent Process Automation We examine the state of process automation technologies in the Fortune 1000 and in key technology hubs in the US.
- BytePlus According to the FT, this new division of ByteDance is selling the technology that powers its viral video app TikTok to websites and apps outside China. BytePlus has several SaaS offerings including recommendation models and tools for testing new data products and services. Given the rather frosty relationship between China and the West, BytePlus faces an uphill battle in Europe, the Five Eyes, and their allies.
- Julia: Fast as Fortran, Beautiful as Python
- EdgeQL A new, strictly typed query language that aims to surpass SQL for graph applications (the parent project, EdgeDB, stores and describes data as strongly typed objects and the relationships between them). It is functional in nature and designed to be composable and easy to learn.
- IBM open sources CodeFlare Built on top of Ray, CodeFlare simplifies the integration and scaling of analytic and machine learning workflows in hybrid clouds.
- The Geography of Open Source Software A team of economists measures open source software contributions from 2010 to 2020 at the national, regional, and local levels using data from GitHub and adjacent platforms. The overall share of active developers has become more evenly distributed between countries but, in a nod to the importance of technology hubs, regional differences within countries persist. The authors hope to include GitLab and Bitbucket in future versions.
2021 Data Engineering Survey
Tell us which data tools you are most likely to adopt in the next 12-24 months—and what criteria your DataOps team uses to evaluate them. The survey takes about 5 minutes to fill out and we’ll share the report of the survey findings with you. You’ll also be entered in a drawing for free copies of Jesse Anderson’s Data Teams book and other prizes.
Recommendations
- How useful was the Netflix Prize challenge for Netflix?
- Goomics Fabulous series of comics about life at Google (2010-2021) from former Google engineer Manu Cornet (book version).
- The Document-based Meeting Culture of Amazon Depending on the length of the document for the meeting, attendees start by reading anywhere from ten minutes to half an hour.
- Write a time-series database engine from scratch A recent tutorial that highlights why time-series are ideal for learning about storage engines.
- Bessemer’s Data Infrastructure roadmap This interesting guide to increasingly hot areas in data engineering showcases some of Bessemer’s portfolio companies.
- How to build a data team A short story describing the path to transforming an organization to be truly data-native.
Closing Short: When the media is a few steps ahead of the story.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.