Issue #9: Scalability, Privacy, and AutoML


This edition has 716 words which will take you about 4 minutes to read.

“Hope is the thing with feathers…” – Emily Dickinson

Data Exchange podcast

  • Improving performance and scalability of data science libraries  Wes McKinney created the pandas project in 2008 and over time it has become one of the most popular tools in data science. In this episode we discussed his approach to growing an open source project, and his current focus on sustaining the development of Apache Arrow. 
  • Understanding machine learning model governance  As machine learning becomes widely deployed, organizations will need to develop processes and tools to ensure that models behave as intended. Harish Doddi describes how companies can work towards having the right set of controls and validation steps in place.  
  • Human-edited Transcripts  We are happy to announce that we are beginning to produce high-quality transcripts for some episodes. Our transcripts are PDF files that are free to download. The growing collection of transcripts can be found here.

Machine Learning tools and infrastructure

Virtual Conferences

  • Presto, virtual book tour  Our friend and colleague Paco Nathan is hosting a free online panel with the authors of “Presto: The Definitive Guide”.  Presto is a very popular open source, distributed SQL engine that came out of Facebook. The panel takes place on May 27th; register here.
  • The road to AutoML  Hear from experts building solutions for AutoML’s key building blocks – hyperparameter tuning and neural architecture search. This free virtual event takes place June 10th; register here.
  • Nate Silver at the Spark+AI Summit   With the US Presidential elections taking place later this year, we are happy to have Nate Silver, the leading election forecaster in the US. After his technical talk, which covers data analysis in the age of big data and tips for conveying probabilities and uncertainty to the general public, Nate will answer questions on the nuts and bolts of building complex election forecasting models. This is a FREE event.

Work and Hiring

  • My First Year as a Freelance AI Engineer  A really good overview peppered with practical advice useful to all aspiring freelancers (not just engineers).
  • SQL Interview Questions  Originally written for people interviewing for data analyst or data scientist positions, this is also a handy guide for ETL/data engineers.   
  • The reason Zoom calls drain your energy.  I’m hopeful that over time, companies and managers will limit video calls, or end them earlier to give people breaks in between. We all seem to accept that virtual conferences have to be shorter than live events. Why should video calls be any different?


  • The Shrink next door  A gripping podcast serial from Bloomberg’s Joe Nocera, who for many years wrote a business column for the NYTimes.
  • You are messing with magic   The advertising industry assumes that ad companies always benefit from more data and fancier models. This essay highlights that selection effects are frequently stronger than the advertising effect alone.
  • Flash Crash  My first job post-academia was as a lead quant in a small hedge fund. I’ve since followed the industry from afar and I have read my share of books about traders and trading. This newly released book is going to be a classic, and the advance endorsements that prompted me to read it are well deserved.
  • Michael Franti’s virtual concerts  This popular SF Bay Area musician decided to ride out the initial part of the pandemic in Bali, Indonesia.   

[Image: Newsletter from Pixabay.]
