This edition has 716 words, which will take you about 4 minutes to read.
“Hope is the thing with feathers…” – Emily Dickinson
Data Exchange podcast
- Improving performance and scalability of data science libraries: Wes McKinney created the pandas project in 2008, and over time it has become one of the most popular tools in data science. In this episode, we discussed his approach to growing an open source project and his current focus on sustaining the development of Apache Arrow.
- Understanding machine learning model governance: As machine learning becomes more widely deployed, organizations will need to develop processes and tools to ensure that models behave as intended. Harish Doddi describes how companies can work toward having the right set of controls and validation steps in place.
- Human-edited transcripts: We are happy to announce that we are beginning to produce high-quality transcripts for some episodes. Our transcripts are PDF files that are free to download. The growing collection of transcripts can be found here.
Machine Learning tools and infrastructure
- A highly efficient, real-time text-to-speech system deployed on CPUs: It delivers human-level audio quality without the need for GPUs or other specialized hardware. This reminds me of a conversation I had with Nir Shavit (of MIT and Neural Magic), in which he explained why he believes that software and commodity hardware will prove capable of handling most machine learning tasks.
- Practical Applications of Homomorphic Encryption (HE): HE is one of the more promising techniques for privacy-preserving analytics and machine learning. It is a form of encryption that lets you compute on encrypted data without having to decrypt it first. But as Hao Chen points out in his recent survey talk, a short list of benchmarks indicates that HE on its own isn't yet ready for complex real-time models or for big data. Chen shows that HE works best when combined with other techniques.
- One simple chart: Demand for Machine Learning Engineers
- Splunk’s Data Stream Processor now uses Pulsar: From the release notes: “DSP now uses Apache Pulsar as its messaging bus.” Splunk joins a growing number of companies using Pulsar, including Yahoo, Verizon, and Comcast.
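To make the HE idea of "computing on encrypted data" concrete, here is a toy sketch of the Paillier cryptosystem. Paillier is a partially (additively) homomorphic scheme, chosen here only for brevity; it is not one of the fully homomorphic schemes discussed in Chen's talk. The tiny hard-coded primes and all function names are illustrative assumptions, and the code is insecure and unsuitable for real use.

```python
import math
import random

# Toy Paillier keypair built from tiny hard-coded primes (insecure; illustration only).
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                                       # common simplified choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                            # valid shortcut because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using fresh randomness r that is coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts yields an encryption of the sum (mod N)."""
    return (c1 * c2) % N_SQ


def scale_encrypted(c: int, k: int) -> int:
    """Raising a ciphertext to the power k yields an encryption of k * m (mod N)."""
    return pow(c, k, N_SQ)


if __name__ == "__main__":
    a, b = encrypt(20), encrypt(22)
    total = add_encrypted(a, b)             # computed without ever decrypting a or b
    print(decrypt(total))                   # 42
    print(decrypt(scale_encrypted(a, 3)))   # 60
```

The party doing the arithmetic only ever handles `a`, `b`, and `total`; only the key holder can decrypt. Paillier supports addition and multiplication by a plaintext constant, but not multiplication of two ciphertexts. That limitation is one reason richer (and much slower) fully homomorphic schemes exist.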
Virtual Conferences
- Presto, virtual book tour: Our friend and colleague Paco Nathan is hosting a free online panel with the authors of “Presto: The Definitive Guide”. Presto is a very popular open source, distributed SQL engine that came out of Facebook. The panel takes place on May 27th; register here.
- The road to AutoML: Hear from experts building solutions for AutoML’s key building blocks: hyperparameter tuning and neural architecture search. This free virtual event takes place on June 10th; register here.
- Nate Silver at the Spark+AI Summit: With the US presidential election taking place later this year, we are happy to have Nate Silver, the leading election forecaster in the US. After his technical talk on data analysis in the age of big data and on how to convey probabilities and uncertainty to the general public, Nate will answer questions on the nuts and bolts of building complex election forecasting models. This is a FREE event.
Work and Hiring
- My First Year as a Freelance AI Engineer: A really good overview, peppered with practical advice useful to all aspiring freelancers (not just engineers).
- SQL Interview Questions: Originally written for people interviewing for data analyst or data scientist positions, this is also a handy guide for ETL/data engineers.
- The reason Zoom calls drain your energy: I’m hopeful that over time, companies and managers will limit video calls, or end them earlier to give people breaks in between. We all seem to accept that virtual conferences have to be shorter than live events, so why should video calls be any different?
Recommendations
- The Shrink Next Door: A gripping podcast serial from Bloomberg’s Joe Nocera, who for many years wrote a business column for The New York Times.
- You are messing with magic: The advertising industry assumes that ad companies always benefit from more data and fancier models; this essay highlights that selection effects are frequently stronger than the advertising effect itself.
- Flash Crash: My first job after academia was as a lead quant at a small hedge fund. I’ve since followed the industry from afar, and I have read my share of books about traders and trading. This newly released book is going to be a classic, and the advance endorsements that prompted me to read it are well deserved.
- Michael Franti’s virtual concerts: This popular SF Bay Area musician decided to ride out the initial part of the pandemic in Bali, Indonesia.