This edition has 560 words, which will take you about 3 minutes to read.
“We know that the ‘superspreaders’ and ‘superconsumers’ of fake news, who drive the concentration in these samples, are mostly bots.” – Sinan Aral
Data Exchange podcast
- Building and deploying knowledge graphs Mayank Kejriwal is one of the leading experts on knowledge graphs. We discussed the critical role knowledge graphs play in modern AI applications, and tools developers can use to get started.
- Making Deep Learning more Accessible I caught up with Piero Molino, creator of Ludwig, a toolbox that allows users to train and test deep learning models through a declarative interface. Piero created Ludwig while serving as a Senior Research Scientist at Uber AI, but it quickly turned into a tool also used by developers and practitioners.
Machine Learning tools and infrastructure
- Towards ML Engineering: A Brief History Of TensorFlow Extended A new survey of machine learning efforts within Google points out something that teams within many tech companies have also concluded: “end-to-end ML platforms, which help with all aspects of the ML lifecycle, are usually needed to both accelerate ML adoption and make its use durable and sustainable”.
- Why Does No One Use Advanced Hyperparameter Tuning? This great post by Liam Li should help you add state-of-the-art HP tuning to your machine learning toolkit.
- Measuring Gendered Correlations in Pre-trained NLP Models This new study from Google AI comes with a set of best practices that can help teams build robust applications and bring responsible AI tools to NLP. The results have real-world applications, as racial and gender biases have been documented in resume-job matching software, image captioning, machine translation, sentiment analysis, and other settings.
- spaCy adds parallel and distributed training with Ray This is just one of many announced enhancements in the upcoming version 3.0 release of spaCy, one of the most widely used NLP libraries.
Virtual Conferences
- Query Optimization at Snowflake Software engineer Jiaqi Yan describes Snowflake’s optimizations for analytic queries in this recent presentation at Carnegie Mellon University.
- The Ethical Algorithm A good Simons Institute survey talk by Michael Kearns, who co-wrote a very good book with the same title. Besides being a leading ML researcher, he has extensive industry experience (particularly in finance) to draw on.
- The first NLP Summit program was outstanding and you can still watch talks for free, on-demand.
Work and Hiring
- Analysis of 5,500 Data Science Jobs
- Learning path and resources to become a data engineer
- Embracing transitions
Recommendations
- Split-second phantom attacks against autonomous driving systems Researchers embedded phantom road signs into an advertisement presented on a digital billboard and fooled recent software systems from Tesla and Mobileye. A split-second phantom attack is a phantom (e.g., a Stop sign) that appears for a few milliseconds.
- Transparency and reproducibility in artificial intelligence A group of researchers penned an open letter expressing concern about the lack of reproducibility in AI research. They were prompted by a Google Health study that presented a model that supposedly outperformed human radiologists in a breast cancer screening task. Ironically, the letter by the concerned scientists is behind a paywall! Here are a couple of articles summarizing its contents ([1], [2]).
- Agents of Chaos This two-part documentary series delves into Russian interference in the 2016 U.S. election.
- New Yorker profile of Moxie Marlinspike, the founder of the end-to-end encrypted messaging service Signal. I’ve been an avid user of Signal for years and I’ve used it all over the world, including in China. If I could get my friends and family overseas to ditch WhatsApp, I would be very happy!
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.