This edition has 800 words, which will take you about 4 minutes to read.
“You may not get rich by using all the available information, but you surely will become poor if you don’t.” – Jack Treynor
Data Exchange podcast
- Best practices for building conversational AI applications. Alan Nichol is co-founder and CTO of Rasa, the startup behind the popular open-source framework for building conversational AI applications. We talked about the state of developer tools, as well as software engineering best practices for building chatbots and related applications.
- Tools for scaling machine learning. Our special correspondent Jenn Webb organized a mini-panel featuring Paco Nathan and me. Paco is an author, teacher, and founder of Derwen.ai, a boutique consulting firm specializing in data, machine learning (ML), and AI.
[Image: Danish Architecture – Aarhus by Alex Berger]
Machine Learning Tools and Infrastructure
- One Simple Chart: which industry sectors are using reinforcement learning
- Compression of Deep Learning Models for Text: A Survey. A few months ago a friend wowed me with the speech recognition model on his Google Pixel phone. It was amazingly accurate, its latency was incredibly low, and it performed this well in the middle of a loud San Francisco coffee shop. As much as we laud recent progress in language models, deploying gigantic models is challenging even on high-end servers. This new paper from Microsoft Research (MSR) India surveys state-of-the-art model compression strategies for NLP models.
- Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores. A VLDB paper that describes some of the core technical challenges and solutions that set the lakehouse apart from the data lake. The lakehouse pattern was first described in a blog post earlier this year.
- Practical Data Ethics. This new course from fast.ai was originally taught in person at the University of San Francisco Data Institute at the start of 2020.
- 5 Levels of Conversational AI. A few years ago I asked the founders of Rasa to explain conversational assistants to a broad audience by devising a framework similar to the “levels of driver assistance technology” used by the NHTSA and the car industry. The original post was widely read in chatbot circles, and Alan Nichol has just updated it to reflect current technologies and today’s market.
FREE Virtual Conferences
- Trends in AI and Python Scalability. Dean Wampler and I take a deep dive into two trends that influenced how we put together the Ray Summit program. First, there’s been steady progress towards simplification, efficiency, and lower costs: think of cloud computing, microservices, serverless, and cloud-native infrastructure. Second, we were mindful of the growing importance of machine learning, particularly deep learning and reinforcement learning, techniques that pose a host of challenges for developers.
- NLP: the most important field in machine learning. A short presentation by Clément Delangue, CEO of Hugging Face, whose open-source framework (Transformers) has been downloaded millions of times. Clément will be giving a keynote at the upcoming NLP Summit.
- Bias in AI Workshop. This interesting panel discussion was part of a recent workshop organized by the National Institute of Standards and Technology (NIST), which is part of the US Department of Commerce.
Work and Hiring
- How To Create a Software Engineer Resume Hiring Managers Will Love
- US job market is pretty challenging, particularly in tech. Through July 24th, data science job openings were down 37% in the major tech hubs and 51% in other metropolitan areas.
- Data Teams Survey. Our friends at the Big Data Institute have put out a short survey, and we encourage you to fill it out.
[Image: Exhibit in the Ethnological Museum from Wikimedia]
- Code for The Economist’s election forecasting model. With the US elections 68 days away, a growing number of data scientists working on election models are updating their predictions, and some are even publishing their source code. For a nuanced comparison of predictions from The Economist and FiveThirtyEight, see Andrew Gelman’s recent post.
- ETL for voter rolls. Data pipelines are everywhere, and most organizations are just beginning to put tools and best practices in place to tame them. When a data pipeline breaks down or throws off errors, the applications and systems that depend on it are heavily impacted. This article profiles citizens who scrutinize state voter files to help minimize errors during the cleanup process. Using simple data queries, they have identified thousands of voters mistakenly purged from voter lists. More resources should be allocated to lowering the error rates of the voter roll cleanup tools used by state governments. Alas, more attention is placed on electoral fraud, which is negligible in the US (see our recent video).
- How an AI grading system ignited a national controversy in the U.K.
- China hires over 100 TSMC engineers in push for chip leadership. If Chinese companies are forbidden from buying critical semiconductors, they can and will build their own local ecosystem.
- Asynchronous Reinforcement Learning. A team from Intel Labs and USC introduces an efficient, high-throughput reinforcement learning architecture for training agents on a single machine. The goal of the project is to democratize deep RL and make it possible to train “whole populations of agents on billions of environment transitions using widely available commodity hardware”.
- Kintsugi: Japanese art of repairing broken pottery. A beautiful short video from the BBC.