This edition has 880 words, which will take you about 5 minutes to read.
“Juggling is sometimes called the art of controlling patterns, controlling patterns in time and space.” – Ron Graham
Data Exchange podcast
- What is AI Assurance? Ofer Razon and Superwise are part of a community in the early stages of building tools and best practices for scaling AI operations. The goal is to help multiple stakeholders build the necessary solutions to evaluate models, receive alerts and troubleshoot on time, and gather insights to improve efficiency. AI assurance will ultimately bring together different parts of an organization including business, data science and operational teams, legal and compliance, and privacy and security.
- Using machine learning to detect shifts in government policy Weifeng Zhong is the core maintainer of the open source Policy Change Index (PCI), a framework that uses machine learning and NLP to “process and read” large amounts of text to discern government priorities and policies. The initial PCI uncovers policy shifts in China by text mining the People’s Daily.
[Image: National Library of China from Wikimedia]
Machine Learning tools and infrastructure
- One Simple Chart: how do open source projects interact with users
- Five key features of a machine learning platform This new seven-minute video is based on a blog post I wrote with Ion Stoica. I describe elements that ML platform designers need to incorporate to meet current challenges and plan for future workloads.
- The rise of natural language interfaces to databases Photon from Salesforce joins a growing list of natural language interfaces to relational databases and/or RDF-triple stores. As I opined in previous editions of this newsletter, earlier natural language interfaces worked well in demos but struggled when faced with queries from regular users. I expect that recent advances in language models will yield tools that work much better. The idea is to cast this as a supervised learning problem and use the latest research in machine translation and neural semantic parsing. Another recent paper from Facebook – TaBERT: A new model for understanding queries over tabular data – uses similar ideas and applies them to AI assistants, fact checking and verification applications.
- Build End-to-End AI Pipelines Using Ray and Apache Spark A new post by my former co-chair, Jason Dai, CTO of Big Data Technologies and Senior Principal Engineer at Intel. The post contains a description of applications to AutoML for time series forecasting.
- google/differential-privacy At a high level, differentially private (DP) methods inject random noise at different stages of analytics or machine learning projects. This library contains a PostgreSQL extension, which means you can use it in Postgres databases. DP is starting to appear in more real-world systems. Google uses differential privacy (and other privacy-preserving techniques) in many products. There are similar tools from IBM, there’s the ODP project from Harvard, and Apple has long used differential privacy in its products.
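To make the "inject random noise" idea concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. It is an illustration only and is not the API of google/differential-privacy or any other library mentioned above; the names `laplace_noise` and `private_count` and the sample data are hypothetical. It assumes a count has sensitivity 1, meaning that adding or removing one person changes the result by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution centered at zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the items in `values` matching `predicate`.

    Assumption: the query is a simple count, so its sensitivity is 1.
    Smaller `epsilon` means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 19, 30]  # made-up data for illustration
    noisy = private_count(ages, lambda age: age >= 30, epsilon=0.5)
    print(f"noisy count of people aged 30+: {noisy:.2f}")
```

Each call returns a different value near the true count, and the privacy budget `epsilon` controls how far the answer tends to stray. Production libraries add safeguards this sketch omits, such as protection against floating-point attacks and tracking of the cumulative privacy budget, so treat it as a teaching aid and not something to deploy.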
FREE Virtual Conferences
- Learn about Multi Armed Bandits and RL-based Recommender Systems This is a tutorial I’ve long advocated for. Earlier this year I wrote a post on enterprise applications of reinforcement learning and one of the areas I highlighted was recommendation and personalization systems. This industry-first tutorial takes place at the Ray Summit and will be led by one of my favorite teachers, Paco Nathan.
- How to accelerate NLP deep learning model training across multiple GPUs New language models like BERT are very expensive to train. It reportedly cost OpenAI over $12M to train GPT-3! So instead of training from scratch, most companies focus on fine-tuning these models for their specific applications. A team at Determined AI explains how you can reduce the time it takes to fine-tune BERT from 7 hours to 30 minutes, with very minor changes to your model code. CTO Neil Conway will be speaking on this topic at the upcoming NLP Summit, the industry gathering for people interested in natural language technologies.
- Call for Speakers (Data+AI Summit Europe) closes Sep 13th. End 2020 by being part of what will be a very strong roster of speakers.
Work and Hiring
- Problem-Solving for the CS Technical Interview This was an actual class at Stanford. 📓 Also see Interview training with Google, Pinterest, and Stanford.
- Questions and tips for interviewing product managers For this post from 2017, product leaders at several companies sent in their top interview questions and key competencies.
- Apple is looking for Machine Learning/Deep Learning infrastructure engineers A job post hints at some of the tools Apple might be using (Arrow, Bazel, Docker, Kibana, MPI, MySQL, Redis, Spark, Zookeeper).
[Image: Wildfire smoke, view from Bernal Hill in San Francisco, 2020-09-09; by Ben Lorica]
Recommendations
- An intuitive overview of recent advances in automated reading comprehension: David Talby recently wrote two good posts that cover rapidly changing and widely applicable research areas in NLP. Part I. Recent progress in automated Question Answering about facts in Wikipedia articles; Part II. Progress in automated conversational Question Answering, with natural-sounding answers in the context of the flow of conversation.
- Inside the Hidden World of Legacy IT Systems We need more articles about what I call dark IT – legacy infrastructure and aging software that power wastewater treatment plants, power grids, air traffic control, communications services, and many vital government systems.
- Build Your AI Incident Response Plan… Before It’s Too Late How to combine model risk management practices (found in financial services) with existing computer incident response guidance and other information security best practices.
- Traffic prediction with advanced Graph Neural Networks If these results hold – improved accuracy of real-time ETAs by up to 50% – this new model developed by DeepMind and the Google Maps team will make many drivers and commuters very happy.