Issue #12: Augmentation, Adversarial ML, the Politics Industry

This edition has 820 words, which will take you about 5 minutes to read.

“The most obvious, ubiquitous, important realities are the ones that are hardest to see and talk about.” – David Foster Wallace

Data Exchange podcast

Designing machine learning models for both consumer and industrial applications Christopher Nguyen, CEO of Arimo (a Panasonic company), on the benefits of combining domain knowledge and rule-based systems with machine learning models.
Machines for unlocking the deluge of COVID-19 papers, articles, and conversations Primer’s Amy Heineike on building machine learning systems that can read and write.
High-quality transcripts In case you haven’t downloaded one yet, we continue to add to the list of episodes that have accompanying free transcripts.

[Image: Industrial Hall from Pikist]

Using Neural Networks to Find Answers in Tables Researchers from Google and Tel Aviv University introduce a question answering system that reasons over tables: given some tabular data, it can answer questions about that data. This is reminiscent of earlier systems for querying databases through a natural language interface. In my experience, those earlier systems seemed to work well only during product demos. The advent of transformers and large neural networks for NLP could finally yield a practical natural language interface for structured data.
Adversarial Machine Learning, industry perspectives This paper details how 28 organizations secure their ML assets.
Pulsar for Kafka people A great overview from the recent Pulsar Summit. Pulsar is deservedly attracting the attention of companies across many different industries.
Map Of Computing Architectures for AWS I’ve used this handy visual tool – a hexagon – to understand the scope of AWS, now at 258 products and counting.
Large image datasets: A pyrrhic win for computer vision? This new paper attempts to draw the machine learning community's attention to the ethical implications of the large-scale datasets used for training and benchmarking models. One such dataset, the 80 million Tiny Images dataset, had never been closely audited. The authors found labels that contained ethnic or racial slurs, leading MIT to quickly withdraw Tiny Images.

Demystifying Neural Architecture Search (NAS) in Theory and in Practice Liam Li explains recent progress in NAS, a set of techniques to automate the design of neural networks. Some promising NAS tools are now available on the open source Determined Training Platform.
The Uncanny Valley of Virtual Conferences A great post by Ben Recht on re-thinking (academic) conferences in light of the current pandemic. Ben helped me organize Hardcore Data Science, which ran in the early years of my stint as Strata’s program chair.
Keynotes from the Spark+AI Summit I live-tweeted some of the keynotes at the recent conference and collected several of my Twitter threads into a post. As I touted in a previous newsletter, Hany Farid (“father of digital forensics”) gave an outstanding presentation on deepfakes.
Discovering Symbolic Models from Deep Learning with Inductive Biases A video explaining a recent paper that uses graph neural networks to fit a dataset of observations from a physical system. The authors then use symbolic regression (a supervised machine learning technique that assembles analytic functions to model a given dataset) to arrive at closed-form symbolic expressions that represent the system. The paper itself is impressive, and I really like the idea of people explaining papers in video format.
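To make the idea of symbolic regression concrete, here is a minimal sketch in Python. It is not the paper's method: it simply enumerates a handful of hypothetical building blocks (the UNARY and BINARY tables below are my own illustrative choices) and keeps the expression with the lowest mean squared error on the data. Real systems search a much larger space of expressions, usually with genetic programming or similar heuristics.

```python
import itertools
import math

# Hypothetical building blocks chosen for illustration only.
UNARY = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}


def candidates():
    """Yield (expression string, function) pairs built from the pieces above."""
    for name, f in UNARY.items():
        yield name, f
    for (n1, f1), (n2, f2) in itertools.combinations(UNARY.items(), 2):
        for op, g in BINARY.items():
            # Bind f1, f2, g as defaults so each lambda keeps its own pieces.
            yield f"({n1} {op} {n2})", (
                lambda x, f1=f1, f2=f2, g=g: g(f1(x), f2(x))
            )


def mse(f, xs, ys):
    """Mean squared error of f on the observations (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys):
    """Return the candidate expression with the lowest mean squared error."""
    return min(candidates(), key=lambda c: mse(c[1], xs, ys))


# Synthetic observations generated from a known formula.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + math.sin(x) for x in xs]

best_name, best_fn = fit(xs, ys)
print(best_name)
```

On this synthetic data the search recovers the generating formula because that formula happens to be in the candidate set. With noisy, real-world observations you would also want to penalize expression complexity so that the simplest adequate formula wins.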

25 Most Common Web Developer Interview Questions And Answers Includes a coding assignment based on an interview question that Microsoft used to ask software engineers.
My Mid-Career Job-Hunt: A Data Point for Job-Seeking Devs
One Simple Chart: Salaries and Cost of Living in Major Technology Hubs in the US
Remote First at Quora CEO Adam D’Angelo has written one of the best essays on remote work that I’ve read. He lists the reasons behind the company's decision and describes how it plans to implement its “remote first” policy.

The Politics Industry: How Political Innovation Can Break Partisan Gridlock and Save Our Democracy Katherine Gehl and Harvard Business School Professor Michael Porter apply the latter’s competitive forces framework to American politics. The result is an engaging book that relies on the tools used to understand competition to “help illuminate the challenges of, and solutions for, our political system”. It is their focus on crafting solutions that makes this book a compelling and highly recommended read.
Immigration Policy and the Global Competition for AI Talent A detailed analysis of US immigration policies relevant to four categories: students in AI-related fields of study, workers in AI-related industries, distinguished AI workers, and entrepreneurs. The authors provide a series of reference tables showing that US immigration policies are less attractive to all four categories than those of the UK, Canada, France, and Australia.
Fighting COVID-19 misinformation on social media Nudging people to think about accuracy can improve their choices about what to share on social media.
Hacking, phishing, surveillance, disinformation Tools once used to harass companies and governments are now being turned on individuals. This story focuses on Matthew Earl (founder of short seller ShadowFall), a financial analyst who first raised concerns about the German payments company Wirecard five years ago.