Issue #12: Augmentation, Adversarial ML, the Politics Industry


This edition has 820 words which will take you about 5 minutes to read.

“The most obvious, ubiquitous, important realities are the ones that are hardest to see and talk about.” – David Foster Wallace

Data Exchange podcast

[Image: Industrial Hall from Pikist]

Machine Learning tools and infrastructure

  • Using Neural Networks to Find Answers in Tables   Researchers from Google and Tel Aviv University introduce a question answering system that reasons over tables. Given some tabular data, their system is able to answer questions about that data. This is reminiscent of systems for querying databases using a natural language interface. In my experience, those earlier systems only seemed to work well during product demos. The advent of transformers and large neural networks for NLP could finally yield a practical natural language interface for structured data.
  • Adversarial Machine Learning, industry perspectives  This paper details how 28 organizations secure their ML assets.  
  • Pulsar for Kafka people   A great overview from the recent Pulsar Summit. Pulsar is deservedly attracting the attention of companies across many different industries.
  • Map Of Computing Architectures for AWS   I’ve used this handy visual tool – a hexagon – to understand the scope of AWS, now at 258 products and counting.
  • Large image datasets: A pyrrhic win for computer vision?    This new paper attempts to draw the attention of the machine learning community towards the ethical implications of large-scale datasets used for training and benchmarking models. One such dataset, the 80 million Tiny Images dataset, has never been audited closely. The authors found labels that contained ethnic or racial slurs, leading MIT to quickly withdraw Tiny Images.

Virtual Conferences

  • Demystifying Neural Architecture Search (NAS) in Theory and in Practice   Liam Li explains recent progress in NAS, a set of techniques to automate the design of neural networks. Some promising NAS tools are now available on the open source Determined Training Platform.
  • The Uncanny Valley of Virtual Conferences   A great post by Ben Recht on re-thinking (academic) conferences in light of the current pandemic. Ben helped me organize Hardcore Data Science, which ran in the early years of my stint as Strata’s program chair.
  • Keynotes from the Spark+AI Summit   I live-tweeted some of the keynotes at the recent conference and collected several of my Twitter threads into a post. As I touted in a previous newsletter, Hany Farid (“father of digital forensics”) gave an outstanding presentation on deepfakes.
  • Discovering Symbolic Models from Deep Learning with Inductive Biases  A video explaining a recent paper that uses graph neural networks to fit a dataset of observations from a physical system. The authors use a technique called symbolic regression (a supervised machine learning technique that assembles analytic functions to model a given dataset) to arrive at a representation of the system in the form of closed-form symbolic expressions. While the paper itself is impressive, I really like the idea of people explaining papers in video format.
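
For readers new to symbolic regression, here is a toy sketch of the idea. It is not the paper's method: the graph neural network step is left out entirely, and the candidate library, function names, and synthetic data are all assumptions made up for illustration. It simply tries a handful of analytic forms, fits one coefficient for each by least squares, and keeps the form with the smallest error.

```python
import math

# Hypothetical candidate library (illustrative only, not from the paper).
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x * x,
    "1/x**2": lambda x: 1.0 / (x * x),
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}

def fit_scale(f, xs, ys):
    """Least-squares coefficient c that minimizes sum((y - c * f(x))**2)."""
    num = sum(f(x) * y for x, y in zip(xs, ys))
    den = sum(f(x) ** 2 for x in xs)
    return num / den

def symbolic_regression(xs, ys):
    """Return (squared error, coefficient, expression) for the best candidate."""
    best = None
    for name, f in CANDIDATES.items():
        c = fit_scale(f, xs, ys)
        err = sum((y - c * f(x)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, c, name)
    return best

# Synthetic observations generated from an inverse-square law, y = 3 / x**2.
xs = [0.5 + 0.25 * i for i in range(20)]
ys = [3.0 / (x * x) for x in xs]

err, coef, expr = symbolic_regression(xs, ys)
print(f"best fit: {coef:.3f} * {expr}")
```

Practical symbolic regression tools search a far larger space of expressions, typically by composing operators rather than scanning a fixed list, but the output has the same shape: a closed-form expression you can read and check.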


Work and Hiring

[Image: Milwaukee Art Museum from pxhere.]



  • The Politics Industry: How Political Innovation Can Break Partisan Gridlock and Save Our Democracy   Katherine Gehl and Harvard Business School Professor Michael Porter apply the latter’s competitive forces framework to American politics. The result is an engaging book that relies on the tools used to understand competition to “help illuminate the challenges of, and solutions for, our political system”. It is their focus on crafting solutions that makes this book a compelling and highly recommended read.
  • Immigration Policy and the Global Competition for AI Talent   A detailed analysis of US immigration policies relevant to four categories: students in AI-related fields of study, workers in AI-related industries, distinguished AI workers, and entrepreneurs. The authors provide a series of interesting reference tables showing that US immigration policies are less attractive to all four categories than those of the UK, Canada, France, and Australia.
  • Fighting COVID-19 misinformation on social media  Nudging people to think about accuracy can improve their choices about what to share on social media.
  • Hacking, phishing, surveillance, disinformation   Tools that used to be applied to harass companies and governments are now being applied to individuals. This story focuses on Matthew Earl (founder of short seller ShadowFall), a financial analyst who first raised concerns about the German payments company Wirecard five years ago.