Issue #11: Dark Data, AI Talent, and Reinforcement Learning


This edition has 817 words which will take you about 4 minutes to read.

“When someone’s always watching, we lose our sense of self.” – Patricia Williams

Data Exchange podcast

Viewing machine learning and data science applications as sociotechnical systems A conversation with Chris Wiggins, Associate Professor at Columbia University, Chief Data Scientist at the New York Times, and co-founder of hackNY.
Building open source developer tools for language applications Matthew Honnibal is the founder of Explosion AI, the team behind popular tools like spaCy (NLP), Thinc (lightweight deep learning library), and Prodigy (annotation and active learning).
Videos Our active YouTube channel has video versions of some recent podcast episodes, as well as an archive of our popular weekly 2-minute Data+AI snapshots.

[Image: Deedee86 from Pixabay]

Machine Learning tools and infrastructure

Ten Questions on AI Risk A new checklist from the Future of Privacy Forum.
Attacking Deep Reinforcement Learning RL has been applied to autonomous driving and automated trading, domains where it would be hard for an attacker to directly modify the subject’s policy input. In the self-driving car example, an adversary can affect a camera’s image, but only in a physically realistic fashion (an adversary cannot add noise to arbitrary pixels or make a building disappear). This UC Berkeley research project, which involves simulated robotics games, introduces adversarial policies that reliably beat their victims.
Unsupervised Translation of Programming Languages A new paper from Facebook AI shows how unsupervised machine translation can be applied to source code to create a transcompiler. This could result in tools that automatically migrate existing codebases to modern or more efficient programming languages.
Demystifying AI Infrastructure This short video describes a landscape map that brings greater clarity to the AI ecosystem.
Enterprise Applications of Reinforcement Learning This new seven-minute video covers applications of RL to recommenders and simulation software.

Virtual Conferences

Adam Paszke at the Spark+AI Summit (June 22-26) Adam is a Senior Research Scientist at Google and one of the original authors of PyTorch. He will explain recent efforts to help drive more industry adoption of PyTorch, a topic I covered in a recent short post.
Practical Reinforcement Learning Having recently written an article on enterprise applications of RL, I’m looking forward to this free July 8th virtual event. Register here.
Algorithms and Race This 2019 interview features legal scholar Patricia Williams and computer scientist Cynthia Dwork. At the time of the interview they were the organizers of a workshop on Racial Bias in Data, which was part of a Simons Institute program on Fairness.

Work and Hiring

Examining the five shortcomings of China’s AI talent system A recently translated article courtesy of the China AI Newsletter.
Chinese-educated researchers contributed one-third of all papers to a top-tier AI conference The catch is that most of them live in the U.S. and work for American companies and universities. This means that restricting visas for students from China will have a profound impact on AI research in the U.S.
Making important life decisions with a flip of a coin A paper, based on a large-scale study by Freakonomics author Steve Levitt, suggests that when it comes to major decisions (e.g., whether to quit a job, seek more education, or end a relationship), “those who do make a change report being no worse off after two months and much better off six months later”. In other words, when in doubt, it’s better to err on the side of action rather than inaction.
The best QA job interview questions for managers to ask

[Image: AWeith / CC BY-SA]

Recommendations

Dark Data: Why What You Don’t Know Matters Add this to the list of books to give to your CxO. Statistician David Hand develops a taxonomy for “dark data”, his shorthand for different types of missing data. This book is aimed at a non-technical audience and is filled with examples of what can happen when people fail to acknowledge that they likely have incomplete data. A much-cited quote about COVID-19 from this week (“if we stop testing right now we’d have very few cases”) makes the case for sharing this book widely.
One Simple Chart: News Consumption in the US A group of researchers combed through Google Scholar and found an explosion of research on online sources of fake news and misinformation. The goal of the study was to put fake news into context, and the authors concluded that too much attention is placed on fake news. They found that (1) news consumption is heavily outweighed by other forms of media consumption, (2) Americans consume more TV news than online news, and (3) fake news is a small part of the overall media diet. Note that this study covered January 2016 to December 2018, long before COVID-19 and shelter-in-place may have affected media consumption habits. A March 2020 analysis hinted that news has become America’s biggest pastime.
Penrose from CMU – from mathematical notation to beautiful diagrams This new project lets you translate abstract statements written in math-like notation into one or more visual representations.
H.E.R.’s Full Performance On Graduate Together 2020 My favorite performance at the recent #GraduateTogether virtual celebration was H.E.R. on piano singing “Sometimes”. Here’s a good interview with H.E.R. recorded in a Filipino restaurant in NYC.

Discover more from Gradient Flow