Data Exchange Podcast
This edition has 511 words, which will take you about 3 minutes to read.
Computational humanness, analogy and innovation, and soft concepts
Dafna Shahaf works on many interesting projects focused on enabling computers to augment human cognition in novel ways.
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS.
Hyperscaling natural language processing
Edmon Begoli is the Chief Data Architect at Oak Ridge National Laboratory. He has recently been using Ray to implement distributed online machine learning models for applications that include suicide prevention among US veterans and infectious disease surveillance.
What businesses need to know about model explainability
Krishna Gade is the founder and CEO of Fiddler Labs, a startup focused on helping companies build trustworthy and understandable AI solutions. Prior to founding Fiddler, Krishna led engineering teams at Pinterest and Facebook.
Machine Learning tools and infrastructure
The Future of Computing is Distributed: Ion Stoica makes a compelling case for why distributed computing will soon become the norm, especially because of the growing importance of machine learning. Here are two rather extreme examples from language models and applications:
- An October 2019 paper on neural machine translation describes the amount of training involved in some large deep learning models: “Upon the submission of this paper, training has lasted for three months, 2 epochs in total, and perplexity on the development set is still dropping.”
- Turing-NLG: A newly announced 17-billion-parameter language model from Microsoft.
My new Notebook: I love using these two open source tools in combination – [Streamlit + VS Code].
Do you need data to enrich your models? Google’s Dataset Search is out of beta and a startup called Explorium might have something that can help. Data enrichment was one of the topics I covered in my 2018 keynote at Strata London.
Two new open source time series databases were just announced: Materialize (written in Rust!) and M3DB, a distributed time series database from Uber.
LinkedIn has open sourced DataHub, a generalized metadata search & discovery tool.
2020 Enterprise Tech 30: This list of privately held enterprise software companies is based on a survey of VCs.
Work and hiring:
- Remote Work Insights you’ve never heard before: a great post by Sarah Milstein.
- Mechanize your hiring process: tips from Anurag Gupta of Amazon, GM of Amazon Redshift and Amazon Aurora.
- @vcstarterkit classifies tweets from the VC twitterverse.
- The creator of the YOLO object detection method stopped doing computer vision research because “military applications and privacy concerns eventually became impossible to ignore”.
COVID-19 and the Importance of Paid Sick Leave
A recent WaPo article highlights how important access to paid sick leave is, particularly in light of the current coronavirus pandemic. The article cites a 2016 study which showed that “places that imposed sick pay requirements reduced flu cases by about 10 percent or more.”
Recommendations
- AI in Healthcare: a new report from the US National Academy of Medicine.
- Dressing for the surveillance age: a recent New Yorker piece on adversarial and poison attacks against deep learning models in surveillance systems.
- A short NY Times video on Chicano subculture … in Japan.
- Uber whistleblower Susan Fowler in conversation with Mina Kim.
Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.