Issue #5: Hyperscaling, Computational Humanness, and Paid Sick Leave

Data Exchange Podcast

This edition has 511 words which will take you about 3 minutes to read.

Computational humanness, analogy and innovation, and soft concepts
Dafna Shahaf works on many interesting projects focused on enabling computers to augment human cognition in novel ways.

Episode 10: Dafna Shahaf

Hyperscaling natural language processing
Edmon Begoli is the Chief Data Architect at Oak Ridge National Laboratory. He has recently been using Ray to implement distributed online machine learning models for applications that include suicide prevention among US veterans and infectious disease surveillance.

What businesses need to know about model explainability
Krishna Gade is the founder and CEO of Fiddler Labs, a startup focused on helping companies build trustworthy and understandable AI solutions. Prior to founding Fiddler, Krishna led engineering teams at Pinterest and Facebook.

Machine Learning tools and infrastructure

The Future of Computing is Distributed: Ion Stoica makes a compelling case for why distributed computing will soon become the norm, especially because of the growing importance of machine learning. Here are two rather extreme examples from language models and applications:

  • An October 2019 paper on neural machine translation describes the amount of training involved in some large deep learning models: “Upon the submission of this paper, training has lasted for three months, 2 epochs in total, and perplexity on the development set is still dropping.”
  • Turing-NLG: A newly announced 17-billion-parameter language model from Microsoft.

My new Notebook: I love using these two open source tools in combination – [Streamlit + VS Code].

Do you need data to enrich your models? Google’s Dataset Search is out of beta, and a startup called Explorium might have something that can help. Data enrichment was one of the topics I covered in my 2018 keynote at Strata London.

Two new open source time series databases were just announced: Materialize (written in Rust!) and M3DB, a distributed time series database from Uber.

LinkedIn has open sourced DataHub, a generalized metadata search & discovery tool.

2020 Enterprise Tech 30: This list of privately held enterprise software companies is based on a survey of VCs.

Work and hiring:

COVID-19 and the Importance of Paid Sick Leave

A recent WaPo article highlights how important access to paid sick leave is, particularly in light of the current coronavirus pandemic. The article cites a 2016 study which showed that “places that imposed sick pay requirements reduced flu cases by about 10 percent or more.”


