Gradient Flow #21: Detecting Fake News, AutoBI, Feature Stores


This edition has 459 words, which will take you about 3 minutes to read.

“Any figure that looks interesting or different is usually wrong.” – W.A. Twyman.

Data Exchange podcast

  • The Computational Limits of Deep Learning   Neil Thompson is a research scientist at the Computer Science and Artificial Intelligence Lab (CSAIL) and the Initiative on the Digital Economy, both at MIT.  I wanted Neil on the podcast to discuss a recent paper he co-wrote on the computational demands, economic costs, and environmental impact of AI.
  • Detecting Fake News  This is a well-timed episode given how election week is unfolding here in the U.S.   Xinyi Zhou is a graduate student in Computer and Information Science at Syracuse University.  She recently co-wrote a comprehensive and must-read survey paper on the different tools and perspectives used to detect fake news.

[Image by PublicDomainPictures from Pixabay.]

Machine Learning tools and infrastructure

  • AI and Automation meets BI   In this recent post with Assaf Araki of Intel Capital, we discuss the use of automation in Business Intelligence.  These “AutoBI” tools reduce the need for manual analysis and shorten time to insight by enabling data and business analysts to do more on their own, without the assistance of their IT teams.
  • The ideal foundation for a general purpose serverless platform  In this post with Ion Stoica and Eric Liang, we examine limitations of current cloud functions (also referred to as FaaS or serverless). We note that the distributed computing framework Ray addresses many of these challenges and argue that Ray is the right foundation for a general purpose serverless framework.
  • Feature Store vs Data Warehouse   Data warehouses are fine for storing precomputed features, but ML pipelines require additional functionality.
  • Introducing Streamlit Sharing  I’ve come to rely on the combination of [VS Code + Streamlit] for personal projects. With Streamlit Sharing, users can now publish (deploy, manage, and share) apps for free.

Virtual Conferences

  • Are you using AI responsibly?  This free, virtual event will bring together experts who will share best practices honed from extensive real-world experience in areas around Fairness, Security, and Compliance.  Join us December 15, 2020 (10 – 11:15 a.m. PT) to learn more about how to implement responsible AI in your organization.

Work and Hiring

[Image: Batu Caves in Kuala Lumpur, from pxfuel]


  • A better way to calculate churn rates   SaaS providers may need to brush up on the Weibull distribution.
  • Data valuation using reinforcement learning  A new framework from Google Cloud AI estimates the value of each piece of training data. This has many potential applications. It can be used to improve data collection practices. Organizations that sell data can use it to price each datum. And since less valuable data can be filtered out, it can also reduce the cost of constructing large-scale training datasets.
  • Selling data to hedge funds: the definitive guide
  • Two papers from Waymo   Industry insiders believe that Waymo has far and away the best autonomous driving technology.  The first paper “provides a detailed analysis of every actual and simulated, counterfactual (‘what if’) collision or contact that was collected from more than 6.1 million miles of fully automated driving in Phoenix”.  The second paper outlines their safety philosophy.
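
For readers curious about the Weibull pointer in the churn item above, here is a minimal sketch. It is not the linked post's method, which this newsletter does not describe. It simply assumes customer lifetimes follow a Weibull distribution with survival function S(t) = exp(-(t/scale)^shape). The function names, parameter names, and numbers are all hypothetical, chosen only for illustration.

```python
import math


def weibull_survival(t, shape, scale):
    """Fraction of a cohort still subscribed at tenure t (assumed Weibull lifetimes)."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0; shape and scale must be > 0")
    return math.exp(-((t / scale) ** shape))


def period_churn_rate(t, shape, scale, period=1.0):
    """Share of customers active at tenure t who cancel before t + period."""
    still_active_now = weibull_survival(t, shape, scale)
    still_active_next = weibull_survival(t + period, shape, scale)
    return 1.0 - still_active_next / still_active_now


if __name__ == "__main__":
    # Illustrative values only: shape < 1 means churn risk falls as tenure grows.
    for month in (0, 6, 12, 24):
        rate = period_churn_rate(month, shape=0.7, scale=20.0)
        print(f"month {month:>2}: churn over next month = {rate:.3f}")
```

With shape equal to 1 the model reduces to a constant churn rate, which is what a single average figure implicitly assumes. A shape below 1 captures the common pattern where new customers cancel more often than long-standing ones.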

If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.