Issue #13: Machine Learning Platforms, Graphs, and AI Ethics


This edition has 790 words, which will take you about 4 minutes to read.

“Projects have communities – communities have time and no money. Products have customers – customers have money and no time.”  Mårten Mickos

Data Exchange podcast

  • How graph technologies are being used to solve complex business problems   Graphs power many applications we rely on, including search, recommendation systems, fraud detection, identity management, and much more. There is no better guide to the world of graphs than Denise Gosnell, Chief Data Officer at DataStax and the co-author of the new book, The Practitioner’s Guide to Graph Data.
  • Democratizing Machine Learning  Ameet Talwalkar, co-founder and Chief Scientist at Determined AI, and an Assistant Professor in the Machine Learning Department at Carnegie Mellon University, has been involved in important developments in areas like hyperparameter tuning and neural architecture search. With that said, he is not purely a researcher – his focus is on building tools to make machine learning much more accessible.
  • NLP in industry survey   The research community continues to release impressive natural language models that improve on existing benchmarks across many NLP tasks. We want to find out how people are using NLP, what tools they are using, and what challenges they face. Please take 5 minutes to fill out our survey and pass it along to your friends and colleagues.

[Image: Roof, Tomb of Harez from Wikimedia]

 

Machine Learning tools and infrastructure

  • Five Key Features for a Machine Learning Platform  This is a new post that I co-wrote with Ion Stoica. We share insights derived from conversations with many ML platform builders. More specifically, we list features that will be critical to ensuring that your ML platform is well-positioned for modern AI applications.
  • Making Netflix’s Data Infrastructure Cost-Effective    This is a great example of a monitoring system designed to provide cost transparency to decision makers.  As data infrastructures become more complex – due to ML, distributed systems, & multi-cloud deployments – access to tools that provide usage and cost visibility will be necessary.
  • Natural Language Processing Advancements By Deep Learning   A comprehensive survey paper that covers important topics including parsing & part-of-speech tagging, text classification & summarization, machine translation, question answering, information extraction, and more.
  • Intro to RLlib: Example Environments    Paco Nathan recently published an introduction to reinforcement learning for developers. He uses RLlib + OpenAI Gym and works step-by-step through sample code.

 

Virtual Conferences

  • Ray Summit relaunches  I’m co-chair of this first-year event, whose tagline is “Scalable machine learning, scalable Python, for everyone.”  The event is now FREE and online (Sep 30th & Oct 1st). Keynote speakers include Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and Raluca Popa. We are adding many other stellar speakers. Register HERE.
  • Random search and reproducibility for neural architecture search   This was one of my favorite talks from 2019. Ameet Talwalkar gave a great overview of neural architecture search to an industry audience.
  • The Evolution of Data Infrastructure at Splunk  Talks of this type (“evolution of data platform at …”) were always among the most popular presentations at Strata when I was program chair. This is a recent Flink Forward keynote by Eric Sammer, a former Strata program committee member.

 

Work and Hiring

[Image: Abu Dhabi, Hyatt Capital Gate by Ben Lorica.]

Recommendations


Subscribe to our newsletter, our YouTube channel, and the Data Exchange podcast.