This edition has 790 words which will take you about 4 minutes to read.
“Projects have communities – communities have time and no money. Products have customers – customers have money and no time.” Mårten Mickos
Data Exchange podcast
- How graph technologies are being used to solve complex business problems Graphs power many applications we rely on, including search, recommendation systems, fraud detection, identity management, and much more. There is no better guide to the world of graphs than Denise Gosnell, Chief Data Officer at DataStax and the co-author of the new book, The Practitioner’s Guide to Graph Data.
- Democratizing Machine Learning Ameet Talwalkar, co-founder and Chief Scientist at Determined AI, and an Assistant Professor in the Machine Learning Department at Carnegie Mellon University, has been involved in important developments in areas like hyperparameter tuning and neural architecture search. With that said, he is not purely a researcher – his focus is on building tools to make machine learning much more accessible.
- NLP in industry survey The research community continues to release impressive natural language models that improve on existing benchmarks across many NLP tasks. We want to find out how people are using NLP, what tools they are using, and what challenges they face. Please take 5 minutes to fill out our survey and pass it along to your friends and colleagues.
[Image: Roof, Tomb of Harez from Wikimedia]
Machine Learning tools and infrastructure
- Five Key Features for a Machine Learning Platform This is a new post that I co-wrote with Ion Stoica. We share insights derived from conversations with many ML platform builders. More specifically, we list features that will be critical to ensuring that your ML platform is well-positioned for modern AI applications.
- Making Netflix’s Data Infrastructure Cost-Effective This is a great example of a monitoring system designed to provide cost transparency to decision makers. As data infrastructures become more complex – due to ML, distributed systems, & multi-cloud deployments – access to tools that provide usage and cost visibility will be necessary.
- Natural Language Processing Advancements By Deep Learning A comprehensive survey paper that covers important topics including parsing & part-of-speech tagging, text classification & summarization, machine translation, question answering, information extraction, and more.
- Intro to RLlib: Example Environments Paco Nathan recently published an introduction to reinforcement learning for developers. He uses RLlib + OpenAI Gym and works step-by-step through sample code.
Virtual Conferences
- Ray Summit relaunches I’m co-chair of this first-year event whose tagline is “Scalable machine learning, scalable Python, for everyone.” The event is now FREE and online (Sep 30th & Oct 1st). Keynote speakers include Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and Raluca Popa. We are adding many other stellar speakers. Register HERE.
- Random search and reproducibility for neural architecture search This was one of my favorite talks from 2019. Ameet Talwalkar gave a great overview of neural architecture search to an industry audience.
- The Evolution of Data Infrastructure at Splunk Talks of this type (“evolution of data platform at …”) were always among the most popular presentations at Strata when I was program chair. This is a recent Flink Forward keynote by Eric Sammer, a former Strata program committee member.
Work and Hiring
- Tech Sector Job Interviews Assess Anxiety, Not Software Skills A new study from North Carolina State University and Microsoft (based on interviews with 48 computer science undergraduates and graduate students) “suggests that a lot of well-qualified job candidates are being eliminated because they’re not used to working on a whiteboard in front of an audience.” The authors found other problems in the hiring pipeline for software engineers in an earlier study.
- Written communication is remote work super power Tips on how to write intentionally for asynchronous communication.
- One Simple Chart: skills gap in data science
- Flight attendants yearn to get back in the air
[Image: Abu Dhabi, Hyatt Capital Gate by Ben Lorica.]
Recommendations
- Exploring the failings of AI ethics guidelines “Of the more than 160 documents in our database, only ten have practical enforcement mechanisms.” What’s the point of putting together ethics guidelines if they aren’t operational and aligned to potential liabilities & risks?
- This Is Not Propaganda: Adventures in the War Against Reality Information wars occur all over the world, and in this interesting new book, Peter Pomerantsev systematically examines how disinformation is being used in places like the Philippines, Russia, the US, and more. Disinformation and misinformation will escalate as we get closer to the US Presidential elections (a mere 110 days away). Glance at the list of alerts on the Bot Sentinel dashboard to get a sense of what bots are promoting at any given moment.
- De-escalating social media conflict An interesting post that comes with some possible mechanisms (“Twitter Mea Culpa”) for admitting mistakes, adding corrections, and forgiving a user.
- An Early Warning Approach to Monitor COVID-19 Activity Fusing big data streams to create an early warning system, or an indication of epidemic spread. This is an early stage project that is meant to supplement and not replace traditional public health monitoring systems.
- Filipino Musicians Drive Hong Kong’s Music Scene, but Gigs Have Dried Up
Subscribe to our newsletter, our YouTube channel, and to the Data Exchange podcast.