Issue #7: Modeling Epidemics, the Future of AI, and Alternative History

 

Data Exchange podcast  

Machine Learning tools and infrastructure

  • Understanding the Ray Ecosystem and Community  In this new post I co-wrote with Ion Stoica, we explain the reasons behind the growing popularity of Ray.
  • Demystifying AI Infrastructure  This recent post by my friend Assaf Araki contains a landscape map that brings greater clarity to the AI ecosystem. It charts the layers of the AI technical stack and the vendors within each layer.
  • TensorFlow vs. PyTorch chart  A revealing look at how often TensorFlow and PyTorch are listed in recent job postings.
  • PyCaret  This month marks the release of version 1.0 (“the first stable release”) of this easy-to-use wrapper for scikit-learn, XGBoost, Microsoft LightGBM, spaCy and other libraries.
  • KDNuggets survey  The results of this survey from KDNuggets reveal varied expectations about the use and impact of AutoML over time, segmented by background and country of origin.
  • Apache Pulsar user survey  The first user survey from the Apache Pulsar PMC team tracks Pulsar’s adoption rate and most popular features, and takes a look at real-time streaming applications.
  • MLflow Model Registry  As companies begin relying on machine learning, they need to keep track of model versions, dependencies, permissions and other related assets. The newly announced MLflow Model Registry on the Databricks platform is poised to become a central hub for ML models in companies whose teams use machine learning across a variety of applications.

COVID-19

  • COVID-19 forecasts  Cornell uses machine learning to predict COVID-19 activity in Chinese provinces in real time, leveraging internet search data and news alerts and combining them with estimates from mechanistic models.
  • Unemployment visualization  A striking visualization that places the recent spike in the US national unemployment rate in the context of the past several decades.
  • Countrywide lockdown works  A simple analysis – no models – of daily death registry data for a sample of 1,161 Italian municipalities in the seven regions most severely hit by COVID-19.
  • Misinformation during a pandemic  This interesting new study from the University of Chicago examines the effects of news coverage of the novel coronavirus by comparing two cable news programs in the US (Hannity and Tucker Carlson Tonight).
  • COVID-19 resource hub from Databricks  Seven regularly updated COVID-19 data sets, made available on the Databricks Community Edition.

Virtual Conferences

Here’s an update on events I’m involved with:

  • Ray Summit Connect  The kickoff event for Ray Summit Connect is on May 13th and features two award-winning speakers – Michael Jordan and Ion Stoica – presenting on “The Future of Machine Learning and AI”. Register here.
  • MLOps Virtual Event  I’m taking part in an April 30th virtual event on MLOps. Databricks is sending key thought leaders to speak at this event: Matei Zaharia, Sean Owen, and Clemens Mewald. Register here.
  • Spark+AI Summit  The acclaimed Spark+AI Summit is going virtual June 22 – 26 and will be FREE! That’s 200+ sessions on data, machine learning, and AI, plus keynote speakers like Nate Silver. Register here.

Work and hiring

Recommendations

 

If you enjoyed this newsletter please support our work by encouraging your friends and colleagues to subscribe:


[Image: Newsletter from Pixabay.]