I recently live-tweeted several keynotes from the 2020 #SparkAISummit and have collected the resulting Twitter threads in this short post:
- Matei Zaharia on Spark 3.0
- Ali Ghodsi on Lakehouses
- Reynold Xin on Delta Engine and Photon
- Clemens Mewald on the Data Science Workspace
- Matei Zaharia on MLflow
- Kim Hazelwood on deep learning for recommenders
- Hany Farid on Deepfakes
- Adam Paszke: PyTorch as a modern ML research and production platform
- Amy Heineike on Lessons from Covid19Primer.com
Matei Zaharia on Spark 3.0
Thread: Celebrating the tenth year of Spark, and the release of Spark 3.0 while listening to @matei_zaharia's #SparkAISummit keynote. I found the first post I wrote about Spark, roughly eight years ago (dated 2012-08-12): https://t.co/bPTQJfMSNS pic.twitter.com/1KX7cWluZs
— Ben Lorica 罗瑞卡 (@bigdata) June 24, 2020
Ali Ghodsi on Lakehouses
Thread: @alighods #SparkAISummit keynote on Lakehouse, a data management paradigm we wrote about early this year https://t.co/CMExlP3wi3
— Ben Lorica 罗瑞卡 (@bigdata) June 24, 2020
Reynold Xin on Delta Engine and Photon
Thread: @rxin announces Delta Engine at the #SparkAISummit. Delta Engine is designed to accelerate SQL and dataframe workloads with two components: a query optimizer & a vectorized execution engine. pic.twitter.com/E5hOPmxaFH
— Ben Lorica 罗瑞卡 (@bigdata) June 24, 2020
Clemens Mewald on the Data Science Workspace
1/ Day Two of live tweeting at the #SparkAISummit: @ClemensMewald keynote … Tooling for #DataTeams is still a challenge for most companies pic.twitter.com/rc9j7n0p2K
— Ben Lorica 罗瑞卡 (@bigdata) June 25, 2020
Matei Zaharia on MLflow
Thread: @matei_zaharia is giving updates on MLflow at #SparkAISummit. He describes MLflow as an open source *Machine Learning Platform* that now has four main components: experiment management, reproducible runs, packaging & deployment, and a model registry pic.twitter.com/xQaXEiM45F
— Ben Lorica 罗瑞卡 (@bigdata) June 25, 2020
Kim Hazelwood on deep learning for recommenders
Thread: live tweeting @DrKimHazelwood @Facebook keynote at #SparkAISummit: the share of Facebook’s structured data being used by machine learning models grew from 30% last year to 50% this year. The number of engineers who train ML models has doubled; # of models has quintupled pic.twitter.com/2R7tadBtW0
— Ben Lorica 罗瑞卡 (@bigdata) June 25, 2020
Hany Farid on Deepfakes
Thread: Next up at #SparkAISummit is Hany Farid of @Berkeley_EECS, widely considered to be the father of digital forensics, a field that finds itself in the middle of the battle against deepfakes, the use of models to synthesize realistic images & video https://t.co/XDoCpgYKdy pic.twitter.com/DeITmb73gC
— Ben Lorica 罗瑞卡 (@bigdata) June 25, 2020
Adam Paszke: PyTorch as a modern ML research and production platform
1/ Opening this morning’s #SparkAISummit … @apaszke of @Google author of #PyTorch. As Adam notes, PyTorch continues to be the favorite library used by ML researchers: over half of ML papers were implemented with PyTorch. This is a trend I observed last year https://t.co/V0BAVE7tTN pic.twitter.com/YZjwYlZcjU
— Ben Lorica 罗瑞卡 (@bigdata) June 26, 2020
Amy Heineike on Lessons from Covid19Primer.com
Thread: Closing #SparkAISummit keynote is @aheineike @primer_ai, she is describing one of the best sites on #COVID19 … COVID19Primer uses advanced #NLproc to summarize, surface trends in research papers, media articles, & social networks. pic.twitter.com/yJCIsFEmsr
— Ben Lorica 罗瑞卡 (@bigdata) June 26, 2020