I’ll be hosting a webcast on Spark SQL featuring Michael Armbrust of Databricks:
In this webcast, we’ll examine Spark SQL, a new Alpha component that is part of the Apache Spark 1.0 release. Spark SQL lets developers natively query data stored in both existing RDDs and external sources such as Apache Hive. A key feature of Spark SQL is the ability to blur the lines between relational tables and RDDs, making it easy for developers to intermix SQL commands that query external data with complex analytics. In addition to Spark SQL, we’ll explore the Catalyst optimizer framework, which allows Spark SQL to automatically rewrite query plans to execute more efficiently.
The webcast is scheduled for Tuesday, April 29, 2014, at 1 p.m. San Francisco time. I’ll introduce Michael and moderate a Q&A following his presentation. Spark SQL is generating a lot of interest within the Apache Spark community, and this is a great opportunity to learn about it from its lead developer. I hope to see you online on the 29th.