Advanced Analytics on Relational Data with Spark SQL

I’ll be hosting a webcast on Spark SQL featuring Michael Armbrust of Databricks:

In this webcast, we’ll examine Spark SQL, a new Alpha component that is part of the Apache Spark 1.0 release. Spark SQL lets developers natively query data stored in both existing RDDs and external sources such as Apache Hive. A key feature of Spark SQL is the ability to blur the lines between relational tables and RDDs, making it easy for developers to intermix SQL commands that query external data with complex analytics. In addition to Spark SQL, we’ll explore the Catalyst optimizer framework, which allows Spark SQL to automatically rewrite query plans to execute more efficiently.
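To give a rough feel for what "automatically rewrite query plans" can mean, here is a toy sketch in plain Python. It is not Catalyst's API and uses no Spark code. The class names (`Lit`, `Col`, `Add`, `Gt`) and the helpers `transform_up`, `fold_constants`, and `optimize` are hypothetical, made up for this illustration. It shows one simple rewrite rule, constant folding, applied bottom-up to an expression tree until nothing changes, which is the general flavor of a rule-based optimizer.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression nodes for a tiny predicate language.
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Gt:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add, Gt]


def transform_up(expr, rule):
    """Rewrite the children first, then apply the rule to the rebuilt node."""
    if isinstance(expr, (Add, Gt)):
        expr = type(expr)(transform_up(expr.left, rule),
                          transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr):
    """Rule: replace an addition of two literals with a single literal."""
    if (isinstance(expr, Add)
            and isinstance(expr.left, Lit)
            and isinstance(expr.right, Lit)):
        return Lit(expr.left.value + expr.right.value)
    return expr


def optimize(expr, rules):
    """Apply every rule repeatedly until the tree stops changing."""
    while True:
        rewritten = expr
        for rule in rules:
            rewritten = transform_up(rewritten, rule)
        if rewritten == expr:
            return expr
        expr = rewritten


# Roughly the predicate in: SELECT ... WHERE age > 10 + 3
predicate = Gt(Col("age"), Add(Lit(10), Lit(3)))
optimized = optimize(predicate, [fold_constants])
print(optimized)
```

In this sketch the sum `10 + 3` is computed once during optimization, so the rewritten predicate compares `age` against the literal 13 instead of re-evaluating the addition for every row.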

The webcast is scheduled for Tuesday, April 29, 2014 at 1 PM (San Francisco time). I’ll introduce Michael and moderate a Q&A following his presentation. Spark SQL is generating a lot of interest within the Apache Spark community, and this is a great opportunity to learn about it from its lead developer. I hope to see you online on the 29th.
