Spark + Cassandra: Technical Integration Details

I’ll be hosting a Nov 12th webcast on two of the most popular components in the big data ecosystem: Apache Spark and Apache Cassandra. As highlighted in a recent Databricks blog post, improvements to Spark’s shuffle have led to significant speedups (Spark is faster than Hadoop MapReduce, even on disk). While Spark has long worked well with Hadoop (HDFS), it now also integrates well with other storage systems like Amazon S3 and Apache Cassandra. In the webcast, Sameer Farooqui will discuss the state of Spark/Cassandra integration:

This webcast will cover an architecture deep dive around how the Apache Cassandra database integrates with the Apache Spark computation engine.

We will cover:

  • Ideal use cases for Cassandra + Spark
  • Details of how Cassandra’s Murmur3 partitioning maps to a Spark RDD’s internal partitioning
  • Considerations when using caching in Spark against C* tables
  • Specific configuration settings relevant to Cassandra + Spark integration
  • The DataStax open source Spark connector for Cassandra 2.x and how it works
  • Introduction to a free ~100 page ‘DevOps’ lab document (licensed under Creative Commons) that Databricks has released around how the integration works
  • Live demo of a Cassandra + Spark cluster (how to read data from a C* table into a Spark RDD, do some transformations on the RDD, write results back into a Cassandra table)
  • Upcoming features in future versions of the connector and current issues to be aware of

On another front: the joint Databricks/O’Reilly Spark Developer Certification exam will be offered for the first time at Strata-Barcelona. Come to Barcelona and become one of the first certified Spark developers!
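
To give a feel for the partitioning topic listed above, here is a minimal Python sketch of one way a token ring can be carved into partitions. It relies on one known fact: Cassandra's Murmur3Partitioner produces tokens in the range -2^63 to 2^63 - 1. Everything else is an assumption made for illustration. The function names `split_token_ring` and `partition_for_token` are hypothetical, and the even split is a simplification; it is not the connector's actual algorithm, which may also weigh things like which node owns each range.

```python
# Illustrative only: evenly split the Murmur3 token ring into contiguous
# ranges and treat each range as one Spark-style partition.
# The function names below are hypothetical, not part of any library.

MIN_TOKEN = -2**63       # smallest token produced by Murmur3Partitioner
MAX_TOKEN = 2**63 - 1    # largest token produced by Murmur3Partitioner


def split_token_ring(num_partitions):
    """Return num_partitions inclusive (start, end) ranges covering the ring."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1          # 2**64 possible tokens
    base, extra = divmod(total, num_partitions)
    ranges = []
    start = MIN_TOKEN
    for i in range(num_partitions):
        # Spread any remainder over the first `extra` ranges.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_for_token(token, ranges):
    """Return the index of the range holding `token` (linear scan for clarity)."""
    for index, (start, end) in enumerate(ranges):
        if start <= token <= end:
            return index
    raise ValueError("token is outside the Murmur3 token range")
```

Calling `split_token_ring(4)` yields four contiguous ranges that together cover every possible token, so each row's token falls in exactly one of them. In a real deployment the number and size of the ranges would come from the cluster and connector settings rather than a fixed argument.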
