Bringing Apache Spark closer to bare metal

Fans and users of Apache Spark will want to attend a webcast I’ll be hosting next week (Sept 3rd), featuring Josh Rosen – one of the early developers behind PySpark:

Deep dive into Project Tungsten: Bring Spark closer to bare metal

Project Tungsten focuses on substantially improving the efficiency of memory and CPU for Spark applications, to push performance closer to the limits of modern hardware. This effort includes three initiatives:

  • Memory Management and Binary Processing: leveraging application semantics to manage memory explicitly and eliminate the overhead of JVM object model and garbage collection
  • Cache-aware computation: algorithms and data structures to exploit memory hierarchy
  • Code generation: using code generation to exploit modern compilers and CPUs
  • Project Tungsten will be the largest change to Spark’s execution engine since the project’s inception. In this talk, we will give an update on its progress and dive into some of the technical challenges we are solving.

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out / Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out / Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out / Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out / Change )

    Connecting to %s