spark Archives - Page 2 of 5

Compressed representations in the age of big data

[A version of this post appears on the O’Reilly Radar.] Emerging trends in intelligent mobile applications and distributed computing. When developing intelligent, real-time applications, one often has access to a data platform that can wade through and unlock patterns in massive data sets. The back-end infrastructure for such applications often relies on distributed, fault-tolerant, scaleout… Continue reading “Compressed representations in the age of big data”

Investing in big data technologies

The O’Reilly Data Show podcast: A fireside chat with Ben Horowitz, plus Reynold Xin on the rise of Apache Spark in China. [A version of this post appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. In this special holiday… Continue reading “Investing in big data technologies”

Building a scalable platform for streaming updates and analytics

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show podcast: Evan Chan on the early days of Spark+Cassandra, FiloDB, and cloud computing. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. In this episode of the O’Reilly Data Show, I… Continue reading “Building a scalable platform for streaming updates and analytics”

Building enterprise data applications with open source components

[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show podcast: Dean Wampler on bounded and unbounded data processing and analytics. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. I first found myself having to learn Scala when I started… Continue reading “Building enterprise data applications with open source components”

Introduction to Tachyon and a deep dive into Baidu’s production use case

I’m pleased to announce that I’ll be hosting a webcast featuring the co-creator of Tachyon (full disclosure: I’m an advisor to Tachyon Nexus) alongside one of the architects behind Baidu’s big data platform. I hope to see you online on Sept 14th! Tachyon is a memory-centric fault-tolerant distributed storage system, which enables reliable… Continue reading “Introduction to Tachyon and a deep dive into Baidu’s production use case”

Bringing Apache Spark closer to bare metal

Fans and users of Apache Spark will want to attend a webcast I’ll be hosting next week (Sept 3rd), featuring Josh Rosen – one of the early developers behind PySpark: “Deep dive into Project Tungsten: Bring Spark closer to bare metal.” Project Tungsten focuses on substantially improving the efficiency of memory and CPU for Spark… Continue reading “Bringing Apache Spark closer to bare metal”

6 reasons why I like KeystoneML

[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Ben Recht on optimization, compressed sensing, and large-scale machine learning pipelines. As we put the finishing touches on what promises to be another outstanding Hardcore Data Science Day at Strata + Hadoop World in New York, I sat down with… Continue reading “6 reasons why I like KeystoneML”

Apache Spark in the Enterprise and in China

Enterprise Adoption: IBM’s announcements at the recent Spark Summit in SF bode well for enterprise adoption of Spark. Ben Horowitz jokingly referred to IBM’s endorsement as akin to a Rabbi blessing Spark as kosher for use in an enterprise. I recently sat down with a set of luminaries at the Spark Summit and asked them… Continue reading “Apache Spark in the Enterprise and in China”

Fireside chat with Ben Horowitz

I had the pleasure of interviewing Ben Horowitz on the main stage at the recent Spark Summit in SFO. Ben is co-founder of a16z, one of the leading tech venture capital firms, and author of one of my favorite books about entrepreneurship (“The Hard Thing About Hard Things”). The Spark Summit had a packed lineup,… Continue reading “Fireside chat with Ben Horowitz”

Large-scale Data Science and Machine Learning with Spark

[Full disclosure: I’m an advisor to Databricks.] At last year’s Spark Summit in SF, Ali Ghodsi gave the first public demo of Databricks Cloud and Workspace. As I noted at the time, it was a showstopper! This year Ali gave an update, and while I wasn’t on hand to see it in person, judging from… Continue reading “Large-scale Data Science and Machine Learning with Spark”