Introduction to Tachyon and a deep dive into Baidu’s production use case

I’m pleased to announce that I’ll be hosting a webcast featuring the co-creator of Tachyon (full disclosure: I’m an advisor to Tachyon Nexus) alongside one of the architects behind Baidu’s big data platform. I hope to see you online on Sept 14th! Tachyon is a memory-centric fault-tolerant distributed storage system, which enables reliable… Continue reading “Introduction to Tachyon and a deep dive into Baidu’s production use case”

Bringing Apache Spark closer to bare metal

Fans and users of Apache Spark will want to attend a webcast I’ll be hosting next week (Sept 3rd), featuring Josh Rosen – one of the early developers behind PySpark: Deep dive into Project Tungsten: Bring Spark closer to bare metal. Project Tungsten focuses on substantially improving the efficiency of memory and CPU for Spark… Continue reading “Bringing Apache Spark closer to bare metal”

Scikit-Learn 0.16

I’ll be hosting a webcast featuring two of the key contributors to what is arguably one of the most popular machine learning tools today – scikit-learn: News from Scikit-Learn 0.16 and Soon-To-Be Gems for the Next Release, presented by Olivier Grisel and Andreas Mueller. This webcast will review Scikit-learn, a widely used open source machine learning… Continue reading “Scikit-Learn 0.16”

Apache Spark 1.3, the new Dataframe API, and Spark performance

Over the course of a week, I’ll be hosting two good webcasts featuring Spark release manager Patrick Wendell and Spark committer Kay Ousterhout. Register now! Patrick Wendell: Spark 1.3 and Spark’s New Dataframe API (March 25th at 9 a.m. California time). In this webcast, Patrick Wendell from Databricks will be speaking about Spark’s new 1.3… Continue reading “Apache Spark 1.3, the new Dataframe API, and Spark performance”

“Humans-in-the-loop” machine learning systems

Next week I’ll be hosting a webcast featuring Adam Marcus, one of the foremost experts on the topic of “humans-in-the-loop” machine learning systems. It’s a subject many data scientists have heard about, but very few have had the experience of building production systems that leverage humans: Crowdsourcing marketplaces like Elance-oDesk or CrowdFlower give us access… Continue reading ““Humans-in-the-loop” machine learning systems”

Spark 1.2 and Beyond

Next week I’ll be hosting a webcast with Spark’s release manager – and Databricks co-founder – Patrick Wendell. (Full disclosure: I’m an advisor to Databricks.) In this webcast, Patrick Wendell from Databricks will be speaking about Spark’s new 1.2 release. Spark 1.2 brings performance and usability improvements in Spark’s core engine, a major new API… Continue reading “Spark 1.2 and Beyond”

Bitcoin and the Future of Money

I’ll be hosting a free webcast featuring Andreas Antonopoulos this Wednesday. Author of the new book Mastering Bitcoin, Andreas has emerged as one of the most popular and eloquent proponents of cryptocurrencies and related technologies: Bitcoin technology is taking the world of finance by storm. Bitcoin and the blockchain technology that is at its… Continue reading “Bitcoin and the Future of Money”

Spark + Cassandra: Technical Integration Details

I’ll be hosting a Nov 12th webcast on two of the most popular components in the big data ecosystem: Apache Spark and Apache Cassandra. As highlighted in a recent Databricks blog post, recent improvements to Spark’s shuffle have led to significant speedups (Spark is faster than Hadoop MapReduce, even on disk). While Spark has long… Continue reading “Spark + Cassandra: Technical Integration Details”

Anomaly Detection with ElasticSearch

One of the technologies that I’m hearing more about is ElasticSearch. In particular, the combination of ElasticSearch, Logstash, and Kibana (the ELK stack) has proven to be a popular platform for real-time analytics on both structured and unstructured data. I’ll be hosting a webcast on October 30th on the ELK stack featuring Mark Harwood, software… Continue reading “Anomaly Detection with ElasticSearch”