Data Science Archives

The science of moving dots: the O’Reilly Data Show Podcast

Rajiv Maheswaran talks about the tools and techniques required to analyze new kinds of sports data. [This post originally appeared on the O’Reilly Radar blog.] Editor’s note: you can subscribe to the O’Reilly Data Show Podcast through iTunes, SoundCloud, or our RSS feed. Many data scientists are comfortable working with structured operational data and …

Anomaly Detection with ElasticSearch

One of the technologies that I’m hearing more about is ElasticSearch. In particular, the combination of ElasticSearch, Logstash, and Kibana (the ELK stack) has proven to be a popular platform for real-time analytics on both structured and unstructured data. I’ll be hosting a webcast on October 30th on the ELK stack featuring Mark Harwood, software …

Time-turner: Strata NYC 2014, day 2

There are so many good talks happening at the same time that it’s impossible not to miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would …

Time-turner: Strata NYC 2014, day 1

Unboxing Apache Spark 1.1

Apache Spark version 1.1 shipped a few weeks ago. I’ve been enjoying enhancements to MLlib, Spark SQL, and Spark Streaming. Next week I’ll be hosting a webcast with Spark’s release manager – and Databricks co-founder – Patrick Wendell. (Full disclosure: I’m an advisor to Databricks.) In this webcast, Patrick Wendell from Databricks will be speaking …

Announcing Spark Certification

I’m happy to announce the Databricks/O’Reilly Developer Certification for Apache Spark! For more details, please read my post on the O’Reilly Radar.

Bits from the Data Store

Semi-regular field notes from the world of data: Michael Jordan (“ask me anything”): The distinguished machine learning and Bayesian researcher from UC Berkeley’s AMPLab has an interesting perspective on machine learning and statistics. “… while I do think of neural networks as one important tool in the toolbox, I find myself surprisingly rarely going to …”

Bits from the Data Store

Semi-regular field notes from the world of data: Apache Spark development community: Josh Rosen of Databricks recently built a tool for browsing pull requests. I like that it lets you scan each of the major components (Spark SQL, Streaming, MLlib, etc.). Now that Spark has become one of the most active open source projects in …

Bits from the Data Store

Semi-regular field notes from the world of data: Alibaba ♥ Spark: Next time someone asks you if Apache Spark scales, point them to this recent post by Chinese e-commerce juggernaut Alibaba. What particularly caught my eye is the company’s heavy usage of GraphX, Spark’s library for graph analytics. [Full disclosure: I’m an advisor to Databricks, …

Real-world Active Learning

Beyond building training sets for machine learning, crowdsourcing is being used to enhance the results of machine-learning models: in active learning, humans take care of the uncertain cases while models handle the routine ones. Active learning is one of those topics that many data scientists have heard of, few have tried, and a small handful know how to …