Anomaly Detection with ElasticSearch

One of the technologies that I’m hearing more about is ElasticSearch. In particular the combination of ElasticSearch, Logstash, and Kibana (the ELK stack) has proven to be a popular platform for real-time analytics on both structured and unstructured data. I’ll be hosting a webcast on October 30th on the ELK stack featuring Mark Harwood, software…

A growing number of applications are being built with Spark

Many more companies are willing to talk about how they’re using Apache Spark in production. [A version of this post appears on the O’Reilly Data blog.] One of the trends we’re following closely at Strata is the emergence of vertical applications. As components for creating large-scale data infrastructures enter their early stages of maturation, companies…

Graphs, Time-series, Dataviz, and Crowdsourcing at Strata Santa Clara 2014

There are many fantastic talks at Strata and it can be overwhelming to navigate the schedule. I plan to list talks I’m hoping to catch in a series of “time-turner” posts (check this blog on Wed/Thu at 10 a.m.). But for now let me highlight talks from a few categories: Graphs and Network Analysis: Large-scale…

How Twitter monitors millions of time-series

[A version of this post appears on the O’Reilly Strata blog.] One of the keys to Twitter’s ability to process 500 million tweets daily is a software development process that values monitoring and measurement. A recent post from the company’s Observability team detailed the software stack for monitoring the performance characteristics of software services, and…