Architecting and building end-to-end streaming applications

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Karthik Ramasamy on Heron, DistributedLog, and designing real-time applications. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this episodeContinue reading “Architecting and building end-to-end streaming applications”

Structured streaming comes to Apache Spark 2.0

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Michael Armbrust on enabling users to perform streaming analytics, without having to reason about streaming. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn,Continue reading “Structured streaming comes to Apache Spark 2.0”

Stream processing and messaging systems for the IoT age

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show podcast: M.C. Srivas on streaming, enterprise grade systems, the Internet of Things, and data for social good. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. Find us on Stitcher, TuneIn,Continue reading “Stream processing and messaging systems for the IoT age”

Building systems for massive scale data applications

The O’Reilly Data Show podcast: Tyler Akidau on the evolution of systems for bounded and unbounded data processing. [This piece was co-written by Shannon Cutt. A version of this post appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. ManyContinue reading “Building systems for massive scale data applications”

Celebrating the real-time processing revival

[A version of this article appears on the O’Reilly Radar.] Register for Strata + Hadoop World NYC, which will take place September 29 to Oct 1, 2015. A few months ago, I noted the resurgence in interest in large-scale stream-processing tools and real-time applications. Interest remains strong, and if anything, I’ve noticed growth in theContinue reading “Celebrating the real-time processing revival”

Expanding options for mining streaming data

[A version of this post appears on the O’Reilly Data blog.] Stream processing was in the minds of a few people that I ran into over the past week. A combination of new systems, deployment tools, and enhancements to existing frameworks, are behind the recent chatter. Through a combination of simpler deployment tools, programming interfaces,Continue reading “Expanding options for mining streaming data”

Stream Mining essentials

[A version of this post appears on the O’Reilly Strata blog.] A series of open source, distributed stream processing frameworks have become essential components in many big data technology stacks. Apache Storm remains the most popular, but promising new tools like Spark Streaming and Apache Samza are going to have their share of users. TheseContinue reading “Stream Mining essentials”

Stream Processing and Mining just got more interesting

[A version of this post appears on the O’Reilly Strata blog.] Largely unknown outside data engineering circles, Apache Kafka is one of the more popular open source, distributed computing projects. Many data engineers I speak with either already use it or are planning to do so. It is a distributed message broker used to store1Continue reading “Stream Processing and Mining just got more interesting”

Near realtime, streaming, and perpetual analytics

[A version of this post appears on the O’Reilly Strata blog.] Simple example of a near realtime app built with Hadoop and HBase Over the past year Hadoop emerged from its batch processing roots and began to take on interactive and near realtime applications. There are numerous examples that fall under these categories, but oneContinue reading “Near realtime, streaming, and perpetual analytics”

Pattern-detection and Twitter’s Streaming API

[A version of this post appears on the O’Reilly Strata blog.] Researchers and companies who need social media data frequently turn to Twitter’s API to access a random sample of tweets. Those who can afford to pay (or have been granted access) use the more comprehensive feed (the firehose) available through a group of certifiedContinue reading “Pattern-detection and Twitter’s Streaming API”