Big Data solutions through the combination of tools

[A version of this post appears on the O’Reilly Data blog and Forbes.]

As a user who tends to mix-and-match many different tools, not having to deal with configuring and assembling a suite of tools is a big win. So I’m really liking the recent trend towards more integrated and packaged solutions. A recent example is the relaunch of Cloudera’s Enterprise Data hub, to include Spark1 and Spark Streaming. Users benefit by gaining automatic access to analytic engines that come with Spark2. Besides simplifying things for data scientists and data engineers, easy access to analytic engines is critical for streamlining the creation of big data applications.

Another recent example is Dendrite3 – an interesting new graph analysis solution from Lab41. It combines Titan (a distributed graph database), GraphLab (for graph analytics), and a front-end that leverages AngularJS, into a Graph exploration and analysis tool for business analysts:

Smiley face

Users of Spark explore Spark Streaming because similar code for batch (Spark) can, with minor modification, be used for realtime (Spark Streaming) computations. Along these lines, Summingbird – an open source library from Twitter – offers something similar for Hadoop MapReduce and Storm. With Summingbird, programs that look like Scala collection transformations can be executed in batch (Scalding) or realtime (Storm).

In some instances the underlying techniques from a set of tools makes its way into others. The DeepDive team at Stanford just recently revamped their information extraction and natural language understanding system. But already techniques used in DeepDive have found their way into many other systems including MADlib, Cloudera Impala, “a product from Oracle”, and Google Brain.

Related content:


(1) Full disclosure: I am an advisor to Databricks – a startup commercializing Spark. (2) Some potential applications of Spark and Spark Streaming include stream processing and mining, interactive and iterative computing, machine-learning, and graph analytics. (3) Hat tip to Danny Bickson.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s