Interactive Big Data analysis using approximate answers

[A version of this post appears on the O’Reilly Strata blog.] Interactive query analysis for Hadoop-scale data has recently attracted the attention of many companies and open source developers – some examples include Cloudera’s Impala, Shark, Pivotal’s HAWQ, Hadapt, CitusDB, Phoenix, Sqrrl, Redshift, and BigQuery. These solutions use distributed computing, and a combination of …

Big Data and Advertising: In the trenches

[A version of this post appears on the O’Reilly Strata blog.] The $35B merger of Omnicom and Publicis put the convergence of Big Data and Advertising on the front pages of business publications. Adtech companies have long been at the forefront of many data technologies, strategies, and techniques. By now it’s well-known that many impressive …

Near realtime, streaming, and perpetual analytics

[A version of this post appears on the O’Reilly Strata blog.] [Figure: Simple example of a near realtime app built with Hadoop and HBase] Over the past year, Hadoop emerged from its batch processing roots and began to take on interactive and near realtime applications. There are numerous examples that fall under these categories, but one …

Moving from Batch to Continuous Computing at Yahoo!

[A version of this post appeared on the O’Reilly Strata blog.] My favorite session at the recent Hadoop Summit was a keynote by Bruno Fernandez-Ruiz, Senior Fellow & VP Platforms at Yahoo! He gave a nice overview of their analytic and data processing stack, and shared some interesting factoids about the scale of their big …