Data Analysis: Just one component of the Data Science workflow

[A version of this post appears on the O’Reilly Strata blog.] Judging from articles in the popular press, the term data scientist has increasingly come to refer to someone who specializes in data analysis (statistics, machine learning, etc.). This is unfortunate, since the term originally described someone who could cut across disciplines. Far from being confined …

Data analysis tools target non-experts

[A version of this post appears on the O’Reilly Strata blog.] A new set of tools makes it easier to do a variety of data analysis tasks. Some require no programming, while others make it easier to combine code, visuals, and text in the same workflow. They enable users who aren’t statisticians or data …

Interactive Big Data analysis using approximate answers

[A version of this post appears on the O’Reilly Strata blog.] Interactive query analysis (for Hadoop-scale data) has recently attracted the attention of many companies and open source developers; examples include Cloudera’s Impala, Shark, Pivotal’s HAWQ, Hadapt, CitusDB, Phoenix, Sqrrl, Redshift, and BigQuery. These solutions use distributed computing, and a combination of …

Surfacing anomalies and patterns in Machine Data

[A version of this post appears on the O’Reilly Strata blog.] I’ve been noticing that many interesting big data systems are coming out of IT operations. These are systems that go beyond the standard “capture/measure, display charts, and send alerts”. IT operations has long been a source of many interesting big data problems and I …

Big Data and Advertising: In the trenches

[A version of this post appears on the O’Reilly Strata blog.] The $35B merger of Omnicom and Publicis put the convergence of Big Data and Advertising on the front pages of business publications. Adtech companies have long been at the forefront of many data technologies, strategies, and techniques. By now it’s well known that many impressive …

Data scientists tackle the analytic lifecycle

[A version of this post appears on the O’Reilly Strata blog.] What happens after data scientists build analytic models? Model deployment, monitoring, and maintenance are topics that haven’t received as much attention in the past, but I’ve been hearing more about these subjects from data scientists and software developers. I remember the days when it …

Pattern-detection and Twitter’s Streaming API

[A version of this post appears on the O’Reilly Strata blog.] Researchers and companies who need social media data frequently turn to Twitter’s API to access a random sample of tweets. Those who can afford to pay (or have been granted access) use the more comprehensive feed (the firehose) available through a group of certified …

HBase looks more appealing to data scientists

[A version of this post appears on the O’Reilly Strata blog.] When Hadoop users need to develop apps that are “latency sensitive”, many of them turn to HBase. Its tight integration with Hadoop makes it a popular data store for real-time applications. When I attended the first HBase conference last year, I was pleasantly surprised …

How signals, geometry, and topology are influencing data science

[A version of this post appears on the O’Reilly Strata blog.] I’ve been noticing unlikely areas of mathematics pop up in data analysis. While signal processing is a natural fit, topology and differential and algebraic geometry aren’t exactly areas you associate with data science. But upon further reflection, perhaps it shouldn’t be so surprising that areas that …

Improving options for unlocking your graph data

[A version of this post appears on the O’Reilly Strata blog.] The popular open source project GraphLab received a major boost early this week when a new company, composed of its founding developers, raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open source GraphLab to “push …