Bits from the Data Store

Semi-regular field notes from the world of data: Alibaba ♥ Spark: Next time someone asks you if Apache Spark scales, point them to this recent post by Chinese e-commerce juggernaut Alibaba. What particularly caught my eye is the company’s heavy use of GraphX, Spark’s library for graph analytics. [Full disclosure: I’m an advisor to Databricks, … Continue reading “Bits from the Data Store”

Bits from the Data Store

Semi-regular field notes from the world of data (gathered from Scifoo 2014): Filtergraph and the power of visual exploration: A web-based tool for exploring high-dimensional data sets, Filtergraph came out of the lab of astrophysicist Keivan Stassun. It has helped researchers make several interesting discoveries, including a paper (that appeared in Nature) on a technique … Continue reading “Bits from the Data Store”

Network Science Dashboards

Network graphs can be used as primary visual objects, with conventional charts used to supply detailed views. [A version of this post appears on the O’Reilly Data blog.] With Network Science well on its way to being an established academic discipline, we’re beginning to see tools that leverage it. Applications that draw heavily from this … Continue reading “Network Science Dashboards”

2013 Revenue of some startup companies

The chart below is from Wikibon’s estimates of the 2013 revenue of some Big Data companies. Using d3 I drew a chart that shows 2013 revenue (in millions) from Big Data products and services, as well as the share of revenue derived from services, for a few select/startup companies. The Big … Continue reading “2013 Revenue of some startup companies”

Graphs, Time-series, Dataviz, and Crowdsourcing at Strata Santa Clara 2014

There are many fantastic talks at Strata, and it can be overwhelming to navigate the schedule. I plan to list talks I’m hoping to catch in a series of “time-turner” posts (check this blog on Wed/Thu at 10 a.m.). But for now let me highlight talks from a few categories: Graphs and Network Analysis: Large-scale … Continue reading “Graphs, Time-series, Dataviz, and Crowdsourcing at Strata Santa Clara 2014”

11 Essential Features that Visual Analysis Tools Should Have

[A version of this post appears on the O’Reilly Strata blog.] After recently playing with SAS Visual Analytics, I’ve been thinking about tools for visual analysis. By visual analysis I mean the type of analysis most recently popularized by Tableau, QlikView, and Spotfire: you encounter a data set for the first time, conduct exploratory data … Continue reading “11 Essential Features that Visual Analysis Tools Should Have”

2012 Revenue of some Big Data companies

The chart below is from Wikibon’s estimates of the 2012 revenue of some Big Data companies. Using d3 I drew a chart that shows 2012 revenue in millions, as well as the share of revenue derived from services, for a few select/startup companies. The Big 3 Hadoop Vendors (Cloudera/MapR/Hortonworks): Combined revenue … Continue reading “2012 Revenue of some Big Data companies”