Bits from the Data Store

Semi-regular field notes from the world of data: Tucked away in the community room at the recent GraphLab conference, I took a few people to a demo by Graphistry, a startup that lets users visually interact with and analyze massive amounts of data. In particular, their technology can handle and draw many more points than d3.js…

There are many use cases for graph databases and analytics

Business users are becoming more comfortable with graph analytics. [A version of this post appears on the O’Reilly Radar blog.] The rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people — Cisco estimates 50 billion connected…

Don’t miss the keynotes at the 2014 Spark Summit

There will be major announcements – particularly during the Monday morning keynotes. Fortunately, the organizers will livestream the talks (sign up here). An added bonus if you sign up for the livestream: I’ll be interviewing keynote speakers and key members of the Spark community throughout the first two days of the summit.

Bits from the Data Store

Semi-regular field notes from the world of data: I’m always on the lookout for interesting tools and ideas for reproducing and collaborating on long data workflows. Reproducibility and collaboration are topics that we’re following closely at Strata (both topics remain on the radar of many data scientists and data engineers I speak with). At the…

Time-turner: Strata Santa Clara 2014, day 2

There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would…

Time-turner: Strata Santa Clara 2014, day 1

There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would…

Graphs, Time-series, Dataviz, and Crowdsourcing at Strata Santa Clara 2014

There are many fantastic talks at Strata and it can be overwhelming to navigate the schedule. I plan to list talks I’m hoping to catch in a series of “time-turner” posts (check this blog on Wed/Thu at 10 a.m.). But for now let me highlight talks from a few categories: Graphs and Network Analysis: Large-scale…

What I use for data visualization

[A version of this post appears on the O’Reilly Data blog.] Depending on the nature of the problem, data size, and deliverable, I still draw upon an array of tools for data visualization. As I survey the Design track at next month’s Strata conference, I see creators and power users of visualization tools that many…

HBase looks more appealing to data scientists

[A version of this post appears on the O’Reilly Strata blog.] When Hadoop users need to develop apps that are “latency sensitive”, many of them turn to HBase. Its tight integration with Hadoop makes it a popular data store for real-time applications. When I attended the first HBase conference last year, I was pleasantly surprised…

Python data tools just keep getting better

[A version of this post appeared on the O’Reilly Strata blog.] Here are a few observations inspired by conversations I had during the just-concluded PyData conference. The Python data community is well-organized: Besides conferences (PyData, SciPy, EuroSciPy), there is a new non-profit (NumFOCUS) dedicated to supporting scientific computing and data analytics projects. The list…