Bits from the Data Store

Semi-regular field notes from the world of data:

  • Alibaba ♥ Spark: Next time someone asks you if Apache Spark scales, point them to this recent post by Chinese e-commerce juggernaut Alibaba. What particularly caught my eye is the company’s heavy usage of GraphX, Spark’s library for graph analytics.
    [Full disclosure: I’m an advisor to Databricks, a startup commercializing Apache Spark.]
  • Visual Exploration with yt: Having recently featured FilterGraph, I asked physicists and PyData luminaries Josh Bloom, Fernando Perez, and Brian Granger if they knew of any other visualization tools popular among astronomers. They all recommended yt. It has roots in astronomy, but the gallery of examples indicates that scientists from many other domains use it too.
  • Narrative Recommendations: When NarrativeScience started out, I thought of it primarily as a platform for generating short, factual stories for (hyperlocal) news services. (A newer startup, OnlyBoth, seems to be focused on this; their working example is the use of “box scores” to cover “college” teams.) More recently NarrativeScience has aimed its technology at the lucrative Business Intelligence market. Starting from structured data, NarrativeScience extracts and ranks facts, and weaves them into a narrative arc that analysts consume. The company retains the traditional elements of BI tools (tables, charts, dashboards) and supplements them with narrative summaries and recommendations. I like the concept of adding narrative outputs, and as with all relatively new technologies, the algorithms and accompanying user interfaces are bound to get better over time. The technology is largely “language” agnostic, but to reap maximum benefit it does need to be tuned for the specific domain you want to use it in.

    With spreadsheets, you have to calculate. With visualizations, you have to interpret. With narratives, all you have to do is read.

    “Future” implementation of NarrativeScience
    Source: founder Kris Hammond’s slides at Cognitive Computing Forum 2014

  • Julia 0.3 has shipped: This promising language just keeps improving. The summer started with JuliaCon, continued with a steady expansion of libraries, and ends with a major new release.

  • Upcoming Meetups: SF Bay Area residents can look forward to two interesting Spark meetups this coming week.
