Semi-regular field notes from the world of data:
Apache Spark development community: Josh Rosen of Databricks recently built a tool for browsing pull requests. I like that it lets you scan each of the major components (Spark SQL, Streaming, MLlib, etc.). Now that Spark has become one of the most active open source projects in big data, tools like this make it easier for outsiders to follow what Spark developers are up to. [Full disclosure: I’m an advisor to Databricks, a startup commercializing Apache Spark.]
Treato: As many of you know, I’m a fan of domain-specific big data platforms. During a trip to Israel last May, I met with the CEO of Treato, an interesting platform focused on health care. By analyzing unstructured text from big (social) sites and small patient support groups, the company hopes to understand patients’ concerns and problems (the company aspires to be the “voice of patients”). This requires integrating multiple data sources and health databases, as well as NLP tools tuned for extracting health experiences on the web. 70% of Treato’s millions of users come from North America.
Gradient Boosting: You know a technique has arrived when startups prioritize implementing it! GraphLab and 0xdata recently released Gradient Boosted algorithms. Both companies will be speaking about their implementations at Strata NYC (here are descriptions of the Strata sessions of GraphLab and 0xdata).
Hardcore Data Science (HDS): I had drinks with my co-organizer (Ben Recht) last Friday and we’re both looking forward to HDS at Strata NYC. Ben’s work on HOGWILD! was mentioned prominently in a recent Wired article on Microsoft Adam (a deep learning system that relies on ideas from the HOGWILD! paper). If you’re a fan of distributed algorithms (like Microsoft Adam), you’ll want to attend Ben’s presentation on machine learning pipelines at Strata. I also picked Ben’s brain on compressed sensing, another area where he’s made important contributions. I’ve long been fascinated with compressed sensing and I’m happy to have Anna Gilbert speak on it at HDS this October.