Semi-regular field notes from the world of data (gathered from Scifoo 2014):
Filtergraph and the power of visual exploration: A web-based tool for exploring high-dimensional data sets, Filtergraph came out of the lab of astrophysicist Keivan Stassun. It has helped researchers make several interesting discoveries, including a technique (published in Nature) that improves estimates for the sizes of hundreds of exoplanets. For this particular discovery, Keivan tasked one of his students with playing around with Filtergraph until she discovered “interesting patterns”. Her visual exploration led to an image that inspired the discoveries contained in the Nature paper.
RunMyCode: I was glad to see several sessions on the important topic of reproducibility of research projects and results (I’ve written about this topic from the data science perspective here and here). Beyond just sharing data sets, RunMyCode lets researchers share the data and computer programs they used to generate the results contained in their papers. Sharing both the data and the code used in research papers is an important step. (For complex setups, a tool like Vagrant can come in handy.) But to address the file drawer problem, access to data and code for “negative results” is also needed.
A network framework of cultural history: Scifoo alum Maximilian Schich pointed me to some of his group’s recent work on cultural migration in the Western world. I’ve seen Maximilian give preliminary talks on these results in the past (at Scifoo). He combines meticulous data collection, stunning visualizations, and network science to discover and quantify cultural patterns.
Fact-checking a Beautiful Mind: John Nash’s embedding theorem opened up lines of research in geometry and partial differential equations. Most mathematicians regard the embedding theorem as more impressive than Nash’s work on game theory (for which he was awarded the Nobel Prize in economics). Scifoo camper Steve Hsu pointed me to a not-so-well-known fact: in 1998 (42 years after the embedding theorem was published), eminent set theorist Robert Solovay found an error in Nash’s paper! Nash observed that fixing his original paper was unnecessary, as later work by others had superseded his approach.
Instruction Sets Should Be Free (The Case For RISC-V): I received this preprint (blog post) from Dave Patterson – one of the pioneers behind the RISC processor and RAID. Just as open interfaces like TCP/IP and open source software like Linux have been huge successes, Dave and fellow ASPIRE Lab founder Krste Asanovic are trying to rally hardware folks around the concept of a free, open instruction set architecture (ISA).