Semi-regular field notes from the world of data (gathered from Scifoo 2014): Filtergraph and the power of visual exploration: A web-based tool for exploring high-dimensional data sets, Filtergraph came out of the lab of astrophysicist Keivan Stassun. It has helped researchers make several interesting discoveries, including a paper (that appeared in Nature) on a technique… Continue reading “Bits from the Data Store”
Category Archives: Data Science
Scaling up Data Frames
New frameworks for interactive business analysis and advanced analytics fuel the rise in tabular data objects [A version of this post appears on the O’Reilly Radar blog.] Long before the advent of “big data”, analysts were building models using tools like R (and its forerunners S/S-PLUS). Productivity hinged on tools that made data wrangling, data… Continue reading “Scaling up Data Frames”
What’s New in Scikit-learn 0.15
Python has emerged as one of the more popular languages for doing data science. The primary reason is the impressive array of tools (the “Pydata” stack) available for addressing many stages of data science pipelines. One of the most popular Pydata tools is scikit-learn, an easy-to-use and highly efficient machine learning library. I’ve written about why… Continue reading “What’s New in Scikit-learn 0.15”
Bits from the Data Store
Semi-regular field notes from the world of data: Tucked away in the community room at the recent GraphLab conference, I took a few people to a demo by Graphistry, a startup that lets users visually interact with and analyze massive amounts of data. In particular, their technology can handle and draw many more points than d3.js… Continue reading “Bits from the Data Store”
Deep Learning for Hackers
How do you get started using Deep Learning? In a previous post, I noted how many of the tools and best practices are locked away in “oral traditions” shared among practitioners. But recently, open source tools have made Deep Learning somewhat more accessible to hackers. In an upcoming webcast, I’m hosting noted hacker and startup… Continue reading “Deep Learning for Hackers”
PredictionIO: an open source machine learning server
PredictionIO, a startup that produces an open source machine learning server, has raised a seed round of $2.5M. The company’s engine allows developers to quickly integrate machine learning into products and services. The server is available on Amazon Web Services. As an open source package, the company hopes… Continue reading “PredictionIO: an open source machine learning server”
Databricks Cloud makes it easier to build Data Products
Here is a link to Ali Ghodsi’s talk and demo that took the Spark Summit by storm. The demo really captures the power of Databricks Cloud: complex, high-performance, big data analytics at massive scale, accessible to anyone who can write simple scripts (currently supports SQL, Python, Scala). The demo culminates when Ali shows how easy… Continue reading “Databricks Cloud makes it easier to build Data Products”
There are many use cases for graph databases and analytics
Business users are becoming more comfortable with graph analytics [A version of this post appears on the O’Reilly Radar blog.] The rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people — Cisco estimates 50 billion connected… Continue reading “There are many use cases for graph databases and analytics”
Scalable Data Science on a Laptop
I’ll be hosting a webcast featuring one of Strata’s most popular speakers, machine-learning expert Alice Zheng. Here is what data science looks like today: 1. Munge some data: a. Process raw data. Stuff it into a database. b. Query for specific data. Coax results out through a straw. c. Munge data into a format required… Continue reading “Scalable Data Science on a Laptop”
Streamlining Feature Engineering
Researchers and startups are building tools that enable feature discovery [A version of this post appears on the O’Reilly Data blog.] Why do data scientists spend so much time on data wrangling and data preparation? In many cases it’s because they want access to the best variables with which to build their models. These variables… Continue reading “Streamlining Feature Engineering”
