Hardcore Data Science, NYC 2015

Ben Recht and I hosted another great edition of Hardcore Data Science in NYC yesterday. From the very first talk, the room was full, the audience was attentive, and the energy in the room was high – and it remained that way throughout the day. A summary can be found below. Short detour: Stanford CS…

Celebrating the real-time processing revival

[A version of this article appears on the O’Reilly Radar.] Register for Strata + Hadoop World NYC, which will take place September 29 to October 1, 2015. A few months ago, I noted the resurgence of interest in large-scale stream-processing tools and real-time applications. Interest remains strong, and if anything, I’ve noticed growth in the…

The tensor renaissance in data science

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning. After sitting in on UC Irvine Professor Anima Anandkumar’s presentation at Strata + Hadoop World 2015 in San Jose, I wrote a post urging the data community to build tensor decomposition libraries for…

More tools for managing and reproducing complex data projects

A survey of the landscape shows that the types of tools remain the same, but their interfaces continue to improve. [A version of this post appears on the O’Reilly Radar.] As data projects become more complex and as data teams grow in size, individuals and organizations need tools to manage data projects efficiently. A while back, I wrote…

A real-time processing revival

[A version of this post appears on the O’Reilly Radar blog.] Things are moving fast in the stream processing world. There’s renewed interest in stream processing and analytics. I base this on several data points (attendance at webcasts and conference sessions, a recent meetup) and on many conversations with technologists, startup founders, and investors. Certainly,…

Let’s build open source tensor libraries for data science

[A version of this post appears on the O’Reilly Radar blog.] Tensor methods for machine learning are fast, accurate, and scalable, but we’ll need well-developed libraries. Data scientists frequently find themselves dealing with high-dimensional feature spaces. As an example, text mining usually involves vocabularies of 10,000+ different words. Many analytic problems involve linear algebra,…
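The excerpt stops before the post gets to what a tensor decomposition actually computes. As a rough illustration only (not the method from Anandkumar’s talk or from any existing library), here is a minimal NumPy sketch of one common decomposition: a rank-R CP (CANDECOMP/PARAFAC) model of a three-way array, fit by alternating least squares. The helper names `khatri_rao`, `reconstruct`, and `cp_als`, the random initialization, and the fixed iteration count are all assumptions made for this example. The array here is tiny; with a 10,000-word vocabulary, each mode of a word-indexed tensor would have 10,000 entries.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: column r equals np.kron(B[:, r], C[:, r])."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def reconstruct(A, B, C):
    """Rebuild X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_als(X, rank, n_iter=200, seed=0):
    """Sketch: fit a rank-`rank` CP model to a 3-way array X by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Hypothetical choice: random Gaussian initialization of the three factors.
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings, ordered to match khatri_rao's row layout.
    X0 = X.reshape(I, J * K)
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor with the
        # other two held fixed; (B.T @ B) * (C.T @ C) is the Gram matrix of
        # khatri_rao(B, C), so only a small rank x rank matrix is inverted.
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


if __name__ == "__main__":
    # Toy check: a synthetic tensor that is exactly rank 3 by construction.
    rng = np.random.default_rng(1)
    A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (10, 8, 6))
    X = reconstruct(A0, B0, C0)
    A, B, C = cp_als(X, rank=3, n_iter=500)
    rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

ALS is used here only because each step reduces to an ordinary least-squares solve, which keeps the sketch short; it carries no general convergence guarantee, and a practical library would also need normalization, stopping criteria, and support for sparse tensors.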

Time-turner: Strata San Jose 2015, day 2

[Our friends at Dato created an interesting content-based Strata session recommender. Check it out here.] There are so many good talks happening at the same time that it’s impossible not to miss some good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same…

Time-turner: Strata San Jose 2015, day 1

[Our friends at Dato created an interesting content-based Strata session recommender. Check it out here.] There are so many good talks happening at the same time that it’s impossible not to miss some good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same…

Hardcore Data Science: 2015 California

Ben Recht and I hosted another great edition of Hardcore Data Science yesterday. From the very first talk, the room was full, the audience was attentive, and the energy in the room was high. It remained that way throughout the day. This time around, I spent more time documenting the day on Twitter – enjoy!

Forecasting events, from disease outbreaks to sales to cancer research

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Kira Radinsky on predicting events using machine learning, NLP, and semantic analysis. Editor’s note: One of the more popular speakers at Strata + Hadoop World, Kira Radinsky was recently profiled in the new O’Reilly Radar report, Women in Data:…