6 reasons why I like KeystoneML

[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Ben Recht on optimization, compressed sensing, and large-scale machine learning pipelines. As we put the finishing touches on what promises to be another outstanding Hardcore Data Science Day at Strata + Hadoop World in New York, I sat down with…

Why data preparation frameworks rely on human-in-the-loop systems

[A version of this article appears on the O’Reilly Radar.] As I’ve written in previous posts, data preparation and data enrichment are exciting areas for entrepreneurs, investors, and researchers. Startups like Trifacta, Tamr, Paxata, Alteryx, and CrowdFlower continue to innovate and attract enterprise customers. I’ve also noticed that companies that don’t specialize in these…

Fireside chat with Ben Horowitz

I had the pleasure of interviewing Ben Horowitz on the main stage at the recent Spark Summit in SFO. Ben is co-founder of a16z, one of the leading tech venture capital firms, and author of one of my favorite books about entrepreneurship (“The Hard Thing About Hard Things”). The Spark Summit had a packed lineup,…

Large-scale Data Science and Machine Learning with Spark

[Full disclosure: I’m an advisor to Databricks.] At last year’s Spark Summit in SF, Ali Ghodsi gave the first public demo of Databricks Cloud and Workspace. As I noted at the time, it was a showstopper! This year Ali gave an update, and while I wasn’t on hand to see it in person, judging from…

Apache Spark: Powering applications on-premise and in the cloud

[A version of this post appears on the O’Reilly Radar.] As organizations shift their focus toward building analytic applications, many are relying on components from the Apache Spark ecosystem. I began pointing this out in advance of the first Spark Summit in 2013 and since then, Spark adoption has exploded. With Spark Summit SF right…

Data science makes an impact on Wall Street

[A version of this article appears on the O’Reilly Radar.] Having started my career in industry, working on problems in finance, I’ve always appreciated how challenging it is to build consistently profitable systems in this extremely competitive domain. When I served as a quant at a hedge fund in the late 1990s and early 2000s, I…

Israel conference on Big Data, Analytics and Machine Learning

The first Big Data, Analytics and Machine Learning (Israel Innovation) conference was a resounding success. Kudos to the organizers Danny Bickson, Assaf Araki, and Avner Algom. I was happy to help them invite speakers, publicize the event, and give the opening keynote. The conference was sold out and I heard a lot of enthusiastic feedback…

The tensor renaissance in data science

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning. After sitting in on UC Irvine Professor Anima Anandkumar’s presentation at Strata + Hadoop World 2015 in San Jose, I wrote a post urging the data community to build tensor decomposition libraries for…

More tools for managing and reproducing complex data projects

A survey of the landscape shows the types of tools remain the same, but interfaces continue to improve. [A version of this post appears on the O’Reilly Radar.] As data projects become more complex and as data teams grow in size, individuals and organizations need tools to efficiently manage data projects. A while back, I wrote…