How to build analytic products in an age when data privacy has become critical

Privacy-preserving analytics is not only possible, but with GDPR about to come online, it will become necessary to incorporate privacy in your data products. In this post, I share slides and notes from a talk I gave in March 2018 at the Strata Data Conference in California, offering suggestions for how companies may want to build…

Teaching and implementing data science and AI in the enterprise

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Jerry Overton on organizing data teams, agile experimentation, and the importance of ethics in data science. In this episode of the Data Show, I spoke with Jerry Overton, senior principal and distinguished technologist at DXC Technology. I wanted the perspective…

The importance of transparency and user control in machine learning

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Guillaume Chaslot on bias and extremism in content recommendations. In this episode of the Data Show, I spoke with Guillaume Chaslot, an ex-YouTube engineer and founder of AlgoTransparency, an organization dedicated to helping the public understand the profound impact algorithms have on our…

Graphs as the front end for machine learning

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Leo Meyerovich on building large-scale, interactive applications that enable visual investigations. In this episode of the Data Show, I spoke with Leo Meyerovich, co-founder and CEO of Graphistry. Graphs have always been part of the big data revolution (think of…

How machine learning can be used to write more secure computer programs

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Fabian Yamaguchi on the potential of using large-scale analytics on graph representations of code. In this episode of the Data Show, I spoke with Fabian Yamaguchi, chief scientist at ShiftLeft. His 2015 Ph.D. dissertation sketched out how the combination of…

We need to build machine learning tools to augment machine learning engineers

We need to build machine learning tools to augment our machine learning engineers. In this post, I share slides and notes from a talk I gave in December 2017 at the Strata Data Conference in Singapore, offering suggestions to companies that are actively deploying products infused with machine learning capabilities. Over the past few years, …

What lies ahead for data in 2018

[A version of this post appears on the O’Reilly Radar.] How new developments in algorithms, machine learning, analytics, infrastructure, data ethics, and culture will shape data in 2018. 1. New tools will make graphs and time series easier, leading to new use cases. Graphs and time series have been a crucial part of the explosion in big…

Machine learning at Spotify: You are what you stream

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Christine Hung on using data to drive digital transformation and recommenders that increase user engagement. In this episode of the Data Show, I spoke with Christine Hung, head of data solutions at Spotify. Prior to joining Spotify, she led data…

Building a natural language processing library for Apache Spark

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: David Talby on a new NLP library for Spark, and why model development starts after a model gets deployed to production. When I first discovered and started using Apache Spark, a majority of the use cases I used it for…

Machine intelligence for content distribution, logistics, smarter cities, and more

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Rhea Liu on technology trends in China. In this episode of the Data Show, I spoke with Rhea Liu, analyst at China Tech Insights, a new research firm that is part of Tencent’s Online Media Group. If there’s one place…