How to build analytic products in an age when data privacy has become critical

Privacy-preserving analytics is not only possible; with GDPR about to take effect, incorporating privacy into your data products will become necessary. In this post, I share slides and notes from a talk I gave in March 2018 at the Strata Data Conference in California, offering suggestions for how companies may want to build…

What machine learning engineers need to know

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Jesse Anderson and Paco Nathan on organizing data teams and next-generation messaging with Apache Pulsar. In this episode of the Data Show, I spoke with Jesse Anderson, managing director of the Big Data Institute, and my colleague Paco Nathan, who recently…

We need to build machine learning tools to augment machine learning engineers

We need to build machine learning tools to augment our machine learning engineers. In this post, I share slides and notes from a talk I gave in December 2017 at the Strata Data Conference in Singapore, offering suggestions to companies that are actively deploying products infused with machine learning capabilities. Over the past few years,…

What lies ahead for data in 2018

[A version of this post appears on the O’Reilly Radar.] How new developments in algorithms, machine learning, analytics, infrastructure, data ethics, and culture will shape data in 2018. 1. New tools will make graphs and time series easier, leading to new use cases. Graphs and time series have been a crucial part of the explosion in big…

How machine learning will accelerate data management systems

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Tim Kraska on why ML will change how we build core algorithms and data structures. In this episode of the Data Show, I spoke with Tim Kraska, associate professor of computer science at MIT. To take advantage of big data,…

The current state of Apache Kafka

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Neha Narkhede on data integration, microservices, and Kafka’s roadmap. In this episode of the Data Show, I spoke with Neha Narkhede, co-founder and CTO of Confluent. As I noted in a recent post on “The Age of Machine Learning,” data…

Building a natural language processing library for Apache Spark

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: David Talby on a new NLP library for Spark, and why model development starts after a model gets deployed to production. When I first discovered and started using Apache Spark, a majority of the use cases I used it for…

How companies can navigate the age of machine learning

To become a “machine learning company,” you need tools and processes to overcome challenges in data, engineering, and models. Over the last few years, the data community has focused on collecting data, building infrastructure for that purpose, and using data to improve decision-making. We are now seeing a surge in interest in advanced…

How Ray makes continuous learning accessible and easy to scale

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Robert Nishihara and Philipp Moritz on a new framework for reinforcement learning and AI applications. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this episode…

A scalable time-series database that supports SQL

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Michael Freedman on TimescaleDB and scaling SQL for time-series. In this episode…