[A version of this post appears on the O’Reilly Radar.]
Privacy-preserving analytics is not only possible; with GDPR about to take effect, incorporating privacy into your data products will become necessary.
In this post, I share slides and notes from a talk I gave in March 2018 at the Strata Data Conference in California, offering suggestions for how companies may want to build analytic products in an age when data privacy has become critical. A lot has changed since I gave this presentation: numerous articles have been written about Facebook’s privacy policies, its CEO testified twice before the U.S. Congress, and I deactivated my mostly dormant Facebook account. The result is an even more heightened awareness around data privacy, and people are acknowledging that the problems go beyond a few companies or a few people.
Let me start by listing a few observations regarding data privacy:
- We tend to talk about data privacy in the context of security breaches, but there are many instances when privacy violations involve people who have been granted access to data.
- The growing number of connected devices that collect data means that our most sensitive data (see this article on smart homes) is being gathered and monetized.
- Concerns about data privacy cut across cultures. As someone who travels to China, I can attest that users there are just as concerned about how companies are using their data.
- It is true that regulators across the world are approaching data privacy in different ways. To the extent that many companies conduct business in the EU, the upcoming General Data Protection Regulation (GDPR) will influence how organizations across the world build and design data services and products.
Which brings me to the main topic of this presentation: how do we build analytic services and products in an age when data privacy has emerged as an important issue? Architecting and building data platforms is central to what many of us do. We have long recognized that data security and data privacy are required features for our data platforms, but how do we “lock down” analytics?
Once we have data securely in place, we proceed to utilize it in two main ways: (1) to make better decisions (BI) and (2) to enable some form of automation (ML). It turns out there are some new tools for building analytic products that preserve privacy. Let me give a quick overview of a few things you may want to try today.