Companies that are able to use data securely will be well-positioned to build data and AI applications in the future.
By Assaf Araki and Ben Lorica.
The use of data within companies continues to grow exponentially. At the same time, data platforms and tools for analytics, data science, and AI continue to get simpler. As a result, the number of data users and data applications within organizations is growing.
This growth in data usage comes at a time of heightened concern for data security and privacy. On the cybersecurity front, data breaches are at an all-time high. Beyond breaches, users have changing expectations for the security and privacy of the information they generate or share. Regulators have also been making increasing demands: since the GDPR took effect and the CCPA was signed into law in 2018, companies have been forced to comply with many more privacy regulations.
A comprehensive data privacy and security policy protects the confidentiality and integrity of data in each of its three states: at rest, in transit, and in use. In this post we describe the ecosystem of tools focused on protecting data while in use. Our primary focus is on Confidential Computing tools for developing data, analytics, and AI applications. We believe that companies that are able to use data securely will be well-positioned to build data and AI applications in the future.
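Safeguarding data while it’s being used is particularly challenging because most applications need data in the clear (unencrypted and otherwise unprotected) in order to compute on it. The field of Confidential Computing draws on hardware, cryptography, algorithms, and machine learning, and encompasses tools and techniques such as trusted execution environments (TEEs), homomorphic encryption (HE), secure multi-party computation (MPC), federated learning (FL), differential privacy (DP), and synthetic data.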
The following are examples of some of these technologies in real-world settings:
- Differential Privacy: By cross-referencing commercial data sets and voter registration lists, it is possible to trace even purportedly anonymous information back to an individual. To solve this conundrum for the 2020 census, the US Census Bureau has been developing a new privacy protection system built around differential privacy.
- TEE: The popular messaging app, Signal, has been exploring the use of TEEs for private cloud storage. The Royal Bank of Canada built a data analytics platform using TEEs on Microsoft Azure.
- HE: There have been early applications of Homomorphic Encryption in fields like genomics and healthcare. AWS recently published results of a “privacy-preserving XGBoost prediction algorithm” that uses HE and is implemented on SageMaker.
- FL: In 2020, researchers from the University of Pennsylvania and Intel Labs announced a Federated Learning medical imaging model with the potential to “help clinicians better identify and treat brain tumors”.
- FL + Other technologies: Many of these technologies are used in combination. Apple and Google have described deploying Confidential Computing systems on mobile devices that combine differential privacy and federated learning. MIT researchers have explored using TEEs and Federated Learning to improve models for patient care and cures without exposing or moving any data.
- Synthetic data: Synthetic data tools can generate realistic data that can be shared freely across teams and organizations. The startup Synthesis AI hopes to help companies alleviate ethical AI issues related to bias and privacy by offering high-quality synthetic data for a broad range of use cases.
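To make the differential privacy example more concrete, here is a minimal sketch of the Laplace mechanism, one standard building block of differential privacy, applied to a counting query of the kind found in census tabulations. It assumes each person contributes at most one record, so the count has sensitivity 1. The ages, the `noisy_count` helper, and the `epsilon` values are hypothetical, and this is not the Census Bureau's system, which is considerably more involved.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # random.expovariate takes a rate (1 / mean), so each draw has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical toy data: ages of residents in one small area.
ages = [23, 67, 45, 71, 34, 80, 19, 66, 52, 90]
rng = random.Random(42)
released = noisy_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng)
true_count = sum(1 for a in ages if a >= 65)
print(f"true count = {true_count}, released count = {released:.2f}")
```

Smaller values of `epsilon` add more noise (scale `1/epsilon`) and give stronger privacy. Every additional query spends more of the privacy budget, which a real deployment has to track.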
These are active research areas, and there are numerous books and papers on each of these technologies. From 2016 to 2020, the total number of papers on the popular preprint server arXiv grew 166%. Over that same period, the number of arXiv papers that mentioned FL, DP, and HE grew several orders of magnitude faster.
Evaluating Confidential Computing Solutions
There are a few important considerations to keep in mind when evaluating Confidential Computing solutions. First, some of these tools, notably differential privacy and secure multi-party computation, are supporting components that tend to be used in conjunction with other technologies. Second, the performance and readiness of each of these technologies depend heavily on the specific workload. As Raluca Popa, UC Berkeley professor and co-founder of Opaque Systems, observed in a recent essay comparing TEEs with the combination of HE and MPC:
- ❛It turns out that there is no clear winner between these two approaches: there is a tradeoff between them in terms of security guarantees, performance, and deployment, which I am explaining below. For simple computations, both approaches tend to be efficient, so the choice between these two would be based on deployment and security considerations. However, for complex workloads such as machine learning training and SQL analytics, the purely cryptographic approach is far too inefficient for many real-world deployments; the hardware security approach is currently the only practical approach of the two for such workloads.❜
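Secure multi-party computation is easiest to see through additive secret sharing, one common MPC building block. The sketch below lets three hypothetical banks learn their combined exposure without any bank revealing its own figure. It assumes honest-but-curious parties that follow the protocol and do not collude; the modulus, the function names, and the numbers are illustrative assumptions, and the messages between parties are simulated in a single process.

```python
import secrets

# Modulus for share arithmetic (the Mersenne prime 2**61 - 1); an illustrative choice.
PRIME = 2**61 - 1

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME

def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their private inputs."""
    n = len(private_inputs)
    # Party i splits its input and (conceptually) sends share j to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received; each partial sum looks random on its own.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Publishing only the partial sums reveals the total and nothing else.
    return reconstruct(partial_sums)

# Hypothetical exposures (in thousands) held by three banks.
exposures = [1200, 850, 430]
print(secure_sum(exposures))  # 2480
```

Each individual share is a uniformly random number, so a party holding one share of each peer's input learns nothing about those inputs; only the final total is revealed. Sums are cheap in this scheme, while multiplying secret-shared values requires extra rounds of interaction, which hints at why purely cryptographic approaches struggle with complex workloads.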
A persistent complaint against Homomorphic Encryption is that it can be four to five orders of magnitude slower than computing on unencrypted data. Alon Kaufman, CEO and co-founder of Duality Technologies, a startup commercializing HE for data science and advanced analytics, recently noted that the company is beginning to see promising results for specific types of batch-oriented use cases:
- ❛We have already achieved the most impressive Fully Homomorphic Encryption (FHE) acceleration results. Duality and Intel share the objective of accelerating FHE and its applications by orders of magnitude, enabling organizations around the world to benefit from secure and privacy enhanced data collaboration.❜
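To illustrate what computing on encrypted data means, here is a toy version of the Paillier cryptosystem. Paillier is only additively homomorphic; it is swapped in here because it fits in a few lines, whereas Duality works on fully homomorphic encryption (FHE), which also supports multiplying ciphertexts together. The hard-coded primes are tiny and completely insecure, and the function names are hypothetical.

```python
import math
import secrets

# Toy key generation with tiny hard-coded primes: illustrative only, NOT secure.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                        # common simplified generator choice
LAM = math.lcm(P - 1, Q - 1)     # Carmichael function of N
MU = pow(LAM, -1, N)             # modular inverse; valid because G = N + 1

def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # Randomized: the same plaintext yields a different ciphertext each time.
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c: int) -> int:
    """Recover the plaintext with the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N) * MU % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ

def scale_encrypted(c: int, k: int) -> int:
    """Raising a ciphertext to the power k multiplies the plaintext by k (mod N)."""
    return pow(c, k, N_SQ)

# A client encrypts two hypothetical readings; the server only ever sees ciphertexts.
c_a, c_b = encrypt(120), encrypt(75)
c_total = add_encrypted(c_a, c_b)
c_double = scale_encrypted(c_total, 2)
print(decrypt(c_total), decrypt(c_double))  # 195 390
```

Adding two numbers in the clear is a single machine instruction, while the encrypted version needs modular exponentiations over `N_SQ`, and realistic keys are thousands of bits long rather than the 17-bit `N` used here. This gap illustrates, in miniature, why HE computations are so much slower than their plaintext counterparts.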
When architecting and deploying Confidential Computing technologies, you will need to choose providers you can trust. In the diagram below, the lower levels have fewer providers, making it easier to recover from any security or data breaches.
Use Cases and Current Ecosystem
Here are some of the key use cases that we are seeing in the market today, along with a representative sample of companies and solutions:
- Secure computation in the Cloud: The goal is to enable organizations to migrate their sensitive data and models to the cloud. By implementing a comprehensive solution, you can unlock the benefits of cloud computing while also guaranteeing data privacy, trust, and compliance. Organizations can upload encrypted data to the cloud and apply advanced analytics models or machine learning directly to the encrypted data. (Relevant technologies: TEE, HE)
- Secure Collaboration and Data Sharing (inter-organizational): Multiple data owners work together to analyze their collective data, on the assumption that the whole is greater than the sum of its parts. Financial services firms are early adopters for anti-money laundering, fraud detection, and credit risk. Healthcare organizations are also early users. (Relevant technologies: TEE, HE, FL, DP)
- Ease of Compliance (intra-organizational): The goal is to be able to leverage data across departments, business units, or geographic regions, and to extract insight from it, while remaining in compliance with data privacy regulations. (Relevant technologies: TEE, HE, FL, DP)
- Data Services for Products that guarantee privacy: Confidential Computing solutions enable suppliers to expand data services for their products without revealing customer information. (Relevant technologies: TEE, HE, FL, DP)
- Protect and utilize data processed at the edge: The startup Edgify trains computer vision models directly on simple edge devices without needing to transfer data to the cloud. For example, when its technology is deployed on medical devices, AI models can start identifying pathologies while ensuring that no personal data leaves the devices.
- Data Clean Rooms: A data clean room (DCR) is software that allows advertisers and brands to match user-level data without revealing any Personally Identifiable Information (PII). Ad platforms like Facebook, Amazon, and Google use data clean rooms to provide advertisers with matched data about the performance of their ads. Another recent example is Snowflake’s newly launched Media Data Cloud. DCRs are particularly relevant as third-party cookies are phased out.
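Several of the use cases above, such as training models at the edge or across organizations, rely on federated learning. Below is a minimal sketch of federated averaging (FedAvg), one widely used FL algorithm, for a one-feature linear model: each client runs a few steps of gradient descent on its own data, and a server averages only the resulting parameters, weighted by each client's number of examples. The synthetic client data, learning rate, and round counts are hypothetical, and this is not a description of Edgify's or any other vendor's system; production deployments typically add protections such as secure aggregation or differential privacy on the updates.

```python
import random

def local_update(w, b, data, lr=0.1, epochs=5):
    """Run plain gradient descent on one client's private (x, y) pairs."""
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        # Gradient of the mean squared error over this client's data.
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

def federated_averaging(clients, rounds=200):
    """Coordinate training across clients without collecting their raw data."""
    w, b = 0.0, 0.0  # global model held by the coordinating server
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client trains locally; only updated parameters leave the device.
        updates = [(local_update(w, b, d), len(d)) for d in clients]
        # The server averages parameters, weighting each client by its sample count.
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b

def make_client(n, rng):
    """Hypothetical private data for one client: y = 3x + 1 plus small noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in xs]

rng = random.Random(7)
clients = [make_client(n, rng) for n in (40, 60, 25)]
w, b = federated_averaging(clients)
print(f"w = {w:.2f}, b = {b:.2f}")
```

With these settings the learned parameters should land close to the true values `w = 3` and `b = 1`, even though no raw `(x, y)` pair ever leaves its client.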
Which companies and sectors will be first to adopt Confidential Computing technologies? Every industry has data, but sectors differ in their level of regulatory oversight and in their maturity at extracting insights from data. Privacy regulations like GDPR and CCPA apply across the board regardless of sector. However, data sensitivity varies across sectors, and sectors that handle more sensitive data are subject to additional regulations.
The early adopters of Confidential Computing will come from highly regulated sectors like financial services and healthcare, which will use it to enable cloud usage, collaboration, and compliance. Industries that are more mature at extracting insights from data will also be prime candidates for Confidential Computing solutions. Other trends, such as the privacy policies recently introduced by major mobile platforms, will lead to a spike in interest in data exchanges, data clean rooms, and other related tools.
Synthetic Data as a tool for Confidential Computing will be used across industries. For now, other Confidential Computing tools such as HE, TEEs, and FL still require more technical expertise, and thus we believe their adoption will be more limited. In contrast, Synthetic Data is simpler to use and deploy, and offers benefits such as faster and cheaper data acquisition, which can potentially speed up the development of machine learning models and lead to more accurate predictions.
At the dawn of deep learning a decade ago, it was hard to envision that deep learning would impact every aspect of our lives. We believe Confidential Computing is likewise a fundamental technology whose impact will cut across a wide range of industries and use cases.
Assaf Araki is an investment manager at Intel Capital. His contributions to this post are his personal opinion and do not represent the opinion of the Intel Corporation. Intel Capital is an investor in Agita Labs, Duality, Fortanix, and Opaque. #IamIntel