[A version of this post appears on the O’Reilly Radar.]
How new developments in algorithms, machine learning, analytics, infrastructure, data ethics, and culture will shape data in 2018.
1. New tools will make graphs and time series easier, leading to new use cases
Graphs and time series have been a crucial part of the explosion in big data. 2018 will see the emergence of a new generation of tools for storing and analyzing graphs and time series at large scale. These new analytic and visualization tools will help product groups devise new offerings, especially for use cases in security and fraud detection.
2. More companies will join data partnerships to share data
In 2016, I started hearing companies express interest in data sharing platforms, and startups have now begun to build data exchanges that let companies share data across organizational boundaries while protecting privacy and IP. Ideas from the blockchain world have inspired some of these initiatives, particularly crypto and distributed control. Data partnerships are taking hold in financial services companies, and I expect this trend to spread to other industries this year.
3. Expect advances in tools that facilitate ML experimentation and collaboration
We’re in an empirical era of machine learning. Companies are now building tools that facilitate experimentation and collaboration. There’s been a particular focus on data science platforms that let users share processing pipelines and features/predictors, use different libraries, and reproduce results end to end.
4. As well as in onboarding courses and tools for data scientists
As more companies add deep learning to the mix of algorithms they use, we’ll see new onboarding courses and tools that allow data scientists to share best practices, architectures, and parameters.
5. We’ll see new use cases for deep learning as a machine learning method
Besides traditional applications in computer vision, speech, and text, companies are actively exploring deep learning for recommenders, search ranking, fraud and anomaly detection, and time series forecasting.
6. Data pipelines that draw on multiple data sources will continue to evolve
Machine learning products require data pipelines that draw on disparate data sources, so data integration, data enrichment, and data processing tools continue to be critical.
7. Anticipate new methods for unifying live and historical data
In recent months, I’ve come across startups and open source communities building storage systems that enable analytics on live and long-term historical data. These unified data management systems enable analysts to build applications using a single system rather than querying live and historical data stores separately.
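As a rough illustration of the idea (not any particular product's design), the sketch below puts a single query method in front of a historical tier and a live tier. The `UnifiedStore` class and its method names are hypothetical, invented for this example; real systems add persistence, indexing, and handling for out-of-order data.

```python
from bisect import bisect_left, bisect_right


class UnifiedStore:
    """Toy store: one query interface over historical and live events."""

    def __init__(self, historical=None):
        # Historical tier: (timestamp, value) pairs kept sorted by timestamp.
        self._historical = sorted(historical or [], key=lambda e: e[0])
        # Live tier: recent arrivals, stored in arrival order.
        self._live = []

    def append(self, timestamp, value):
        """Record a newly arrived (live) event."""
        self._live.append((timestamp, value))

    def compact(self):
        """Fold live events into the historical tier."""
        self._historical = sorted(self._historical + self._live,
                                  key=lambda e: e[0])
        self._live = []

    def query(self, start, end):
        """Return events with start <= timestamp <= end from both tiers.

        Historical matches come first, followed by live matches.
        """
        keys = [t for t, _ in self._historical]
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        old = self._historical[lo:hi]
        new = [(t, v) for t, v in self._live if start <= t <= end]
        return old + new
```

The caller issues one `query(start, end)` and never needs to know which tier an event currently lives in, which is the property the unified systems described above aim for.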
8. Companies will increasingly need new data cache and data fabric architectures
Because companies are using a variety of storage systems (distributed file systems, object stores, etc.) and cloud providers, we’ll see more architectures that rely on distributed memory systems or common data layers that sit between compute and storage.
9. Machine learning specialists will play an important role in companies
As companies deploy machine learning across many products and services, there’s an increased appreciation for specialists who can deploy and manage models in production (using new open source tools like Clipper). In 2017, we noted an emerging position, machine learning engineer, that LinkedIn recently pegged as one of the fastest-growing jobs.
10. So will machine learning models that manage machine learning
And as companies’ reliance on machine learning models grows and they begin deploying thousands of models, we’ll see them begin to use machine learning tools to augment their machine learning engineers.
11. Look for progress in data science libraries and frameworks
Python remains the favorite language of data scientists—although R is also very popular—and Spark is the most popular distributed computing framework. Software developers and hardware manufacturers will continue to build tools and accelerators to optimize these popular libraries and frameworks.
12. There will be more attention paid to fairness, transparency, and explainability
We’ve all heard stories of data products gone awry. Machine learning researchers are beginning to share best practices for building models that go well beyond simply optimizing a business metric. In 2018 I expect we’ll hear more conversations about addressing bias and other related problems in machine learning.
13. Along with an increased focus on privacy and security
As GDPR looms and security breaches continue to grab headlines, I suspect more companies will be inspired by Apple’s recent paper on integrating differential privacy into machine learning.
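For readers new to differential privacy, here is a minimal sketch of randomized response, a classic local differential privacy technique. It is not the method from Apple's paper, which uses more elaborate mechanisms; the function names and the `epsilon` parameter are assumptions made for this illustration. Each user flips their true answer with some probability before reporting it, and the analyst corrects for that noise in aggregate.

```python
import math
import random


def randomize(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    Assumes bit is 0 or 1 and epsilon > 0.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_rate(reports, epsilon):
    """Debias the observed mean to estimate the true fraction of 1s."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = rate * p_truth + (1 - rate) * (1 - p_truth); solve for rate.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)
```

No individual report reveals a user's true answer with certainty, yet with enough users the estimated rate lands close to the true one. A smaller `epsilon` means stronger privacy and a noisier estimate.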
14. Executives must level up their data science and machine learning knowledge
Data and machine learning are becoming more pervasive and a greater competitive differentiator. It’s imperative that managers and decision makers who haven’t already done so learn this year about the technologies, methods, and best practices that will affect their industries, with compliance and ethics as important components of their training.
15. Data science will continue to be more firmly integrated into enterprise management
We’ve moved past initial proof-of-concept projects and stand-alone data science teams. Companies have extended centers of excellence to include analytic projects and are working—and sometimes struggling—with how to manage data science in the enterprise.
16. Other applications and domains to watch in 2018
- Data journalism: Consortiums, media companies, and nonprofits routinely use data. Since 2018 is an election year in the US, expect this to intensify.
- Security: Cybersecurity companies are using machine learning, but so are hackers. Researchers continue to work on robust machine learning models to counter adversarial attacks.
- Financial services: Finance leads other industries when it comes to the use of large-scale graph analytics, alternative data sources, the blockchain, and crypto. Expect to hear more about how machine learning is used in financial trading and risk management.