[A version of this post appears on the O’Reilly Radar.]
From infrastructure to tools to training, here’s what’s ahead for data.
Whether you’re a business leader or a practitioner, here are key data trends to watch and explore in the months ahead.
Increasing focus on building data culture, organization, and training
In a recent O’Reilly survey, we found that the skills gap remains one of the key challenges holding back the adoption of machine learning. Demand for data skills hasn’t dissipated, and data scientist is still billed as “the sexiest job of the 21st century.” LinkedIn recently found that demand for data scientists in the US is “off the charts,” and our survey indicated that demand for data scientists and data engineers is strong not just in the US but globally.
With the average shelf life of a skill now under five years, and the cost of replacing an employee estimated at six to nine months of the position’s salary, tech leaders face increasing pressure to retain and upskill their employees rather than replace them in order to keep data projects (such as machine learning implementations) on track. We are also seeing more training programs aimed at executives and decision makers, who need to understand how these new ML technologies can affect their current operations and products.
Beyond investing in narrowing the skills gap, companies are beginning to put processes in place for their data science projects, for example, by creating analytics centers of excellence that centralize capabilities and share best practices. Some companies are also actively maintaining a portfolio of use cases and opportunities for ML.
Cloud for data infrastructure
Cloud platforms will continue to draw companies that need to invest in data infrastructure. Not only are the cloud platforms’ foundational technologies and managed services improving, but software vendors and popular open source data projects are increasingly making sure their offerings are easy to run in the cloud. According to a recent O’Reilly survey, 85% of respondents said they already had some of their data infrastructure in the cloud, and other surveys of IT executives reveal that many plan to increase their investments in SaaS and cloud tools. Data engineers and data scientists are also beginning to use new cloud technologies, such as serverless computing, for some of their tasks.
Continuing investments in (emerging) data technologies
For most companies, the road toward machine learning (ML) runs through simpler analytic applications. This is good news: ML demands data, and many of the simpler analytic tools that precede ML already require data infrastructure to be in place. The growing interest in ML will spur companies to keep investing in the foundational data technologies required to scale ML initiatives. These include data ingestion and integration, storage and data processing, and data preparation and cleaning.
Tools for secure and privacy-preserving analytics
Companies will continue to invest in tools for data security and privacy, but we expect to see an increased focus on tools for privacy-preserving analytics, an area in which researchers and startups have been actively engaged. Organizations will also begin to identify and manage the risks that accompany the use of machine learning in products and services, such as security and privacy, bias, safety, and lack of transparency.
Sustaining machine learning in an enterprise
Early indications are that many organizations are correctly focusing their initial machine learning projects (and investments) on use cases that improve their most mission-critical analysis projects. For example, financial services companies are applying ML to risk analysis, telecom companies are applying AI to service operations, and automotive companies are focusing their initial ML implementations on manufacturing. This focus is also reflected in the emergence of tools specific to machine learning, including data science platforms, data lineage, metadata management and analysis, data governance, and model lifecycle management.
Burgeoning IoT technologies
A few years ago, most internet of things (IoT) examples involved smart cities and smart government. But the rise of cloud platforms, cheap sensors, and machine learning has IoT poised to make a comeback in industry. We’ll still hear about municipal and public sector applications, but there are other interesting use cases involving closed systems (factories, buildings, homes) and enterprise and consumer applications (edge computing).
Automation in data science and data
As the use of machine learning and analytics becomes more widespread, we need tools that allow data scientists and data engineers to scale, so they can tackle many more problems and maintain more systems. This will lead to more automation tools for the many stages of data science, including data preparation, feature engineering, model selection, and hyperparameter tuning, as well as for data engineering and data operations. There are already some early applications of machine learning aimed at partially automating tasks in data science, software development, and IT operations.