The Data Integration Market
As much as I like talking and writing about machine learning and AI, the truth is that there are probably more impressive startups in the data engineering and data infrastructure (DE) category. DE companies address fundamentals that need to be in place before companies can rely on reports and metrics. Any organization wishing to scale its use of AI and machine learning also needs DE tools. In fact, almost all the tools in the buzzy category of MLOps assume that users already have their DE act together.
To understand the data integration landscape, I draw on the following sources: job postings, LinkedIn profiles, and startup databases. Together they give us a deeper understanding of the demand and supply sides of the data integration market, as well as the startups providing next-generation solutions.
Data Exchange podcast
- An open source, end-to-end library for causal inference: Amit Sharma (Senior Researcher) and Emre Kiciman (Senior Principal Researcher) of Microsoft Research are part of the team behind DoWhy, a new library for estimating causal effects based on historical data alone. I like what the DoWhy crew have built and I’m looking forward to using it to explore applications of causal inference and causal learning in future projects.
- AI Risk Management Framework: I discuss the new AI Risk Management Framework from the National Institute of Standards and Technology (NIST) with Elham Tabassi (of NIST) and Andrew Burt (Managing Partner of BNH.ai). In the cybersecurity realm, a host of businesses and cybersecurity leaders have adopted the NIST Cybersecurity Framework, and many consider it to be the gold standard in that field. Consequently, I believe that this new NIST initiative will have a significant impact on how we manage AI risks in the future.
- Data Science at Shopify: Wendy Foster, Director of Engineering & Data Science, explains in detail how they use data science and machine learning within Shopify.
Free Report: AI in Healthcare Survey Results
AI applications in healthcare present a number of challenges and considerations, and many of these same considerations and lessons also apply to other sectors. Topping the list of priorities for 2022: Data Integration and Language Models.
State-Of-The-Art AI Systems Are Trained With Extra Data
Stanford’s AI Index Report recently came out – one of my favorite annual reads. This year’s index highlights the need for additional training data in order to achieve state-of-the-art results across multiple technical benchmarks. In a short post, I discuss my favorite bits – including tools for detoxifying large language models – and I throw in bonus charts on the global talent pool for reinforcement learning and computer vision.