Speech Data Processing Takes Flight

Unlocking speech and audio data with new open source tools: Interest in neural networks and deep learning can be traced back to groundbreaking results in computer vision (2012) and speech recognition (2011). The number of companies working on computer vision applications is increasing, but the number of companies working on audio data is…

New open source tools to unlock speech and audio data

Introducing Lhotse, a Python library for handling speech data. By Piotr Żelasko, Jan Vainer, Tomáš Nekvinda, and Ben Lorica. Introduction: Of the many voice applications for AI, speech recognition is the most widely known and deployed as a building block of voice assistants. The voice and speech recognition market alone is expected to grow from $9.4…

Open source libraries for Text and Time Series

Data Exchange podcast. Unleashing the power of large language models: If you work with text, you should incorporate transformer-based language models into your NLP pipelines. You can either build your own tools or use libraries that come with pre-trained models. Maarten Grootendorst is the author of open source libraries that I’ve come to…

Confidential Computing and Machine Learning

Measuring the popularity and exploring the readiness of Confidential Computing tools. To have a comprehensive data protection and privacy policy, organizations must ensure the confidentiality and integrity of their data in these states: at rest, in use, and in transit. We previously reviewed the ecosystem of tools for protecting data while in use.

Foundation Models: A Primer for Investors and Builders

A non-technical guide and market map. By Kenn So and Ben Lorica. What are foundation models? Foundation models (FM) are a class of machine learning models that are trained on diverse data and can be adapted or fine-tuned for a wide range of downstream tasks. The term “foundation” is controversial among some researchers, but setting aside…

Machine Learning at a Pegacorn

Major Tech Companies 💙 Metaverse: What have popular technology news sources been covering in 2022? The Metaverse joins other hot areas (AI and cloud computing) on the list of top topics covered in 2022. By examining a variety of metrics, we identify which companies are investing in the Metaverse. Data Exchange…

Summer of Orchestration

From the Data Exchange podcast, we present recent conversations with creators of popular open source data and machine learning orchestration frameworks. Modernizing an organization’s data infrastructure is increasingly difficult without an orchestrator. At a high level, these are tools that enable developers to write, schedule, monitor, and manage pipelines. In the early stages of gathering and…

Tech companies are gearing up for the Metaverse

Major technology companies are investing in the Metaverse. Enterprises should take early action to stay ahead of the curve. In the aughts, I was a user and proponent of earlier versions of virtual worlds (specifically Second Life). Unfortunately, the technology was clunky and the user base never really grew beyond a few hundred thousand…

DALL·E 2 Decoded

Guide to Data Annotation and Synthetic Data: We recently examined the landscape of tools for building training datasets, specifically tools for data annotation and synthetic data generation. Taking into account emerging trends in machine learning and artificial intelligence, we assembled guidelines to help you navigate the explosion of tools in these areas.

A Guide to Data Annotation and Synthetic Data Generation Tools

Trends to consider when evaluating data annotation and synthetic data generation systems. As we noted in a recent post (“Machine Learning trends you need to know”), researchers are increasingly interested in tools and techniques for labeling, cleaning, augmenting, and enhancing datasets used by machine learning models. In fact, data scientists and machine learning engineers have…