Data Engineer Archives - Gradient Flow

One simple chart: Demand for Machine Learning Engineers

I started hearing the job role of “machine learning engineer” a few years ago. In mid 2017 we wrote a post describing the role that we were observing in a few companies mainly in the technology sector. At the time, companies were carving out a new role focused on making machine learning and data scienceContinue reading “One simple chart: Demand for Machine Learning Engineers”

What Is a Lakehouse?

by Ben Lorica, Michael Armbrust, Ali Ghodsi, Reynold Xin and Matei Zaharia. [This post originally appeared on the Databricks blog.] Over the past few years at Databricks, we’ve seen a new data management paradigm that emerged independently across many customers and use cases: the lakehouse. In this post we describe this new paradigm and itsContinue reading “What Is a Lakehouse?”

Machine learning for operational analytics and business intelligence

The O’Reilly Data Show Podcast: Peter Bailis on data management, ML benchmarks, and building next-gen tools for analysts. In this episode of the Data Show, I speak with Peter Bailis, founder and CEO of Sisu, a startup that is using machine learning to improve operational analytics. Bailis is also an assistant professor of computer scienceContinue reading “Machine learning for operational analytics and business intelligence”

One simple chart: Who is interested in Apache Pulsar?

Multi-layer architecture, scalability, multitenancy, and durability are just some of the reasons companies have been using Pulsar. By Ben Lorica and Jesse Anderson. With companies producing data from an increasing number of systems and devices, messaging and event streaming solutions—particularly Apache Kafka—have gained widespread adoption. Over the past year, we’ve been tracking the progress of ApacheContinue reading “One simple chart: Who is interested in Apache Pulsar?”

Labeling, transforming, and structuring training data sets for machine learning

The O’Reilly Data Show Podcast: Alex Ratner on how to build and manage training data with Snorkel. In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a companyContinue reading “Labeling, transforming, and structuring training data sets for machine learning”

You’ll want Nexar’s newly released Live Map for your city

Extracting and exposing valuable insights to enable smart cities and many other applications. I recently had the privilege of getting a preview of Nexar’s Live Map, from my friend, Nexar’s CTO and co-founder Bruno Fernandez-Ruiz. Nexar uses off-the-shelf smartphones and dash-cams, sophisticated data ingestion, data processing, sensor fusion, and machine learning software to realize theirContinue reading “You’ll want Nexar’s newly released Live Map for your city”

Managing machine learning in the enterprise: Lessons from banking and health care

A look at how guidelines from regulated industries can help shape your ML strategy. By Ben Lorica, Harish Doddi, David Talby. As companies use machine learning (ML) and AI technologies across a broader suite of products and services, it’s clear that new tools, best practices, and new organizational structures will be needed. In recent posts,Continue reading “Managing machine learning in the enterprise: Lessons from banking and health care”

Tools for machine learning development

The O’Reilly Data Show: Ben Lorica chats with Jeff Meyerson of Software Engineering Daily about data engineering, data architecture and infrastructure, and machine learning. By Jenn Webb. In this week’s episode of the Data Show, we’re featuring an interview Data Show host Ben Lorica participated in for the Software Engineering Daily Podcast, where he was interviewedContinue reading “Tools for machine learning development”

Enabling end-to-end machine learning pipelines in real-world applications

The O’Reilly Data Show Podcast: Nick Pentreath on overcoming challenges in productionizing machine learning models. In this episode of the Data Show, I spoke with Nick Pentreath, principal engineer at IBM. Pentreath was an early and avid user of Apache Spark, and he subsequently became a Spark committer and PMC member. Most recently his focusContinue reading “Enabling end-to-end machine learning pipelines in real-world applications”

What are model governance and model operations?

A look at the landscape of tools for building and deploying robust, production-ready machine learning models. By Ben Lorica, Harish Doddi, David Talby. Our surveys over the past couple of years have shown growing interest in machine learning (ML) among organizations from diverse industries. A few factors are contributing to this strong interest in implementingContinue reading “What are model governance and model operations?”