What data scientists and data engineers can do with current generation serverless technologies

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Avner Braverman on what’s missing from serverless today and what users should expect in the near future.

In this episode of the Data Show, I spoke with Avner Braverman, co-founder and CEO of Binaris, a startup that aims to bring serverless to web-scale and enterprise applications. This conversation took place shortly after the release of a seminal paper from UC Berkeley (“Cloud Programming Simplified: A Berkeley View on Serverless Computing”), and this paper seeded a lot of our conversation during this episode.

Serverless is clearly on the radar of data engineers and architects. In a recent survey, we found 85% of respondents already had parts of their data infrastructure in one of the public clouds, and 38% were already using at least one of the serverless offerings we listed. As more serverless offerings get rolled out—e.g., things like PyWren that target scientists—I expect these numbers to rise.

We had a great conversation spanning many topics, including:

  • A short history of cloud computing.
  • The fundamental differences between serverless and conventional cloud computing.
  • The reasons serverless—specifically AWS Lambda—took off so quickly.
  • What data scientists and data engineers can do with current-generation serverless offerings.
  • What is missing from serverless today and what users should expect in the near future.

Specialized tools for machine learning development and model governance are becoming essential

[A version of this post appears on the O’Reilly Radar.]

Why companies are turning to specialized machine learning tools like MLflow.

By Ben Lorica and Mike Loukides.

A few years ago, we started publishing articles (see “Related resources” at the end of this post) on the challenges facing data teams as they start taking on more machine learning (ML) projects. Along the way, we described a new job role and title—machine learning engineer—focused on creating data products and making data science work in production, a role that was beginning to emerge in the San Francisco Bay Area two years ago. At that time, there weren’t any popular tools aimed at solving the problems facing teams tasked with putting machine learning into practice.

About 10 months ago, Databricks announced MLflow, a new open source project for managing machine learning development (full disclosure: Ben Lorica is an advisor to Databricks). We thought that given the lack of clear open source alternatives, MLflow had a decent chance of gaining traction, and this has proven to be the case. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow.

So, why is this new open source project resonating with data scientists and machine learning engineers? Recall the following key attributes of a machine learning project:
Continue reading “Specialized tools for machine learning development and model governance are becoming essential”

It’s time for data scientists to collaborate with researchers in other disciplines

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Forough Poursabzi-Sangdeh on the interdisciplinary nature of interpretable and interactive machine learning.

In this episode of the Data Show, I spoke with Forough Poursabzi-Sangdeh, a postdoctoral researcher at Microsoft Research New York City. Poursabzi works in the interdisciplinary area of interpretable and interactive machine learning. As models and algorithms become more widespread, many important considerations are becoming active research areas: fairness and bias, safety and reliability, security and privacy, and Poursabzi’s area of focus—explainability and interpretability.

We had a great conversation spanning many topics, including:

  • Current best practices and state-of-the-art methods used to explain or interpret deep learning—or, more generally, machine learning models.
  • The limitations of current model interpretability methods.
  • The lack of clear/standard metrics for comparing different approaches used for model interpretability.
  • Many current AI and machine learning applications augment humans, and, thus, Poursabzi believes it’s important for data scientists to work closely with researchers in other disciplines.
  • The importance of using human subjects in model interpretability studies.

Gliding down the world’s longest zipline

Ras Al Khaimah’s Jebel Jais mountain in the U.A.E. is home to the longest zipline in the world. The video below is from a GoPro camera on my helmet and was shot on 2019-03-14. I was the last person to zip down that day. There are two stages to this zipline site:

  • Stage 1: This is the longer of the two stages, and you “fly” while lying down in a harness with your hands holding straps behind your back (for optimum aerodynamics). The view is spectacular!
  • Stage 2: In this stage you are in a “sitting position”, and truth be told, from what I experienced and observed, the people on the platform seemed more apprehensive about this setup.

As you’ll see in the video, I fell a tad short in both stages, and had to be pulled in by the crew:

While I would not consider myself an “adventure traveler” or an “adrenaline junkie”, I found this to be an exhilarating experience and one that I would recommend to people traveling to the U.A.E. Ras Al Khaimah (RAK) has many things to offer and the area is full of spectacular things to do for travelers who love the outdoors. I leave you with a few photos from the desert in RAK:

Algorithms are shaping our lives – here’s how we wrest back control

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Kartik Hosanagar on the growing power and sophistication of algorithms.

In this episode of the Data Show, I spoke with Kartik Hosanagar, professor of technology and digital business, and professor of marketing at The Wharton School of the University of Pennsylvania. Hosanagar is also the author of a newly released book, A Human’s Guide to Machine Intelligence, an interesting tour through the recent evolution of AI applications, which draws from his extensive experience at the intersection of business and technology.

We had a great conversation spanning many topics, including:

  • The types of unanticipated consequences of which algorithm designers should be aware.
  • The predictability-resilience paradox: as systems become more intelligent and dynamic, they also become more unpredictable, so there are trade-offs algorithm designers must face.
  • Managing risk in machine learning: AI application designers need to weigh considerations such as fairness, security, privacy, explainability, safety, and reliability.
  • A bill of rights for humans impacted by the growing power and sophistication of algorithms.
  • Some best practices for bringing AI into the enterprise.

You created a machine learning application. Now make sure it’s secure.

[A version of this post appears on the O’Reilly Radar.]

The software industry has demonstrated, all too clearly, what happens when you don’t pay attention to security.

By Ben Lorica and Mike Loukides.

In a recent post, we described what it would take to build a sustainable machine learning practice. By “sustainable,” we mean projects that aren’t just proofs of concept or experiments. A sustainable practice means projects that are integral to an organization’s mission: projects by which an organization lives or dies. These projects are built and supported by a stable team of engineers, and supported by a management team that understands what machine learning is, why it’s important, and what it’s capable of accomplishing. Finally, sustainable machine learning means that as many aspects of product development as possible are automated: not just building models, but cleaning data, building and managing data pipelines, testing, and much more. Machine learning will penetrate our organizations so deeply that it won’t be possible for humans to manage them unassisted.

Organizations throughout the world are waking up to the fact that security is essential to their software projects. Nobody wants to be the next Sony, the next Anthem, or the next Equifax. But while we know how to make traditional software more secure (even though we frequently don’t), machine learning presents a new set of problems. Any sustainable machine learning practice must address machine learning’s unique security issues. We didn’t do that for traditional software, and we’re paying the price now. Nobody wants to pay the price again. If we learn one thing from traditional software’s approach to security, it’s that we need to be ahead of the curve, not behind it. As Joanna Bryson writes, “Cyber security and AI are inseparable.”
Continue reading “You created a machine learning application. Now make sure it’s secure.”

Why your attention is like a piece of contested territory

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: P.W. Singer on how social media has changed war, politics, and business.

In this episode of the Data Show, I spoke with P.W. Singer, strategist and senior fellow at the New America Foundation, and a contributing editor at Popular Science. He is co-author of an excellent new book, LikeWar: The Weaponization of Social Media, which explores how social media has changed war, politics, and business. The book is essential reading for anyone interested in how social media has become an important new battlefield in a diverse set of domains and settings.

We had a great conversation spanning many topics, including:

  • In light of the 10th anniversary of his earlier book Wired for War, we talked about progress in robotics over the past decade.
  • The challenge posed by the fact that social networks reward virality, not veracity.
  • How the internet has emerged as an important new battlefield.
  • How this new online battlefield changes how conflicts are fought and unfold.
  • How many of the ideas and techniques covered in LikeWar are trickling down from nation-state actors influencing global events to consulting companies offering services that companies and individuals can use.