You created a machine learning application. Now make sure it’s secure.

[A version of this post appears on the O’Reilly Radar.]

The software industry has demonstrated, all too clearly, what happens when you don’t pay attention to security.

By Ben Lorica and Mike Loukides.

In a recent post, we described what it would take to build a sustainable machine learning practice. By “sustainable,” we mean projects that aren’t just proofs of concept or experiments. A sustainable practice means projects that are integral to an organization’s mission: projects by which an organization lives or dies. These projects are built and maintained by a stable team of engineers, and backed by a management team that understands what machine learning is, why it’s important, and what it’s capable of accomplishing. Finally, sustainable machine learning means that as many aspects of product development as possible are automated: not just building models, but cleaning data, building and managing data pipelines, testing, and much more. Machine learning will penetrate our organizations so deeply that it won’t be possible for humans to manage these systems unassisted.

Organizations throughout the world are waking up to the fact that security is essential to their software projects. Nobody wants to be the next Sony, the next Anthem, or the next Equifax. But while we know how to make traditional software more secure (even though we frequently don’t), machine learning presents a new set of problems. Any sustainable machine learning practice must address machine learning’s unique security issues. We didn’t do that for traditional software, and we’re paying the price now. Nobody wants to pay the price again. If we learn one thing from traditional software’s approach to security, it’s that we need to be ahead of the curve, not behind it. As Joanna Bryson writes, “Cyber security and AI are inseparable.”
Continue reading “You created a machine learning application. Now make sure it’s secure.”

Why your attention is like a piece of contested territory

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: P.W. Singer on how social media has changed war, politics, and business.

In this episode of the Data Show, I spoke with P.W. Singer, strategist and senior fellow at the New America Foundation, and a contributing editor at Popular Science. He is co-author of an excellent new book, LikeWar: The Weaponization of Social Media, which explores how social media has changed war, politics, and business. The book is essential reading for anyone interested in how social media has become an important new battlefield in a diverse set of domains and settings.

We had a great conversation spanning many topics, including:

  • In light of the 10th anniversary of his earlier book Wired for War, we talked about progress in robotics over the past decade.
  • The challenge posed by the fact that social networks reward virality, not veracity.
  • How the internet has emerged as an important new battlefield.
  • How this new online battlefield changes how conflicts are fought and unfold.
  • How many of the ideas and techniques covered in LikeWar are trickling down from nation-state actors influencing global events to consulting companies offering services that companies and individuals can use.

Continue reading “Why your attention is like a piece of contested territory”

How AI can help to prevent the spread of disinformation

[This post originally appeared on Information Age.]

Our industry has a duty to discuss the dark side of technology. Yet many organisations — including some that wield enormous power and influence — are reluctant to acknowledge that their platforms are used to spread disinformation, foster hatred, facilitate bullying, and much else that makes our world a worse place in which to live.

Disinformation — what is sometimes called “fake news” — is a prime example of the unintended consequences of new technology. Its purpose is purely to create discord; it poisons public discourse and feeds festering hatreds with a litany of lies. What makes disinformation so effective is that it exploits characteristics of human nature such as confirmation bias, then seizes on the smallest seed of doubt and amplifies it with untruths and obfuscation.

Disinformation has spawned a new sub-industry within journalism, with fact checkers working around the clock to analyse politicians’ speeches, articles from other publications, news reports, and government statistics, among much else. But the sheer volume of disinformation, together with its ability to multiply and mutate like a virus across a variety of social platforms, means that thorough fact-checking is only possible on a tiny proportion of disputed articles.
Continue reading “How AI can help to prevent the spread of disinformation”

The evolution and expanding utility of Ray

[A version of this post appears on the O’Reilly Radar.]

There are growing numbers of users and contributors to the framework, as well as libraries for reinforcement learning, AutoML, and data science.

In a recent post, I listed some of the early use cases described in the first meetup dedicated to Ray, a distributed programming framework from UC Berkeley’s RISE Lab. A second meetup took place a few months later, and both events featured some of the first applications built with Ray. On the development front, the core API has stabilized and a lot of work has gone into improving Ray’s performance and stability. The project now has around 5,700 stars on GitHub and more than 100 contributors across many organizations.

At this stage of the project, how does one describe Ray to those who aren’t familiar with it? The RISE Lab team describes Ray as a “general framework for programming your cluster or cloud.” To place the project into context, Ray and cloud functions (FaaS, serverless) currently sit somewhere in the middle between extremely flexible systems on one end and much more targeted systems that emphasize ease of use on the other. More precisely, users today can choose between extremely flexible cluster management and virtualization tools on one end (Docker, Kubernetes, Mesos, etc.) and domain-specific systems on the other end of the flexibility spectrum (Spark, Kafka, Flink, PyTorch, TensorFlow, Redshift, etc.).
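To make that description concrete, here is a minimal sketch of Ray’s core task API (remote functions that return futures). The function name and values are illustrative only, and the snippet assumes a local ray.init() rather than a real cluster:

    import ray

    ray.init()  # start Ray locally; on a cluster this would connect to the head node

    # Decorating a function with @ray.remote turns each call into a task that
    # runs in parallel on Ray workers and returns a future (object ref).
    @ray.remote
    def square(x):
        return x * x

    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # blocks until the tasks finish -> [0, 1, 4, 9]

The same decorator-and-futures pattern extends to stateful actors and to the higher-level libraries built on Ray, which is what lets one API span use cases as different as reinforcement learning, AutoML, and data science.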
Continue reading “The evolution and expanding utility of Ray”

The technical, societal, and cultural challenges that come with the rise of fake media

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Siwei Lyu on machine learning for digital media forensics and image synthesis.

In this episode of the Data Show, I spoke with Siwei Lyu, associate professor of computer science at the University at Albany, State University of New York. Lyu is a leading expert in digital media forensics, a field of research into tools and techniques for analyzing the authenticity of media files. Over the past year, there have been many stories written about the rise of tools for creating fake media (mainly images, video, and audio files). Researchers in digital image forensics haven’t exactly been standing still, though. As Lyu notes, advances in machine learning and deep learning have also found a receptive audience among the forensics community.

We had a great conversation spanning many topics, including:

  • The many indicators used by forensic experts and forgery detection systems
  • Balancing “open” research with risks that come with it—including “tipping off” adversaries
  • State-of-the-art detection tools today, and what the research community and funding agencies will be working on over the next few years
  • Technical, societal, and cultural challenges that come with the rise of fake media

Here are some highlights from our conversation:
Continue reading “The technical, societal, and cultural challenges that come with the rise of fake media”