How AI can help to prevent the spread of disinformation

[This post originally appeared on Information Age.]

Our industry has a duty to discuss the dark side of technology. Yet many organisations — including some that wield enormous power and influence — are reluctant to acknowledge that their platforms are used to spread disinformation, foster hatred, facilitate bullying, and much else that makes our world a worse place in which to live.

Disinformation — what is sometimes called “fake news” — is a prime example of the unintended consequences of new technology. Its purpose is purely to create discord; it poisons public discourse and feeds festering hatreds with a litany of lies. What makes disinformation so effective is that it exploits characteristics of human nature such as confirmation bias, then seizes on the smallest seed of doubt and amplifies it with untruths and obfuscation.

Disinformation has spawned a new sub-industry within journalism, with fact checkers working around the clock to analyse politicians’ speeches, articles from other publications and news reports, and government statistics, among much else. But the sheer volume of disinformation, together with its ability to multiply and mutate like a virus on a variety of social platforms, means that thorough fact-checking is only possible on a tiny proportion of disputed articles.

The evolution and expanding utility of Ray

[A version of this post appears on the O’Reilly Radar.]

There are growing numbers of users and contributors to the framework, as well as libraries for reinforcement learning, AutoML, and data science.

In a recent post, I listed some of the early use cases described in the first meetup dedicated to Ray, a distributed programming framework from UC Berkeley’s RISE Lab. A second meetup took place a few months later, and both events featured some of the first applications built with Ray. On the development front, the core API has stabilized and a lot of work has gone into improving Ray’s performance and stability. The project now has around 5,700 stars on GitHub and more than 100 contributors across many organizations.

At this stage of the project, how does one describe Ray to those who aren’t familiar with it? The RISE Lab team describes Ray as a “general framework for programming your cluster or cloud.” To place the project into context, Ray and cloud functions (FaaS, serverless) currently sit somewhere in the middle between extremely flexible systems on one end and much more targeted systems that emphasize ease of use on the other. More precisely, users can currently choose extremely flexible cluster management and virtualization tools on one end (Docker, Kubernetes, Mesos, etc.), or domain-specific systems on the other end of the flexibility spectrum (Spark, Kafka, Flink, PyTorch, TensorFlow, Redshift, etc.).

The technical, societal, and cultural challenges that come with the rise of fake media

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Siwei Lyu on machine learning for digital media forensics and image synthesis.

In this episode of the Data Show, I spoke with Siwei Lyu, associate professor of computer science at the University at Albany, State University of New York. Lyu is a leading expert in digital media forensics, a field of research into tools and techniques for analyzing the authenticity of media files. Over the past year, there have been many stories written about the rise of tools for creating fake media (mainly images, video, and audio files). Researchers in digital image forensics haven’t exactly been standing still, though. As Lyu notes, advances in machine learning and deep learning have also found a receptive audience among the forensics community.

We had a great conversation spanning many topics including:

  • The many indicators used by forensic experts and forgery detection systems
  • Balancing “open” research with risks that come with it—including “tipping off” adversaries
  • State-of-the-art detection tools today, and what the research community and funding agencies are working on over the next few years
  • Technical, societal, and cultural challenges that come with the rise of fake media

Here are some highlights from our conversation:

Using machine learning and analytics to attract and retain employees

[A version of this post appears on the O’Reilly Radar blog.]

The O’Reilly Data Show Podcast: Maryam Jahanshahi on building tools to help improve efficiency and fairness in how companies recruit.

In this episode of the Data Show, I spoke with Maryam Jahanshahi, research scientist at TapRecruit, a startup that uses machine learning and analytics to help companies recruit more effectively. In an upcoming survey, we found that a “skills gap” or “lack of skilled people” was one of the main bottlenecks holding back adoption of AI technologies. Many companies are exploring a variety of internal and external programs to train staff on new tools and processes. The other route is to hire new talent. But recent reports suggest that demand for data professionals is strong and competition for experienced talent is fierce. Jahanshahi and her team are building natural language and statistical tools that can help companies improve their ability to attract and retain talent across many key areas.

Here are some highlights from our conversation:

Optimal job titles

The conventional wisdom in our field has always been that you want to optimize for “the number of good candidates” divided by “the number of total candidates.” … The thinking is that one of the ways in which you get a good signal-to-noise ratio is if you advertise for a more senior role. … In fact, we found the number of qualified applicants was lower for the senior data scientist role.

… We saw from some of our behavioral experiments that people were feeling like that was too senior a role for them to apply to. What we would call the “confidence gap” was kicking in at that point. It’s a pretty well-known phenomenon that there are different groups of the population that are less confident. This has been best characterized in terms of gender. It’s the idea that most women only apply for jobs when they meet 100% of the qualifications versus most men will apply even with 60% of the qualifications. That was actually manifesting.
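
The ratio described above (good candidates divided by total candidates) is easy to compute, and a small sketch shows how it can point in a different direction from the raw count of qualified applicants. The class name `PostingStats`, its fields, and every number below are hypothetical, chosen only to illustrate the effect described in the interview; they are not TapRecruit data or tooling.

```python
from dataclasses import dataclass


@dataclass
class PostingStats:
    """Applicant counts for one job posting (all figures are hypothetical)."""
    title: str
    qualified: int  # applicants judged to be good candidates
    total: int      # all applicants

    def quality_ratio(self) -> float:
        """Good candidates divided by total candidates; 0.0 if nobody applied."""
        return self.qualified / self.total if self.total else 0.0


# Made-up numbers, not real survey or recruiting data.
postings = [
    PostingStats("Data Scientist", qualified=30, total=200),
    PostingStats("Senior Data Scientist", qualified=18, total=60),
]

for p in postings:
    print(f"{p.title}: ratio={p.quality_ratio():.2f}, qualified={p.qualified}")

# The posting with the best ratio is not necessarily the one that
# attracts the most qualified applicants in absolute terms.
best_ratio = max(postings, key=PostingStats.quality_ratio)
most_qualified = max(postings, key=lambda p: p.qualified)
```

With these illustrative figures, the senior title wins on ratio while the non-senior title draws more qualified applicants overall, which is why tracking the ratio alone can hide the drop in qualified applicants that the interview describes.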

Highlighting benefits

We saw a lot of big companies that would offer 401(k), that would offer health insurance or family leave, but wouldn’t mention those benefits in the job descriptions. This had an impact on how candidates perceived these companies. Even though it’s implied that Coca-Cola is probably going to give you 401(k) and health insurance, not mentioning it changes the way you think of that job.

How machine learning impacts information security

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Andrew Burt on the need to modernize data protection tools and strategies.

In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer and legal engineer at Immuta, a company building data management tools tuned for data science. Burt and cybersecurity pioneer Daniel Geer recently released a must-read white paper (“Flat Light”) that provides a great framework for how to think about information security in the age of big data and AI. They list important changes to the information landscape and offer suggestions on how to alleviate some of the new risks introduced by the rise of machine learning and AI.

We discussed their new white paper, cybersecurity (Burt was previously a special advisor at the FBI), and an exciting new Strata Data tutorial that Burt will be co-teaching in March.

9 AI trends on our radar

[A version of this post appears on the O’Reilly Radar.]

How new developments in automation, machine deception, hardware, and more will shape AI.

Here are key AI trends business leaders and practitioners should watch in the months ahead.

We will start to see technologies enable partial automation of a variety of tasks.

Automation occurs in stages. While full automation might still be a ways off, there are many workflows and tasks that lend themselves to partial automation. In fact, McKinsey estimates that “fewer than 5% of occupations can be entirely automated using current technology. However, about 60% of occupations could have 30% or more of their constituent activities automated.”

We have already seen some interesting products and services that rely on computer vision and speech technologies, and we expect to see even more in 2019. Look for additional improvements in language models and robotics that will result in solutions that target text and physical tasks. Rather than waiting for a complete automation model, competition will drive organizations to implement partial automation solutions—and the success of those partial automation projects will spur further development.

7 data trends on our radar

[A version of this post appears on the O’Reilly Radar.]

From infrastructure to tools to training, here’s what’s ahead for data.

Whether you’re a business leader or a practitioner, here are key data trends to watch and explore in the months ahead.

Increasing focus on building data culture, organization, and training

In a recent O’Reilly survey, we found that the skills gap remains one of the key challenges holding back the adoption of machine learning. The demand for data skills (“the sexiest job of the 21st century”) hasn’t dissipated. LinkedIn recently found that demand for data scientists in the US is “off the charts,” and our survey indicated that the demand for data scientists and data engineers is strong not just in the US but globally.

With the average shelf life of a skill today at less than five years and the cost to replace an employee estimated at between six and nine months of the position’s salary, there is increasing pressure on tech leaders to retain and upskill rather than replace their employees in order to keep data projects (such as machine learning implementations) on track. We are also seeing more training programs aimed at executives and decision makers, who need to understand how these new ML technologies can impact their current operations and products.