How AI can help to prevent the spread of disinformation

[This post originally appeared on Information Age.]

Our industry has a duty to discuss the dark side of technology. Yet many organisations — including some that wield enormous power and influence — are reluctant to acknowledge that their platforms are used to spread disinformation, foster hatred, facilitate bullying, and much else that makes our world a worse place in which to live.

Disinformation — what is sometimes called “fake news” — is a prime example of the unintended consequences of new technology. Its purpose is purely to create discord; it poisons public discourse and feeds festering hatreds with a litany of lies. What makes disinformation so effective is that it exploits characteristics of human nature such as confirmation bias, then seizes on the smallest seed of doubt and amplifies it with untruths and obfuscation.

Disinformation has spawned a new sub-industry within journalism, with fact checkers working around the clock to analyse politicians’ speeches, articles from other publications, news reports and government statistics, among much else. But the sheer volume of disinformation, together with its ability to multiply and mutate like a virus on a variety of social platforms, means that thorough fact-checking is only possible on a tiny proportion of disputed articles.

While technology has provided the seedbed and distribution for disinformation, it also offers a solution to the issue. Artificial intelligence in particular offers powerful tools in the fight against disinformation, working on multiple levels to identify dubious content. These techniques are broadly split between content-based and response-based identification. The former works much like a human fact checker, by matching the content of an article with trusted sources of information to highlight errors or outright lies.
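To make the content-based idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any production fact-checking system works: it assumes a small hand-made list of trusted statements and uses simple word overlap (Jaccard similarity) as a stand-in for real language understanding. The function names and the 0.5 threshold are hypothetical choices made for this example.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for real language processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words the two texts have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(claim: str, trusted: list[str]) -> tuple[str | None, float]:
    """Return the trusted statement most similar to the claim, with its score."""
    claim_tokens = tokens(claim)
    best, best_score = None, 0.0
    for statement in trusted:
        score = jaccard(claim_tokens, tokens(statement))
        if score > best_score:
            best, best_score = statement, score
    return best, best_score


def triage(claim: str, trusted: list[str], threshold: float = 0.5) -> str:
    """Hypothetical triage rule: the threshold is an assumption, not a standard."""
    _, score = best_match(claim, trusted)
    if score >= threshold:
        return "matches trusted source"
    return "needs human review"


# Hypothetical trusted statements, invented purely for illustration.
TRUSTED = [
    "The unemployment rate fell to 4 percent in March",
    "The city council approved the new budget",
]

if __name__ == "__main__":
    print(triage("Unemployment rate fell to 4 percent in March", TRUSTED))
    print(triage("Aliens built the town hall", TRUSTED))
```

Word overlap cannot tell agreement from contradiction: a claim that the rate "rose" rather than "fell" would still score highly. That is one reason a sketch like this can only route claims to a human fact checker, not replace one.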

But disinformation is an insidious beast and doesn’t always include facts that can be checked. It may instead take the form of a distorted or mis-captioned image, highly tendentious or biased reporting, or misleading stories that are not based on facts but rather use specious arguments to promote a particular cause. Another issue is false positives generated by satire or parody articles (which can be hard enough for many humans to detect without a winking emoji).

This is where response-based identification brings real value. Rather than relying on the text of an article as the primary source of information, this technique examines patterns of propagation as the news spreads through social media. By looking at ‘likes’, comments, temporal patterns in the spread of stories and the reputation of those who post and engage with the content, analysts can build a clear picture of how trustworthy a story is.
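A rough Python sketch shows how such propagation signals might be combined. Everything here is an assumption made for illustration: the three signals (sharer reputation, how much of the sharing happens in the first hour, and the proportion of comments disputing the story), the 0-to-1 reputation scale and the weights are all hypothetical, and a real system would learn them from data rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Share:
    minutes_after_post: float
    sharer_reputation: float  # hypothetical scale: 0.0 (untrusted) to 1.0 (trusted)


def suspicion_score(shares, disputed_comments, total_comments):
    """Combine three propagation signals into a score between 0.0 and 1.0.

    The weights (0.5, 0.3, 0.2) are illustrative assumptions, not tuned values.
    """
    if not shares:
        return 0.0

    # Signal 1: how reputable are the accounts spreading the story?
    mean_reputation = sum(s.sharer_reputation for s in shares) / len(shares)

    # Signal 2: what fraction of the sharing happened within the first hour?
    # Treating an early burst as suspicious is an assumption of this sketch.
    early_fraction = sum(1 for s in shares if s.minutes_after_post <= 60) / len(shares)

    # Signal 3: what fraction of comments dispute the story?
    disputed_fraction = disputed_comments / total_comments if total_comments else 0.0

    return 0.5 * (1.0 - mean_reputation) + 0.3 * early_fraction + 0.2 * disputed_fraction


if __name__ == "__main__":
    burst = [Share(5, 0.1), Share(10, 0.2), Share(20, 0.0)]
    steady = [Share(30, 0.9), Share(240, 0.8), Share(600, 1.0)]
    print(suspicion_score(burst, disputed_comments=8, total_comments=10))
    print(suspicion_score(steady, disputed_comments=1, total_comments=10))
```

In this toy example, the story pushed quickly by low-reputation accounts and widely disputed in the comments scores far higher than the one shared gradually by reputable accounts.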

These concepts harness many of the techniques developed in the field of digital media forensics, a discipline dedicated to identifying issues such as plagiarism and “forged media”, where genuine content is digitally manipulated, or fake articles fabricated to look like they come from reputable news sources.

Thanks to this pioneering work, there is now a range of incredibly sophisticated tools, many harnessing the power of machine learning. These include signal processing analysis, which can identify bad actors through their use of compression software; physics-level analysis, which examines inconsistencies in lighting, landscapes, shadows, and the like; as well as techniques that look at semantic and even physiological signals.

There was a compelling, although distressing, illustration of these techniques last summer. A team of fact checkers at the BBC used multiple open-source forensic investigation technologies (including Google Earth) to prove that an atrocity captured in a disturbing video had, in fact, been committed by government soldiers – a claim that the state had initially decried as “fake news”.

Tools such as these are crucial front-line weapons in the war against disinformation. But identifying disinformation is only the start: publishers and other organisations need to back these up with robust intervention strategies to take down or limit the spread of this content as soon as it appears, and to ensure that those who are exposed to it are alerted and served with content that counters the false information in the original – a process known as “decontamination”. One promising decontamination strategy that researchers are examining is the competing cascade, which places trusted, truthful information directly into a user’s newsfeed to compete with the lies in the original article.
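The competing cascade can be pictured with a toy simulation. The Python sketch below is a deliberately simplified model, not a description of how any platform or research team implements it. It assumes a small social graph written as a dictionary in which every neighbour also appears as a key, that a story spreads one step per round, that users keep the first version they see, and that the truthful version wins when both arrive in the same round. The function name and the tie-break rule are hypothetical.

```python
def competing_cascades(graph, false_seeds, true_seeds):
    """Spread two competing stories through a graph, one hop per round.

    graph: dict mapping each user to a list of neighbours (every neighbour
           must also be a key in the dict; an assumption of this sketch).
    Returns a dict mapping each user to "false", "true" or None (never reached).
    """
    state = {node: None for node in graph}
    for node in false_seeds:
        state[node] = "false"
    for node in true_seeds:
        state[node] = "true"

    frontier = [node for node in graph if state[node] is not None]
    while frontier:
        # Collect every story version offered to each undecided user this round.
        proposals = {}
        for node in frontier:
            for neighbour in graph[node]:
                if state[neighbour] is None:
                    proposals.setdefault(neighbour, set()).add(state[node])
        # Users keep the first version they see; ties go to the truthful one
        # (an assumed tie-break rule).
        for node, labels in proposals.items():
            state[node] = "true" if "true" in labels else "false"
        frontier = list(proposals)
    return state


# Hypothetical five-user chain: a - b - c - d - e
CHAIN = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

if __name__ == "__main__":
    unopposed = competing_cascades(CHAIN, false_seeds=["a"], true_seeds=[])
    countered = competing_cascades(CHAIN, false_seeds=["a"], true_seeds=["e"])
    print(sum(1 for v in unopposed.values() if v == "false"))
    print(sum(1 for v in countered.values() if v == "false"))
```

Comparing the two runs shows the intuition: left unopposed, the false story eventually reaches every user in the chain, whereas seeding the truthful version at the other end limits how far it travels.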

The internet was supposed to usher in a new era for humanity, bringing unprecedented knowledge to the whole world, and in many ways it has. Disinformation is the antithesis of this dream, poisoning the well from which we all drink. But let’s not forget that we’re still in our digital infancy, still working out how to combat the raft of new societal problems that the internet has created. With disinformation and “fake news”, we have both the will and the technology to fight back.