Images and videos have long been edited to deceive, but the believability of recent deepfake videos highlights another threat that could evolve into a much larger problem

Manipulating images was commonplace long before photography entered the digital era. Whether it was to disguise wonky composition or to cut away something from the edge of the frame, photographers have long used the tricks of the darkroom to make us believe that an image was originally captured as it eventually appeared.

The editing process may be different today, but the tools used to carry it out have long been accessible to all. For all but the most complex edits, even a computer is no longer necessary: app-based editing and AI tools running on today's powerful breed of smartphones and tablets achieve what would have been unthinkable ten years ago.

But while we’re used to the idea of online images not necessarily showing a scene or subject as it may have appeared in reality, it’s only in the past few years that video fakery has been so prominently discussed.

This has, of course, been spurred by the believability of many viral examples, such as recent videos that show what appears to be Tom Cruise playing golf and performing a magic trick. While undeniably impressive, other videos designed specifically to smear politicians and other public figures, and to undermine democratic processes, show just how easily these tools can be weaponized too.

Proving provenance

Clearly, some of these examples are intended to entertain rather than mislead us. But, collectively, they provide a backdrop for a handful of recent initiatives designed to provide greater clarity on the provenance of digital media.

Two of these are the Content Authenticity Initiative (CAI) and Project Origin. The former was launched in 2019 by Adobe, Twitter, and The New York Times Company, with an initial mission of developing the industry standard for content attribution in order to help people determine what's likely to be trustworthy.

Project Origin, meanwhile, founded last year, brought together the BBC, Canadian Broadcasting Corporation/Radio Canada, Microsoft, and The New York Times Company, with a more targeted focus on news organizations. The similar aims of the two gave a logical basis for their collaboration on a Joint Development Foundation project, named the Coalition for Content Provenance and Authenticity (C2PA), which was formed through an alliance between Adobe, Arm, Intel, Microsoft, and Truepic.

The participation of leaders in their respective fields is imperative if these initiatives are to be widely adopted. But another obvious benefit is that it allows each organization to bring its own specific expertise to the party.

This is important, as deception can come through many different types of media. And the CAI has not limited its scope here; its white paper makes it clear that what it presents is “a set of standards that can be used to create and reveal attribution and history for images, documents, time-based media (video, audio) and streaming content.”
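
To make the idea of recorded "attribution and history" a little more concrete, here is a minimal sketch in Python of how a consumer might check one piece of that provenance. It is a toy illustration rather than the actual CAI/C2PA format: the Manifest fields and the check_manifest helper are invented for the example, and a real verifier would also validate a cryptographic signature over the manifest itself, which in the standard travels embedded within the asset.

# Toy illustration of provenance checking, loosely modeled on the idea of
# a signed manifest; the field names here are hypothetical, not the
# actual CAI/C2PA specification.
import hashlib
from dataclasses import dataclass
from typing import List

@dataclass
class Manifest:
    creator: str        # who produced the asset
    edits: List[str]    # recorded history, e.g. ["crop", "tone curve"]
    sha256: str         # hash of the file the manifest was issued for

def file_sha256(path: str) -> str:
    # Hash the delivered file so it can be compared with the manifest.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def check_manifest(path: str, manifest: Manifest) -> bool:
    # True only if the file still matches what the manifest describes.
    # A real implementation would also verify the manifest's signature
    # against a trusted certificate chain before trusting its contents.
    return file_sha256(path) == manifest.sha256

The point is not the code itself but the shape of the record: who made the asset, what was done to it, and a way to tell whether the file you are looking at is still the one those claims were made about.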

We may immediately think of dubious news articles, doctored images and deepfake videos when we think about deceptive information online, but a recent news story highlighted one direction in which this deception could evolve. An interview with the director of an upcoming documentary about Anthony Bourdain revealed that the voice of the late chef and television presenter had been deepfaked with the help of AI, the purpose being to make viewers believe that Bourdain himself had narrated text from an email he had written. The story was met with an angry response from many individuals, including those who knew Bourdain personally. 

This shouldn’t have come as much of a surprise; indeed, it’s entirely possible we have already been subjected to this kind of manipulation without realizing it. It’s also easy to imagine a similar situation that would draw few objections.

Take a recorded voiceover in need of a few adjustments, for example, with no possibility of the original actor re-recording it. It’s reasonable to assume the actor in question might well have consented to this if it meant a project could be finished. Would it receive a similar reaction were people to find out it had happened? Or is it more a question of respect, given that Bourdain is no longer alive? Perhaps the output is what matters more: if this were a work of fiction rather than a documentary, would we care as much?

The use of AI in this way might be relatively new, but dubbing speech and using either archival footage or CGI when an actor is not available is commonplace. Even so, as an illustration of the rate of progress over the last 20 years, the clip above, taken from an episode of HBO’s The Sopranos that aired in 2001, demonstrates that what was once passable from a major television network is significantly behind what an individual without the same kind of budget can achieve today.

Finding your voice

The backlash over the Bourdain documentary voiceover is ongoing, but perhaps we should have seen these kinds of problems coming. After all, it’s not just the fact that the subjects in these deepfake videos look like Tom Cruise, Mark Zuckerberg and others – it’s that they sound like them too. 

Quite how well you can spot this kind of trickery depends in part on exactly what’s being presented to you. Video and audio together can make deception easier to identify; speech may be slightly mismatched from mouth movements, for example, while blurring or unnatural facial expressions may also give things away. But if we’re listening to a voice alone, the absence of any visual quirks means we have a far greater chance of being fooled.
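
As a rough sketch of that first cue, the following Python snippet (a toy heuristic, not anything resembling a production detector) measures how well a mouth-opening signal extracted from video frames tracks the loudness of the accompanying audio. Both per-frame inputs are assumed to come from elsewhere, such as a face-landmark detector and an audio level meter; the synthetic signals at the bottom simply stand in for a well-synced recording.

# Toy heuristic: does mouth movement track the audio? The per-frame
# inputs would come from a face-landmark detector and an audio loudness
# measure in practice; here they are invented for illustration.
import numpy as np

def lip_sync_score(mouth_opening: np.ndarray, audio_envelope: np.ndarray) -> float:
    # Correlation between per-frame mouth opening and audio loudness.
    # Values near 1.0 suggest lips and speech move together; values near
    # zero are a (weak) hint that the audio may not belong to the face.
    if len(mouth_opening) != len(audio_envelope):
        raise ValueError("signals must be sampled at the same frame rate")
    return float(np.corrcoef(mouth_opening, audio_envelope)[0, 1])

# Hypothetical example: a well-synced pair of signals correlates strongly.
frames = np.linspace(0, 4 * np.pi, 200)
mouth = np.abs(np.sin(frames))                               # mouth opens and closes
audio = np.abs(np.sin(frames)) + 0.1 * np.random.rand(200)   # matching loudness
print(f"lip-sync score: {lip_sync_score(mouth, audio):.2f}")

Real detection research is, of course, far more sophisticated than a single correlation, but the underlying intuition is the same: the more independent signals a forgery has to keep consistent, the more opportunities it has to slip up.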

In many of these deepfake videos, no harm is intended or caused. If we show these to a friend, we do so to impress them with what has been created, rather than to hurt them in any way. Furthermore, as Chris Ume, the creator of the Tom Cruise deepfakes, has made clear, creating these involves considerable time and effort, so the chance of us all becoming targets of something similarly sophisticated remains small.

But being deceived by a voice that belongs to someone you know suggests a much more sinister potential route for this technology. What if you were to receive a desperate call or voicemail from what sounded like a relative asking for help? Or for money? Or personal information of some sort? What if the same kind of trickery was used in a professional environment to authorize a financial transaction? Or to access an account of some sort that makes use of voice authentication?

How would this even work? The director of the Bourdain documentary has stated that over 10 hours of audio of Anthony Bourdain speaking were fed into a machine-learning program to develop the synthetic voice. A quick online search turns up a slew of tools that promise to let you synthesize your own voice in a similar manner; for the purposes of illicit activity, such tools could just as easily be abused to synthesize a voice belonging to someone else.

When you consider just how many people – even those many would not consider particularly famous – have clocked up a similar amount of speaking time to what was needed for Bourdain in high-quality, publicly accessible media (YouTube videos, podcasts, webinars and so on), it’s hard not to think about what the potential consequences could be.

Perhaps this sounds far-fetched. Perhaps the way these kinds of harms evolve means that things will take another route, resulting in a different kind of threat with a different set of safeguards. And perhaps these safeguards will ensure that as soon as these threats materialize the fallout will be minimal. Nevertheless, it’s not inconceivable that we will reach a point where we start to feel the need to guard our voices online as we guard our images and personal information today.
