In a slickly produced TikTok video, former President Barack Obama — or a voice eerily like his — can be heard defending himself against an explosive new conspiracy theory about the sudden death of his former chef.
“While I cannot comprehend the basis of the allegations made against me,” the voice says, “I urge everyone to remember the importance of unity, understanding and not rushing to judgments.”
In fact, the voice did not belong to the former president. It was a convincing fake, generated by artificial intelligence using sophisticated new tools that can clone real voices to create AI puppets with a few clicks of a mouse.
The technology used to create AI voices has gained traction and wide acclaim since companies such as ElevenLabs released a slate of new tools late last year. Since then, audio fakes have rapidly become a new weapon on the online misinformation battlefield, threatening to turbocharge political disinformation before the 2024 election by giving creators a way to put their conspiracy theories into the mouths of celebrities, newscasters and politicians.
The fake audio adds to the AI-generated threats from “deepfake” videos, humanlike writing from ChatGPT and images from services such as Midjourney.
Disinformation watchdogs have noticed that the number of videos containing AI voices has increased as content producers and misinformation peddlers adopt the novel tools. Social platforms including TikTok are scrambling to flag and label such content.
The video that sounded like Obama was discovered by NewsGuard, a company that monitors online misinformation. The video was published by one of 17 TikTok accounts pushing baseless claims with fake audio that NewsGuard identified, according to a report the group released in September. The accounts mostly published videos about celebrity rumors using narration from an AI voice, but also promoted the baseless claim that Obama is gay and the conspiracy theory that Oprah Winfrey is involved in the slave trade. The channels had collectively received hundreds of millions of views and comments that suggested some viewers believed the claims.
While the channels had no obvious political agenda, NewsGuard said, the use of AI voices to share mostly salacious gossip and rumors offered a road map for bad actors wanting to manipulate public opinion and spread falsehoods to mass audiences online.
“It’s a way for these accounts to gain a foothold, to gain a following that can draw engagement from a wide audience,” said Jack Brewster, the enterprise editor at NewsGuard. “Once they have the credibility of having a large following, they can dip their toe into more conspiratorial content.”
TikTok requires labels disclosing realistic AI-generated content as fake, but no such labels appeared on the videos flagged by NewsGuard. TikTok said it had removed or stopped recommending several of the accounts and videos for violating policies around posing as news organizations and spreading harmful misinformation. It also removed the video using the AI-generated voice that mimicked Obama’s for violating TikTok’s synthetic media policy, as it contained highly realistic content not labeled as altered or fake.
“TikTok is the first platform to provide a tool for creators to label AI-generated content and an inaugural member of a new code of industry best practices promoting the responsible use of synthetic media,” said Jamie Favazza, a spokesperson for TikTok, referring to a recently introduced framework from the nonprofit Partnership on AI.
Although NewsGuard’s report focused on TikTok, which has increasingly become a source of news, similar content was found spreading on YouTube, Instagram and Facebook.
Platforms like TikTok allow AI-generated content of public figures, including newscasters, so long as it does not spread misinformation. Parody videos showing AI-generated conversations between politicians, celebrities or business leaders — some of them dead — have spread widely since the tools became popular. Manipulated audio adds a new layer to deceptive videos on platforms that have already featured fake versions of Tom Cruise, Elon Musk and newscasters like Gayle King and Norah O’Donnell. TikTok and other platforms have lately been grappling with a spate of misleading ads featuring deepfakes of celebrities like Cruise and YouTube star Mr. Beast.
The power of these technologies could profoundly sway viewers. “We do know audio and video are perhaps more sticky in our memories than text,” said Claire Leibowicz, head of AI and media integrity at the Partnership on AI, which has worked with technology and media companies on a set of recommendations for creating, sharing and distributing AI-generated content.
TikTok said last month that it was introducing a label that users could select to show whether their videos used AI. In April, the app started requiring users to disclose manipulated media showing realistic scenes and prohibiting deepfakes of young people and private figures. David Rand, a professor of management science at the Massachusetts Institute of Technology whom TikTok consulted for advice on how to word the new labels, said the labels were of limited use when it came to misinformation because “the people who are trying to be deceptive are not going to put the label on their stuff.”
TikTok also said last month that it was testing automated tools to detect and label AI-generated media, which Rand said would be more helpful, at least in the short term.
YouTube bans political ads from using AI and requires other advertisers to label their ads when AI is used. Meta, which owns Facebook, added a label to its fact-checking tool kit in 2020 that describes whether a video is “altered.” And X, formerly known as Twitter, requires misleading content to be “significantly and deceptively altered, manipulated or fabricated” to violate its policies. The company did not respond to requests for comment.
Obama’s AI voice was created using tools from ElevenLabs, a company that burst onto the international stage late last year with its free-to-use AI text-to-speech tool capable of producing lifelike audio in seconds. The tool also allowed users to upload recordings of someone’s voice and produce a digital copy.
After the tool was released, users on 4chan, the right-wing message board, organized to create a fake version of actor Emma Watson reading an antisemitic screed.
ElevenLabs, a 27-employee company headquartered in New York City, responded to the misuse by limiting the voice-cloning feature to paid users. The company also released an AI detection tool capable of identifying AI content produced by its services.
“Over 99% of users on our platform are creating interesting, innovative, useful content,” a representative for ElevenLabs said in an emailed statement, “but we recognize that there are instances of misuse, and we’ve been continually developing and releasing safeguards to curb them.”
In tests by The New York Times, ElevenLabs’ detector successfully identified audio from the TikTok accounts as AI-generated. But the tool failed when music was added to the clip or when the audio was distorted, suggesting that misinformation peddlers could easily elude detection.
AI companies and academics have explored other methods to identify fake audio, with mixed results. Some companies have explored adding an invisible watermark to AI audio, embedding signals that identify it as AI-generated. Others have pushed AI companies to limit the voices that can be cloned, potentially banning replicas of politicians like Obama — a practice already in place with some image-generation tools like Dall-E, which refuses to generate some political imagery.
Leibowicz at the Partnership on AI said synthetic audio was uniquely challenging to flag for listeners compared with visual alterations.
“If we were a podcast, would you need a label every five seconds?” Leibowicz said. “How do you have a signal in some long piece of audio that’s consistent?”
Even if platforms adopt AI detectors, the technology must constantly improve to keep up with advances in AI generation.
TikTok said it was building new detection methods in-house and exploring options for outside partnerships.
© 2023 The New York Times Company