
OpenAI previews an audio tool that can clone your voice in 15 seconds


OpenAI is sharing early results from a test of a feature that reads words aloud with a convincing human voice, highlighting a new frontier for artificial intelligence and raising concerns about the risk of deepfakes. The company is sharing early demonstrations and use cases of a small-scale preview of a text-to-speech model called Voice Engine, which it has shared with about 10 developers so far, a spokesperson said. OpenAI decided not to roll out the feature more broadly and briefed reporters on the situation earlier this month.

A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders including policymakers, industry experts, educators and creatives. The company initially planned to release the tool to up to 100 developers through an application process, according to an earlier press release.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the company wrote in a blog post on Friday. “We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

Other AI techniques have already been used to fake voices. In January, a fake but realistic-sounding phone call purporting to be from President Joe Biden encouraged people in New Hampshire not to vote in the primary election, an incident that stoked fears about artificial intelligence ahead of crucial global elections.

Unlike OpenAI’s previous efforts to generate audio content, Voice Engine can create speech that sounds like a specific individual, with that person’s rhythm and intonation. The software needs only a 15-second recording of a person speaking to recreate their voice.

During a demonstration of the tool, Bloomberg listened to a brief clip of OpenAI CEO Sam Altman explaining the technology. The voice sounded indistinguishable from his actual speech, but it was entirely AI-generated.

“If you have the right audio settings, it’s basically human-level sound,” said Jeff Harris, OpenAI’s head of product. “It’s very impressive technical quality.” However, Harris said, “Obviously there are a lot of safety issues with the ability to really accurately imitate human speech.”

One of OpenAI’s current development partners, the Norman Prince Neuroscience Institute at the nonprofit health system Lifespan, is using the tool to help patients regain their voices. For example, the company’s blog post said the tool was used to restore the voice of a young patient who had lost the ability to speak clearly due to a brain tumor, working from audio of a speech she had previously recorded for a school project.

OpenAI’s custom speech model can also translate the audio it generates into different languages. This makes it very useful for companies in the audio industry, such as Spotify Technology SA. Spotify is already using the technology in its own pilot program to translate podcasts from popular hosts like Lex Fridman. OpenAI also touts other beneficial applications of the technology, such as creating a wider range of voices for educational content for children.

In the beta program, OpenAI requires its partners to agree to its usage policies, obtain consent from the original speaker before using their voice, and disclose to listeners that the voices they hear are AI-generated. The company has also built in an inaudible audio watermark that lets it determine whether a piece of audio was created by its tool.

OpenAI said it is seeking feedback from outside experts before deciding whether to release the feature more broadly. “It’s important that people around the world understand where this technology is headed, whether or not we ultimately deploy it broadly,” the company said in a blog post.

OpenAI also wrote that it hopes the preview of its software will “ignite the need for increased social resilience” to the challenges posed by more advanced AI technologies. For example, the company is calling on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. It also seeks public education about deceptive AI content and the development of more technology to detect whether audio content is real or AI-generated.

© 2024 Bloomberg


(This story has not been edited by NDTV staff and is auto-generated from a syndicated feed.)


Surja, a dedicated blog writer and explorer of diverse topics, holds a Bachelor's degree in Science. Her writing journey unfolds as a fascinating exploration of knowledge and creativity. With a B.Sc. background, Surja brings a unique perspective to the world of blogging. Her articles delve into a wide array of subjects, showcasing her versatility and passion for learning. Whether she's decoding scientific phenomena or sharing insights from her explorations, Surja's blogs reflect a commitment to making complex ideas accessible.