OpenAI launches audio features that read text and clone human voices

OpenAI is sharing early results from a test of a feature that reads words with a convincing human voice, highlighting a new frontier in artificial intelligence and raising concerns about the risk of deepfakes.

The company is publishing early demonstrations and use cases from a small-scale preview of a text-to-speech model called Voice Engine, which it has shared with about 10 developers so far, a spokesperson said.

OpenAI decided not to roll out the feature more broadly and briefed reporters on the situation earlier this month.

A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders including policymakers, industry experts, educators and creatives. The company initially planned to release the tool to up to 100 developers through an application process, according to an earlier press release.

“We recognize that generating speech that resembles people’s voices carries serious risks, which is especially important in an election year,” the company wrote in a blog post on Friday. “We are working with U.S. and international partners from government, media, entertainment, education, civil society and other areas to ensure we incorporate their feedback as we build.”

Other AI techniques have already been used to fake voices. In January, a fake but realistic-sounding phone call claiming to be from President Joe Biden encouraged people in New Hampshire not to vote in the primary election, an incident that stoked fears about artificial intelligence ahead of crucial global elections.

Unlike OpenAI’s previous efforts to generate audio content, the new model can create speech that sounds like a specific individual, with that person’s rhythm and intonation. The software needs only a 15-second recording of a person speaking to recreate their voice.

During a demonstration of the tool, Bloomberg listened to a brief clip of OpenAI CEO Sam Altman explaining the technology. The voice sounded indistinguishable from his actual speech but was entirely AI-generated.

“If you have the right audio settings, it’s basically human-level sound,” said Jeff Harris, OpenAI’s head of product. “It’s very impressive technical quality.” However, Harris said, “Obviously there are a lot of safety issues with the ability to really accurately imitate human speech.”

One of OpenAI’s current development partners, the Norman Prince Neuroscience Institute at nonprofit health system Lifespan, is using the technology to help patients regain their voices. For example, OpenAI’s blog post said the tool was used to restore the voice of a young patient who had lost the ability to speak clearly due to a brain tumor, by replicating her voice from audio she had previously recorded for a school project.

OpenAI’s custom speech model can also translate the audio it generates into different languages. This makes it very useful for companies in the audio industry, such as Spotify Technology SA. Spotify is already using the technology in its own pilot program to translate podcasts from popular hosts like Lex Fridman. OpenAI also touts other beneficial applications of the technology, such as creating a wider range of voices for educational content for children.

In the beta program, OpenAI requires its partners to agree to its usage policies, obtain consent from the original speaker before using their voice, and disclose to listeners that the voices they hear are AI-generated. The company has also built in an inaudible audio watermark that lets it determine whether a piece of audio was created by its tool.

OpenAI said it is seeking feedback from outside experts before deciding whether to release the feature more broadly. “It’s important that people around the world understand where this technology is headed, whether or not we ultimately deploy it broadly,” the company said in a blog post.

OpenAI also wrote that it hopes the preview of its software will highlight the need for greater societal resilience against the challenges posed by more advanced AI technologies. For example, the company is calling on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. It is also calling for public education about deceptive AI content and for the development of more technology to detect whether audio content is real or AI-generated.

(Except for the headline, this story has not been edited by NDTV staff and is published from a syndicated feed.)

Pooja Sood, a dynamic blog writer and tech enthusiast, is a trailblazer in the world of Computer Science. Armed with a Bachelor's degree in Computer Science, Pooja's journey seamlessly fuses technical expertise with a passion for creative expression. With a solid foundation in B.Tech, Pooja delves into the intricacies of coding, algorithms, and emerging technologies. Her blogs are a testament to her ability to unravel complex concepts, making them accessible to a diverse audience. Pooja's writing is characterized by a perfect blend of precision and creativity, offering readers a captivating insight into the ever-evolving tech landscape.