OpenAI unveils “Voice Engine”: it can imitate a human voice using only a 15-second audio sample

OpenAI, known for artificial intelligence products including the video generator Sora, has now unveiled “Voice Engine,” a voice cloning tool. The audio model replicates the nuances of a speaker’s voice, including intonation and distinctive speech patterns, from a single 15-second sample of the original speech. Despite eager anticipation, OpenAI has chosen not to release the tool widely for now, citing concerns about potential abuse and the spread of false content online.

Exceptional efficiency and precision

“Incredibly, our speech engine can produce emotional, lifelike sounds from just a 15-second sample,” the company said in a recent blog post.

OpenAI’s Voice Engine and industry standards

In comparison, existing AI voice platforms such as ElevenLabs typically need longer samples: its instant voice cloning tool requires at least a minute of audio. For best results, especially with its professional-level service, approximately 10 minutes of continuous speech is recommended.

OpenAI showcased Voice Engine through several demonstrations, including a poignant example in which old recordings from a school project were used to recreate the voice of a young patient who had lost much of her ability to speak due to a brain tumor. The technology allows her to communicate in her own voice, a feat made possible through a partnership with Lifespan, a nonprofit affiliated with Brown University School of Medicine.

Additionally, OpenAI revealed a partnership with HeyGen, which demonstrates how Voice Engine can translate speech from one language to another while keeping it natural-sounding.

According to OpenAI, Voice Engine was first developed in late 2022 and already powers the preset voices in OpenAI’s text-to-speech API as well as ChatGPT’s Voice and Read Aloud features. Even so, the company is proceeding with caution before any wider launch.

Surja, a dedicated blog writer and explorer of diverse topics, holds a Bachelor's degree in Science (B.Sc). Her writing journey is an exploration of knowledge and creativity, and her science background brings a distinct perspective to her blogging. Her articles cover a wide array of subjects, showcasing her versatility and passion for learning. Whether she's decoding scientific phenomena or sharing insights from her explorations, Surja's blogs reflect a commitment to making complex ideas accessible.