How to find the common source of these text-to-speech voices?

NewHFuser · October 5, 2023, 12:00pm

I’m new to machine learning and to Hugging Face, so please tell me if I should be asking this question in a different place or in a different way. (I’m only allowed two links, so I’ve had to disable some links with “-dot-”.)

On two websites for making video animations, Kreado (KreadoAI-dot-com) and HeyGen (HeyGen-dot-com), there is a selection of text-to-speech voices. The user inputs a script in text and selects a voice, and the software generates the sound of that voice speaking the script. On both websites, the user has to be logged in to a free account in order to see the list of voices available. When one is logged in, Kreado offers 31 voices at KreadoAI-dot-com/ai/dubbing, while HeyGen has 92 at app.HeyGen-dot-com/voices. Each voice is identified by a unique name, such as Christopher and Elizabeth.

With so many voices, it can be hard to tell the difference between one and another. However, there is one voice that is very different from all the others, but seems to be the same voice on both sites. It is the voice of a little girl, and both sites call her Ana. For those people here who don’t have accounts on those sites, I’ve used the Windows Game Bar to save the samples each site provides of Ana’s voice:

It’s possible that both websites set their parameters for generating the voice of a little girl so similarly that the voices come out sounding the same, and that both sites just happened to give the little-girl voice, as well as 15 other voices, the same names. But that seems highly unlikely. What is more likely is that both sites have obtained these voices from another source, or perhaps Kreado is licensing its smaller selection of voices from HeyGen.
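
For anyone who wants to repeat the name comparison, here is a minimal sketch of how the overlap between two sites' voice lists could be checked. The lists below are short placeholders typed in by hand, not the real catalogues (those are only visible after logging in), and the helper name `shared_voice_names` is made up for this example.

```python
def shared_voice_names(site_a, site_b):
    """Return the names from site_a that also appear in site_b.

    Matching ignores case and surrounding whitespace, since the same
    voice may be typed slightly differently on each site.
    """
    norm_a = {name.strip().casefold(): name.strip() for name in site_a}
    norm_b = {name.strip().casefold() for name in site_b}
    return sorted(original for key, original in norm_a.items() if key in norm_b)


# Placeholder data only: a few names copied by hand, NOT the full lists.
kreado_sample = ["Ana", "Christopher", "Elizabeth", "PlaceholderOnlyOnKreado"]
heygen_sample = ["ana ", "Christopher", "PlaceholderOnlyOnHeyGen"]

shared = shared_voice_names(kreado_sample, heygen_sample)
print(shared)
```

A large overlap in names proves nothing on its own, but together with voices that sound identical it points toward a shared upstream provider.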

I want to know the common source of these voices so that I can work directly with the technology instead of an implementation. I’m so new here that I don’t know if I’m asking about a model or a dataset or a space or something else. But is there a way to find the common source of those two voices?

NewHFuser · October 23, 2023, 10:59am

Two weeks later, I believe I’ve found the answer to my question.

First, about a week ago, I found another piece of software that has the same Ana voice: ClipChamp, owned by Microsoft. That seemed interesting because MS certainly has enough AI muscle to have developed a family of voices itself. But ClipChamp was not started by MS; it was acquired, so it was possible that the voices came from somewhere else before the acquisition.

That doubt is now gone. I found Ana in the text-to-speech facility of MS Azure, Microsoft’s cloud service.
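
For readers who would rather check this programmatically than through the Azure portal, below is a rough sketch, not a tested recipe. It assumes the Azure Speech service's voice-list REST endpoint and its `Ocp-Apim-Subscription-Key` header, and it assumes the response is a JSON list of objects with fields such as `DisplayName`, `ShortName`, and `Locale`. The region and key are placeholders you would have to supply, and the sample entries (including the short name `en-US-AnaNeural`) are hand-written stand-ins from memory, not output captured from the service.

```python
import json
import urllib.request


def fetch_voices(region, key):
    """Download the voice list for one Azure region (needs a Speech resource key)."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    request = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_voices(voices, display_name):
    """Return entries whose DisplayName equals display_name, ignoring case."""
    wanted = display_name.casefold()
    return [v for v in voices if v.get("DisplayName", "").casefold() == wanted]


# Hand-written stand-in entries in the assumed response shape, for illustration only.
SAMPLE_VOICES = [
    {"DisplayName": "Ana", "ShortName": "en-US-AnaNeural", "Locale": "en-US"},
    {"DisplayName": "Christopher", "ShortName": "en-US-ChristopherNeural", "Locale": "en-US"},
]

matches = find_voices(SAMPLE_VOICES, "ana")
print([v["ShortName"] for v in matches])

# With real credentials (placeholders here), the live list could be searched the same way:
# voices = fetch_voices("YOUR_REGION", "YOUR_SPEECH_KEY")
# print(find_voices(voices, "Ana"))
```

Matching on the exact display name rather than a substring avoids false hits such as "Diana" when looking for "Ana".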

Mystery solved. Kreado, HeyGen, and ClipChamp all source their voices from Microsoft.

I find it strange that all three of those voice services offer the voices without changing the names in order to hide their source and pretend their voices are unique. I’m guessing they use MS’s names for the voices because the licensing agreement with MS requires that.

HEXW1N2 · November 13, 2024, 11:41am

Thank you SO MUCH for actually putting in how you found it. I am so sick of seeing these posts from +1 year ago having the same exact question as me and then just putting “edit: i found it” with no further explanation.
