AI to improve voice

SENZACA · January 29, 2025, 2:58pm

I have a YouTube channel and I want to speak in my videos, but my English is bad.
So I was wondering: is there any AI or software where I can record what I say and send it in, and it gives back an improved spoken version without accent and mistakes?

John6666 · January 29, 2025, 4:12pm

I think the easiest way is to combine an STT (S2T, Speech-to-Text) model with a TTS (Text-to-Speech) model. In other words, you first convert your audio into text, and then have the AI read that text out loud. There are also STS (S2S, Speech-to-Speech) models, but I think most of them come with translation functions, so they are overkill in this case. There also seems to be a way to use models for voice cloning, etc., but I don’t know the details.
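
As a rough illustration of that two-step idea, here is a minimal Python sketch. The function `improve_speech` and the three stand-in stages (`fake_stt`, `fake_fix`, `fake_tts`) are hypothetical names made up for this example; they are not from any library. In a real setup you would replace them with calls to whichever STT and TTS models you choose.

```python
from typing import Callable, Optional


def improve_speech(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
    correct: Optional[Callable[[str], str]] = None,
) -> bytes:
    """Run recorded speech through STT, an optional text fix-up, then TTS."""
    text = transcribe(audio)      # STT: audio -> text
    if correct is not None:
        text = correct(text)      # optional grammar fix on the text
    return synthesize(text)       # TTS: text -> new audio


# Hypothetical stand-in stages so the sketch runs without any model.
def fake_stt(audio: bytes) -> str:
    # Pretend the "audio" bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_fix(text: str) -> str:
    # Pretend grammar correction: fixes one known phrase only.
    return text.replace("i was think", "I was thinking")


def fake_tts(text: str) -> bytes:
    # Pretend synthesis: returns the text as upper-case bytes.
    return text.upper().encode("utf-8")


result = improve_speech(b"i was think about ai", fake_stt, fake_tts, fake_fix)
print(result)
```

Because each stage is just a function passed in, you can try different STT or TTS models without touching the rest of the code.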

I couldn’t find a good demo that had both functions, so I’ll introduce some well-known TTS and STT. Spaces has quite a few demos of the same type with different concepts, so some of them may already have the functions you’re looking for.

SENZACA · January 29, 2025, 7:21pm

Thank you. Sorry, I am not sure I understand you clearly because of my English.
Why would I need sound to text?

John6666 · January 30, 2025, 4:30am

I use a translation device for English, so I don’t think my English is very correct either.

Why would I need sound to text?

Ah, it’s just to save you the trouble of typing English on a keyboard or with flick input. With STT, you just have to speak. If you don’t mind typing instead of speaking, it’s really easy to just use TTS.
High-quality TTS is becoming so human-like and accurate that it’s hard to tell apart from a professional announcer…
And compared to image and video generation, these models are all very lightweight.

SENZACA · January 30, 2025, 3:47pm

Is there an AI where I can record myself speaking and it marks every word I said the wrong way?

John6666 · January 30, 2025, 5:08pm

I think you could do it if you incorporated an AI model for grammar checking (there are quite a few of these), but if you’re better at typing than speaking, it’s quicker to type… of course, you could also use flick input on your smartphone.
Or would it be quicker to speak in your native language and have it translated into English?
Well, if you want to do any of these things, you’ll need to program it yourself, so if you’re happy with text input, using TTS is probably the best option. There are lots of TTS systems out there, so you can use whichever one you like. Basically, using Spaces is free.

SENZACA · February 2, 2025, 2:20am

Thank you. What I also want to ask is whether there is an AI that I can speak to and that tells me every word I say the wrong way, so I can improve my English faster.

John6666 · February 2, 2025, 4:33am

There are some, but there aren’t many free ones. I think the reason is that they are harder to provide for free than a standalone LLM or image generation. They need a combination of STT, an LLM and TTS, so there is a cost for that…
There are also open-source LLMs that understand speech, but there aren’t many.
Furthermore, even if you try to make it free, there are still quite a few problems with real-time responsiveness…
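
If you read from a prepared script, one cheap approximation is to compare the script with what an STT model heard and flag the words that differ. Below is a minimal sketch using only Python’s standard `difflib` module. The function names and the example sentences are made up for illustration, and the transcript is typed in by hand instead of coming from a real STT model. Treat the flags as hints only, since STT can also mishear correct speech or silently fix a mispronounced word.

```python
import difflib
import re
from typing import List, Tuple


def split_words(text: str) -> List[str]:
    """Lower-case the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def flag_misheard_words(script: str, transcript: str) -> List[Tuple[str, str]]:
    """Return (intended, heard) pairs where the transcript differs from the script."""
    intended = split_words(script)
    heard = split_words(transcript)
    matcher = difflib.SequenceMatcher(a=intended, b=heard, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Words that were replaced, dropped, or added by the speaker/STT.
            flagged.append((" ".join(intended[i1:i2]), " ".join(heard[j1:j2])))
    return flagged


# Hypothetical example: what you meant to say vs. what the STT model heard.
script = "I think there are three ways"
transcript = "I sink there are tree ways"
print(flag_misheard_words(script, transcript))
```

An empty string on either side of a pair means a word was skipped or an extra word was heard.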

With regard to language learning, there are more and more services that use AI internally, so I think it would be good to use a trial version of one of these. You can train yourself to get the main points across, rather than just chatting for the sake of it.

Also, I’m not very familiar with it, but it might be a good idea to use the free quota of ChatGPT or other large AI services.

SENZACA · February 2, 2025, 9:36am

I tried ChatGPT, of course, but it’s not the best for this. I asked it for services, but it recommended things like ElevenLabs, and I don’t see how that will do what I want.

John6666 · February 2, 2025, 12:14pm

If you want to make a chatbot that responds with voice, the process would be something like the following, but the ChatGPT answers don’t have the crucial language model part…
If you only need the pronunciation function, HuggingChat has it, so that would be fine.

Anyway, if it has to be free, we can only make something with a huge time lag. We have the parts, but we don’t have the money to pay for the server. If you have a PC with a certain level of performance, you can build it locally like the GitHub project below…

imjaceey · April 10, 2025, 4:01am

https://audiomodify.com/ try this tool

Kira0F · July 11, 2025, 6:23am

Another useful tool is Audiomodify. It offers high-quality TTS voices in-browser and is free, ideal for voice improvement demos or narration purposes.

svahab · July 20, 2025, 6:49am

If you’re looking for a simple online tool to quickly convert text to speech, you might find audiomodify helpful. You can just paste your text directly on the website and get the audio without needing any complex setup
