I have a YouTube channel and I want to speak, but my English is bad.
So I was wondering: is there any AI or software where I can record what I say and send it in, and it gives back an improved spoken version without the accent and mistakes?
I think the easiest way is to combine an STT (S2T, Speech-to-Text) model with a TTS (Text-to-Speech) model. In other words, you first convert your recorded audio into text, and then have the AI read that text out loud. There are also STS (S2S, Speech-to-Speech) models, but I think most of them are built for translation, so they are overkill in this case. There also seems to be a way to use voice cloning models, but I don’t know the details.
I couldn’t find a good demo that had both functions, so I’ll introduce some well-known TTS and STT models. Spaces has quite a few demos of the same type with different concepts, so some of them may already do what you’re looking for.
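The two-step idea above (STT first, then TTS) can be sketched roughly like this. It is only a minimal sketch: `respeak`, `Transcriber`, `Synthesizer` and `whisper_transcriber` are names I made up for illustration, and the Whisper backend assumes the `transformers` package (plus ffmpeg for reading audio files) is installed. Any other STT or TTS model could be wrapped the same way.

```python
from typing import Callable

# Hypothetical type aliases for this sketch; any STT/TTS backend can be wrapped to fit.
Transcriber = Callable[[str], str]        # audio file path -> recognized text
Synthesizer = Callable[[str, str], str]   # (text, output path) -> output path


def respeak(audio_in: str, audio_out: str,
            transcribe: Transcriber, synthesize: Synthesizer) -> str:
    """Turn a recording into text, then have a TTS voice read that text."""
    text = transcribe(audio_in).strip()
    if not text:
        raise ValueError("no speech recognized in " + audio_in)
    synthesize(text, audio_out)
    return text  # returned so the text can be checked or corrected later


def whisper_transcriber(model: str = "openai/whisper-small") -> Transcriber:
    """One possible STT backend (assumption: `transformers` is installed)."""
    from transformers import pipeline  # imported lazily, only if this backend is used
    asr = pipeline("automatic-speech-recognition", model=model)
    return lambda path: asr(path)["text"]
```

You would call it as `respeak("me.wav", "clean.wav", whisper_transcriber(), my_tts)`, where `my_tts` is whatever TTS function you choose. Keeping the two steps as plain functions means you can swap models without touching the rest.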
Thank you. Sorry, I am not sure I understand you clearly because of my English.
Why would I need speech-to-text?
I use a translation device for English, so I don’t think my English is very correct either.
Ah, it’s just to save you the trouble of typing English on a keyboard or with flick input. With STT, you just have to speak. If you don’t mind typing instead of speaking, it’s really easy to just use TTS.
High-quality TTS is becoming so human-like and accurate that it’s hard to tell it apart from a professional announcer…
And compared to image and video generation, these models are all very lightweight.
Is there an AI where I can record myself speaking and it marks every word I said wrong?
I think you could do it if you incorporated an AI model for grammar checking (there are quite a few of these), but if you’re better at typing than speaking, it’s quicker to type… of course, you could also use flick input on your smartphone.
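As a rough illustration of the "mark the wrong words" part: once you have the text of what you said (from STT or typing) and a corrected version (from some grammar-checking model, which is assumed here and not shown), you can compare the two word by word with Python’s standard `difflib`. The function name `mark_changes` is just something I made up for this sketch, and it only catches grammar and word-choice differences, not pronunciation.

```python
import difflib


def mark_changes(spoken: str, corrected: str) -> list[tuple[str, str]]:
    """Return (what you said, suggested fix) for every place the two texts differ."""
    said = spoken.split()
    fixed = corrected.split()
    # Compare case-insensitively so "youtube" vs "YouTube" is not reported as a mistake.
    matcher = difflib.SequenceMatcher(
        a=[w.lower() for w in said],
        b=[w.lower() for w in fixed],
        autojunk=False,
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty first item means a word was missing; an empty second means it was extra.
            changes.append((" ".join(said[i1:i2]), " ".join(fixed[j1:j2])))
    return changes


print(mark_changes("he told me every word", "it tells me every word"))
# [('he told', 'it tells')]
```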
Or would it be quicker to speak in your native language and have it translated into English?
Well, if you want to do any of these things, you’ll need to program it yourself, so if you’re happy with text input, using TTS is probably the best option. There are lots of TTS systems out there, so you can use whichever one you like. Basically, using Spaces is free.
Thank you. What I also want to ask is whether there is an AI that I can speak to and it tells me every word I said wrong, so I can improve my English faster.
There are some, but there aren’t many free ones. I think the reason is that it’s harder to provide them for free than a standalone LLM or image generation model. These services use a combination of STT, an LLM and TTS, so there is a cost for all of that…
There are also open-source LLMs that understand speech, but there aren’t many.
Furthermore, if you try to do it for free, the response delay is still quite a problem for real-time conversation…
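To make the cost and delay point a bit more concrete, here is a minimal sketch of one conversation turn that chains the three parts and measures how long each stage takes. `voice_turn` and the three callables are hypothetical placeholders; I’m not assuming any particular STT, LLM or TTS model.

```python
import time
from typing import Callable


def voice_turn(audio_in: str,
               transcribe: Callable[[str], str],
               reply: Callable[[str], str],
               synthesize: Callable[[str], bytes]) -> tuple[bytes, dict[str, float]]:
    """Run one STT -> LLM -> TTS turn and record how long each stage took (seconds)."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    heard = transcribe(audio_in)       # speech -> text
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    answer = reply(heard)              # text -> text (the language model part)
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    audio_out = synthesize(answer)     # text -> speech
    timings["tts"] = time.perf_counter() - start

    # The user waits for all three stages, so the delays add up.
    timings["total"] = sum(timings.values())
    return audio_out, timings
```

Because the stages run one after another, a slow free server at any one of them makes the whole reply feel laggy.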
With regard to language learning, there are more and more services that use AI internally, so I think it would be good to use a trial version of one of these. You can train yourself to get the main points across, rather than just chatting for the sake of it.
Also, I’m not very familiar with it, but it might be a good idea to use the free quota of ChatGPT or other large AI services.
I tried ChatGPT of course, but it’s not the best for this. I asked it for services, but it recommended things like ElevenLabs, and I don’t see how that will do what I want.
If you want to make a chatbot that responds with voice, the process would be something like the following, but the ChatGPT answers don’t have the crucial language model part…
If you only need the pronunciation function, Hugging Chat has it, so that would be fine.
Anyway, if it’s free, we can only make something with a huge time lag. We have the parts, but we don’t have the money to pay for the server. If you have a reasonably powerful PC, you can build it locally, like the GitHub project below…