I want to use Whisper, a speech recognition model, but I have run into an issue: the people in my audio files speak in a very specific way, so Whisper struggles to transcribe them.
I had the idea of fine-tuning the model to get better results, but I don't have enough data to do it. So my question is: is it possible to use a text-to-speech model to create a dataset for training a speech-to-text model?
Thanks in advance.