How to build a model that transcribes our local-language audio into French text?

I would like to build an audio-to-text model where the audio files are spoken in our local language (Guéré, for example). However, this language does not currently have a standardized alphabet. To work around this, each audio file is paired with its corresponding text in French. I would then like to use the Wav2Vec2 library to build the model. Is this a good idea? Would you have any suggestions for me?
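To make the idea concrete, here is a minimal sketch of one preprocessing step I have in mind: since Guéré has no alphabet, I would treat the French translations as the target text and build a character-level vocabulary from them to use as CTC labels, following the convention of Wav2Vec2's CTC tokenizer (space mapped to a `|` word delimiter, plus `[UNK]`/`[PAD]` tokens). The transcripts below are made-up examples, not real data:

```python
def build_vocab(transcripts):
    """Build a character-level CTC vocabulary from French transcripts.

    Follows the convention used by Wav2Vec2's CTC tokenizer:
    the space character becomes the '|' word delimiter, and
    [UNK]/[PAD] tokens are appended at the end.
    """
    chars = sorted(set("".join(transcripts)))
    vocab = {c: i for i, c in enumerate(chars)}
    if " " in vocab:
        # Replace the space with the word-delimiter token, keeping its id.
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab

# Hypothetical French transcripts paired with Guéré audio files.
transcripts = ["bonjour tout le monde", "merci beaucoup"]
vocab = build_vocab(transcripts)
print(sorted(vocab))
```

The resulting `vocab` could then be saved as `vocab.json` and passed to a CTC tokenizer when fine-tuning a pretrained multilingual checkpoint.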

Thanks in advance