At the previous community event I pretrained a Turkish Wav2Vec2 model using the fairseq script. The resulting model was subpar because I hadn't cleaned the data.
This time I want to do it properly with the freshly merged FlaxWav2Vec2 model plus a pretraining script.
A randomly initialized Wav2Vec2 model.
FlaxWav2Vec2 will be merged soon ([Flax] Add wav2vec2 by patrickvonplaten · Pull Request #12271 · huggingface/transformers · GitHub), and a pretraining script should be relatively easy to merge after that.
The best Turkish ASR model.
I have some additional scraped audiobook data, though I might need a bit more.