Pretrain Wav2Vec2 in German
Currently, the only pretrained Wav2Vec2 model that covers German is a multilingual one. Let’s pretrain a Wav2Vec2 model on German only.
Model
A randomly initialized Wav2Vec2 model
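As a rough sketch of what "randomly initialized" means in practice: the snippet below builds a Wav2Vec2 model from a configuration instead of loading a checkpoint. It assumes the PyTorch classes `Wav2Vec2Config` and `Wav2Vec2ForPreTraining` from transformers; the Flax variant discussed in this thread may be exposed differently once it is merged. The helper name `build_random_wav2vec2` is made up for illustration.

```python
from transformers import Wav2Vec2Config, Wav2Vec2ForPreTraining


def build_random_wav2vec2(**config_overrides):
    """Hypothetical helper: build a Wav2Vec2 pretraining model with random weights.

    Constructing the model from a config (rather than calling from_pretrained)
    means no checkpoint is loaded, so every weight is freshly initialized.
    """
    # Default config values give a base-sized architecture; overrides are optional.
    config = Wav2Vec2Config(**config_overrides)
    return Wav2Vec2ForPreTraining(config)


if __name__ == "__main__":
    model = build_random_wav2vec2()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"Randomly initialized model with {n_params:,} parameters")
```

Passing smaller values such as `hidden_size` or `num_hidden_layers` gives a tiny model that is handy for checking a training script before starting a full run.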
Datasets
One can use Common Voice. The dataset is also available through the datasets library here: common_voice · Datasets at Hugging Face.
Available training scripts
FlaxWav2Vec2 will be merged soon ([Flax] Add wav2vec2 by patrickvonplaten · Pull Request #12271 · huggingface/transformers · GitHub), and a pretraining script should be relatively easy to merge.
(Optional) Desired project outcome
The best German ASR model.
(Optional) Challenges
It might make sense to use more data than just Common Voice.
I’d actually be happy to join this team 
Also pinging @flozi00 @stefan-it in case you guys are interested.
Hey @patrickvonplaten, if you don’t mind, I would like to join the team. I have been looking for an opportunity to get more hands-on experience with audio models, and this sounds amazing. If you want to know a little more about my background, check out my GitHub.
Hey Patrick,
sounds interesting.
Here is some material on additional German data: preprocessing/README.md · master · Jaco-Assistant / Scribosermo · GitLab
I would also like to join this project.
Awesome, let’s finalize this project! I’ll try to help out here 
Hey there, I took the liberty of creating a channel on Discord for this project (Flax-HuggingFace-Community-Week).
Hi guys,
Could you please add me to this team?