PreTrain Wav2Vec2 in German

patrickvonplaten · June 28, 2021, 8:55pm

Currently, the only pretrained Wav2Vec2 models that cover German are multilingual. Let’s pretrain a Wav2Vec2 model on German only.

Model

A randomly initialized Wav2Vec2 model
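As a rough sketch of what "randomly initialized" could look like in code: building the model from a config object instead of a checkpoint means no pretrained weights are loaded. This assumes a transformers release that ships `FlaxWav2Vec2ForPreTraining` (the Flax port was still an open pull request when this thread was written) and an installed JAX/Flax. The helper name `build_random_wav2vec2` and the use of default config values are illustrative assumptions, not something decided in this thread.

```python
def build_random_wav2vec2(seed: int = 0):
    """Build a Flax Wav2Vec2 pretraining model with randomly initialized weights.

    Hypothetical helper for illustration; not part of any library.
    """
    # Imported lazily so this file can be imported without JAX/Flax installed.
    from transformers import FlaxWav2Vec2ForPreTraining, Wav2Vec2Config

    # Library-default hyperparameters (an assumption); a real run would
    # likely tune these for pretraining.
    config = Wav2Vec2Config()

    # Building from a config (instead of from_pretrained) loads no checkpoint,
    # so all weights come from the random initializer seeded with `seed`.
    return FlaxWav2Vec2ForPreTraining(config, seed=seed)
```

Calling `build_random_wav2vec2(seed=0)` should return a model whose parameters depend only on the seed, which is the starting point for pretraining from scratch.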

Datasets

One can use Common Voice. The dataset is also available through the datasets library here: common_voice · Datasets at Hugging Face.
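A minimal sketch of loading the German subset through the datasets library. Wav2Vec2 pretraining is self-supervised, so the transcripts are not needed and the sketch drops everything except the audio fields. The dataset identifier `common_voice`, the config name `de`, and the column names `path` and `audio` are assumptions based on how the dataset was listed on the Hub; newer releases may use different names. The helper names are hypothetical.

```python
from typing import Iterable, List

# Assumed names of the audio-related columns in Common Voice on the Hub.
AUDIO_COLUMNS = ("path", "audio")


def columns_to_drop(column_names: Iterable[str]) -> List[str]:
    """Return every column that is not needed for unlabeled pretraining."""
    return [name for name in column_names if name not in AUDIO_COLUMNS]


def load_unlabeled_german_common_voice(split: str = "train"):
    """Load German Common Voice and keep only the audio columns."""
    # Imported lazily so columns_to_drop works without `datasets` installed.
    from datasets import load_dataset

    # "common_voice" / "de" are the assumed Hub identifier and language config.
    dataset = load_dataset("common_voice", "de", split=split)

    # Transcripts and speaker metadata are not used during pretraining.
    return dataset.remove_columns(columns_to_drop(dataset.column_names))
```

Keeping the column selection in a separate pure function makes it easy to check without downloading any audio.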

Available training scripts

FlaxWav2Vec2 will be merged soon: [Flax] Add wav2vec2 by patrickvonplaten · Pull Request #12271 · huggingface/transformers · GitHub. A pretraining script should be relatively easy to merge after that.

(Optional) Desired project outcome

The best German ASR model.

(Optional) Challenges

It might make sense to use more data than just Common Voice.

patrickvonplaten · June 28, 2021, 8:56pm

I’d actually be happy to join this team

Also pinging @flozi00 @stefan-it in case you guys are interested.

Dimitre · June 28, 2021, 11:43pm

Hey @patrickvonplaten, if you don’t mind, I would like to join the team. I have been looking for an opportunity to get more hands-on experience with audio models, and this sounds amazing. If you wanna know a little more about my background, check out my GitHub.

flozi00 · June 29, 2021, 7:20am

Hey Patrick,
sounds interesting.
Here is some material on additional German data: preprocessing/README.md · master · Jaco-Assistant / Scribosermo · GitLab

birgermoell · June 29, 2021, 7:52am

I would also like to join this project.

patrickvonplaten · June 29, 2021, 2:48pm

Awesome, let’s finalize this project! I’ll try to help out here

Dimitre · June 30, 2021, 1:41am

Hey there, I took the liberty of creating a channel on Discord for this project (Flax-HuggingFace-Community-Week).

ghofrani · July 7, 2021, 12:55pm

Hi guys,
Could you please add me to this team?

