Japanese ASR: Fine-Tuning Wav2Vec2

Has anyone had any luck training a Wav2Vec2 model on Japanese (or any language with a very large character set, like Chinese)?

I was interested to see how a naive approach would perform, so I tried the standard training script on a fairly powerful box with the Common Voice Japanese dataset, and I always hit OOM errors at the first eval stage. The dataset uses Kanji in the labelled text, so the resulting vocab is pretty big, which I suspect is the problem. It might also make the resulting model perform badly, given the different pronunciations a single Kanji can carry in different contexts.
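To illustrate the vocab problem: the standard character-level CTC recipe builds one output unit per distinct character in the training transcripts. A rough sketch (the sentences below are made-up stand-ins, not actual Common Voice labels):

```python
# Hypothetical mini-corpus standing in for Common Voice Japanese labels.
sentences = ["今日は良い天気です", "猫が好きです", "東京に住んでいます"]

# Character-level vocab, as the standard Wav2Vec2 fine-tuning script builds it.
vocab = sorted(set("".join(sentences)))

# Even three short sentences yield 19 distinct characters; on a full corpus,
# each distinct Kanji adds another output class to the CTC head, so the
# vocab easily reaches thousands of classes (vs. ~30 for English letters).
print(len(vocab))  # → 19
```

A bigger vocab means a proportionally bigger CTC projection layer and larger logit tensors at eval time, which is one plausible route to the OOM.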

For those not familiar with Japanese: it uses a mixture of two phonetic alphabets, Hiragana and Katakana (the latter mostly used to phonetically represent foreign loanwords), where each character has a fixed sound, plus Kanji, which can have different pronunciations depending on context (here is a nice example).

There are 46 base characters each in Hiragana and Katakana (diacritic marks extend these to 71 distinct sounds), while there are tens of thousands of different Kanji.

I was thinking of first mapping the texts to the Hiragana and Katakana alphabets and training the model to produce transcripts using only the phonetic characters, then perhaps using a language model to convert them back to Kanji. I'd be interested to hear if anyone else has tried this approach.
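One half of that mapping needs no dictionary at all: Katakana can be folded into Hiragana by a fixed Unicode offset (the blocks U+30A1–U+30F6 and U+3041–U+3096 are parallel), which keeps the phonetic vocab down to roughly 80 symbols. A minimal stdlib sketch (Kanji-to-kana itself needs a dictionary-backed library and is not shown here):

```python
# Katakana block U+30A1..U+30F6 maps onto Hiragana U+3041..U+3096 at a
# constant offset of 0x60; build a translation table from that fact.
KATA_TO_HIRA = {code: code - 0x60 for code in range(0x30A1, 0x30F7)}

def fold_kana(text: str) -> str:
    """Map every Katakana character to its Hiragana counterpart;
    characters outside the block (Kanji, punctuation, the long-vowel
    mark ー at U+30FC) pass through unchanged."""
    return text.translate(KATA_TO_HIRA)

print(fold_kana("コーヒー"))  # → こーひー (ー is kept as-is)
```

Folding the two kana scripts together before building the vocab also means the model never has to learn that カ and か are the same sound.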

I see there are a few libraries out there for handling the text conversion.

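As a concrete sketch of what the Kanji-to-kana step might look like, here is one such library, pykakasi (my assumption; the exact return keys are from its 2.x API and worth double-checking), guarded so the snippet still runs if it isn't installed:

```python
# Sketch using pykakasi (`pip install pykakasi`) to turn a Kanji-bearing
# transcript into pure Hiragana before building the CTC vocab.
try:
    import pykakasi
except ImportError:  # library not installed; the sketch is illustrative only
    pykakasi = None

if pykakasi is not None:
    kks = pykakasi.kakasi()

    def to_hiragana(text: str) -> str:
        # convert() segments the input and yields per-token readings;
        # "hira" is the Hiragana reading of each token (assumed key name).
        return "".join(tok["hira"] for tok in kks.convert(text))

    # Hypothetical example sentence, not an actual Common Voice label.
    print(to_hiragana("今日は良い天気です"))
```

Since pykakasi picks one reading per token from its dictionary without acoustic context, some Kanji will get the wrong reading, which is noise the downstream language-model step would have to tolerate.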
I'm training some other language models at the moment, but I'll give this approach a go and update here if I can get it working.