Nahuatl: Fine-Tuning Wav2Vec

tyoc213 · March 25, 2021, 9:43pm

Hi there, I’m working on training a wav2vec2 model for Nahuatl (one of the best-known native languages in Mexico). I’m using a dataset released under a CC share-alike, non-commercial license (hope that is OK?).

For the moment I’m using the wandb example as a base. This is my first time trying to fine-tune a model for ASR, so thanks for the base… hopefully something useful comes out of this (I don’t know if I’ll get something “usable” in the end).

I have also requested it on Weights & Biases - Hugging Face xls.

tyoc213 · March 26, 2021, 11:35pm

Hi there, this is the first model I uploaded: tyoc213/wav2vec2-large-xlsr-nahuatl · Hugging Face, with a WER of 69.11 (I hope to improve it).
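For anyone new to the metric: WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python version for illustration only, not the exact script used to score this model. The function name `wer` and the sample strings are assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only, not taken from the evaluation set.
print(wer("a b c d", "a x c"))  # one substitution + one deletion over 4 words
```

The function returns a fraction, so multiply by 100 to compare with a percentage. Note that WER can go above 100% when the output contains more wrong words than the reference has words.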

Now that I have something working, I have some questions. Nahuatl is an agglutinative (“binding”) language and it has long vowels, so there is o and ooooo (held for double the time, which can change the meaning of a word). To express the long one, some people write o:… and in some transcriptions it was not written at all, because Europeans didn’t know long vowels and couldn’t detect them, so those transcriptions don’t mark it.

Do you know if your language has similar things? If so, how do you handle them?

Long vowel annotation (make them é, or how do you write it??)
Does the binding of words affect this model at all?
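One way to handle mixed long-vowel annotation is to normalize every transcript to a single convention before training. The sketch below is only an illustration under assumptions: it treats a colon right after a vowel (o:) as the length mark, picks the macron (ō) as the canonical form, and offers a second helper that drops length entirely so marked and unmarked transcriptions can be compared. The function names and the sample word are hypothetical, and which convention fits a given corpus is a judgment call.

```python
import re
import unicodedata

MACRON = "\u0304"  # combining macron, e.g. "o" + MACRON composes to "ō"
VOWEL_COLON = re.compile(r"([aeiouAEIOU]):")


def colon_to_macron(text: str) -> str:
    """Rewrite the colon convention (o:) as a macron vowel (ō).

    Caveat: a colon used as punctuation right after a vowel would also be
    converted, so real transcripts may need punctuation removed first.
    """
    marked = VOWEL_COLON.sub(lambda m: m.group(1) + MACRON, text)
    return unicodedata.normalize("NFC", marked)


def strip_vowel_length(text: str) -> str:
    """Remove length marks (colon or macron) to get a length-free spelling."""
    no_colon = VOWEL_COLON.sub(r"\1", text)
    decomposed = unicodedata.normalize("NFD", no_colon)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


# Illustrative input only.
print(colon_to_macron("to:nalli"))
print(strip_vowel_length("to:nalli"))
```

If part of the data never marks length, training on the length-free form keeps the labels consistent, at the cost of losing the contrast in the output.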

My first 2 tests after uploading showed me this:

I said “amo tlen”, which means “you are welcome”… I didn’t say the rest… I wonder what that is?

The second one didn’t go that well: "Tlazohcamati" (thanks) -> "'kilawtsokama tihss"

tyoc213 · March 29, 2021, 3:16am

My latest model is at 50.96 :’( I haven’t tried the other suggestions yet, but I will check them soon.

tyoc213 · April 8, 2021, 3:52pm

Hi there people, my model isn’t loading on the hosted inference API: tyoc213/wav2vec2-large-xlsr-nahuatl · Hugging Face. I have been trying to test it for about a day and it shows this:

{"error":"Model tyoc213/wav2vec2-large-xlsr-nahuatl is currently loading","estimated_time":25.24294062}

cahya · April 8, 2021, 5:18pm

Hi @tyoc213, I just tested it and it works. It just needs about half a minute to load before it is ready to be tested again.

tyoc213 · April 9, 2021, 3:30am

I see, I guess it is something like a Docker image being loaded if it is not already there?

cahya · April 9, 2021, 8:27am

Yes, something like that. And if it is not used for some time, it will be unloaded again to save resources.
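Given that behavior, a client can treat the "currently loading" reply as a signal to wait and retry rather than as a failure. The sketch below assumes only what the error above shows: a JSON object with an `error` message and an `estimated_time` in seconds. The helper name `query_until_loaded` and the `query` callable (whatever function actually sends the audio and returns the parsed JSON) are hypothetical.

```python
import time


def query_until_loaded(query, max_tries=5, sleep=time.sleep):
    """Call `query()` until the reply is no longer a 'model loading' error.

    `query` is any zero-argument callable returning the parsed JSON reply.
    `sleep` is injectable so the wait can be skipped in tests.
    """
    for _ in range(max_tries):
        response = query()
        still_loading = (
            isinstance(response, dict)
            and "error" in response
            and "estimated_time" in response
        )
        if not still_loading:
            return response
        # Wait roughly as long as the API estimates before asking again.
        sleep(response["estimated_time"])
    raise TimeoutError(f"model still loading after {max_tries} tries")
```

Any other reply, including a different kind of error, is returned to the caller unchanged, so only the loading case is retried.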

Lguyogiro · April 14, 2021, 3:57am

Hi @tyoc213 , thanks for building this model! One note though, the language code in the description is not correct. This data is from the municipality of Cuetzalan, and the Nahuatl spoken there is associated with the language code azz (Highland Puebla Nahuatl), not ncj (Northern Puebla Nahuatl). This is an understandable confusion because the naming is quite ambiguous, but I just wanted to point that out.

tyoc213 · April 14, 2021, 4:13pm

Thanks, will update it! But I thought it should be ncj because of the dataset I took from openslr.org, “Audio corpus of Sierra Nororiental and Sierra Norte de Puebla Nahuat(l)”, so I took it as north. I wonder if it should be ncj+azz? (I was also unsure whether to just call it nah.)

How have your tests been? The latest model, from last week, should be around 40% WER.

Lguyogiro · April 14, 2021, 8:20pm

The naming of the different variants is confusing, but since this data is from the municipality of Cuetzalan (see the “Deposit-Nahuatl-…docx” file), it should definitely be azz. You can read about the details of the different varieties of Nahuatl spoken in the Sierra de Puebla here: (PDF) A View from the Sierra: The Highland Puebla Area in Nahua Dialectology | Sasaki Mitsuya - Academia.edu

There is some audio included from the municipality of Tepetzintla, which is called “Nahuatl de la sierra oeste de Puebla” or ‘Nahuatl de Zacatlan-Ahuacatlan-Tepetzintla’ (code: nhi), but there aren’t any transcriptions for it.

I am still in the process of evaluating the model on my data, but I will let you know! I’m also working on a DeepSpeech ASR model with this same openslr data to compare.

Thanks for working on this, it is very helpful!!

Lguyogiro · April 15, 2021, 6:43am

@tyoc213 the model definitely performs better than my expectations! Do you know how to decode using a language model with this API?

tyoc213 · May 3, 2021, 12:41am

Congrats!

I think you only need to upload the output model, one that can predict and can be loaded, and it will mostly work. As you can see, the model only takes a sound file as input and outputs the predicted string, and the API mostly just loads it as is.

I don’t know if there are any other extra requirements, but I will try to look into it soon.
