Nahuatl: Fine-Tuning Wav2Vec

Hi there, I’m working on training a wav2vec2 model for Nahuatl (one of the best-known native languages in Mexico). I have used a CC share-alike, non-commercial base dataset (hope that is OK?).

For the moment I’m taking the wandb example as a base. This is my first time trying to fine-tune a model for ASR, so thanks for the base… hopefully something useful comes out of this (I don’t know if I will get something “usable” at the end).

I have also requested it on Weights & Biases - Hugging Face xls.

Hi there, this is the first model uploaded: tyoc213/wav2vec2-large-xlsr-nahuatl · Hugging Face, with a WER of 69.11 (I hope to improve it).

Now that I have something working: Nahuatl is an agglutinative (“binding”) language, and it has long vowels, so there is o and ooooo (double duration, and it can change the meaning of a word). To express it, some people write o:… In some transcriptions it was not written, because Europeans didn’t know long vowels and couldn’t detect them, so those transcriptions don’t have it.

Do you know if your language has similar things? If so, how do you handle them?

  • long vowel annotation (mark them as é, or how do you write them??)
  • does the binding of words affect this model at all?
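
On the long vowel question, one option is to normalize the transcripts to a single convention before building the vocabulary. The sketch below is only an illustration and assumes the colon notation mentioned above (o:) and macron vowels (ō) as the two conventions; which one the corpus actually uses is not stated in this thread, and the function names `colon_to_macron` and `strip_length` are made up for the example.

```python
import re
import unicodedata

MACRON = "\u0304"  # Unicode combining macron

def colon_to_macron(text: str) -> str:
    """Rewrite colon-marked long vowels (o:) as macron vowels (ō)."""
    # Put a combining macron after the vowel, then compose into one character.
    marked = re.sub(r"([aeiouAEIOU]):", lambda m: m.group(1) + MACRON, text)
    return unicodedata.normalize("NFC", marked)

def strip_length(text: str) -> str:
    """Drop vowel-length marks, whether written with a colon or a macron."""
    no_colon = re.sub(r"([aeiouAEIOU]):", r"\1", text)
    # Decompose so that a macron vowel becomes vowel + combining macron.
    decomposed = unicodedata.normalize("NFD", no_colon)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))

# Illustrative strings only, not taken from the dataset.
print(colon_to_macron("a:tl"))
print(strip_length("a:tl"))
```

Keeping length (the first function) gives the model a chance to learn the distinction, while stripping it (the second) makes mixed-quality transcriptions consistent at the cost of losing it; which is better for this data is an open question.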

My first 2 tests after uploading showed me this

I said “amo tlen”, which means “you are welcome”… I didn’t say the rest… I wonder what that is :slight_smile:?


The second one didn’t go that well: "Tlazohcamati" (thanks) -> "'kilawtsokama tihss"
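
For context on the WER numbers in this thread, here is a minimal sketch of word error rate: the word-level edit distance between reference and prediction, divided by the number of reference words. It is not necessarily the exact implementation behind the reported scores, and the name `word_error_rate` is just for this example. Reported values such as 69.11 are presumably this ratio times 100.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)

# The second test above: one reference word, two predicted words.
print(word_error_rate("tlazohcamati", "'kilawtsokama tihss"))
```

Because the reference here is a single word and the prediction has two wrong words, the score is above 1 (one substitution plus one insertion), which shows why WER can look harsh on short utterances.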


My latest model has a WER of 50.96 :’( I haven’t tried the other suggestions yet, but will check them soon.

Hi there people, my model isn’t loading on the hosted inference API: tyoc213/wav2vec2-large-xlsr-nahuatl · Hugging Face. I have been trying to test it for about a day and it shows this

{"error":"Model tyoc213/wav2vec2-large-xlsr-nahuatl is currently loading","estimated_time":25.24294062}

Hi @tyoc213, I just tested it and it works; it just needs half a minute to load until it is ready to be tested again.

I see, I guess it is something like a Docker image being loaded if it is not already there?

Yes, something like that. And if it is not used for some time, it will be unloaded again to save resources.
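
If you call the API from a script, the "currently loading" reply can be handled by waiting for the reported `estimated_time` and trying again. Below is a rough sketch of that idea. The helper name `query_until_loaded` and the `send_request` callable are hypothetical (you would plug in your own HTTP call there), and the success reply in the demo is a placeholder, not the API's documented format.

```python
import time

def query_until_loaded(send_request, max_attempts=5, sleep=time.sleep):
    """Call send_request() until the reply is no longer a loading error.

    send_request is any zero-argument callable returning the decoded JSON
    reply (hypothetical: supply your own HTTP request here).
    """
    for attempt in range(1, max_attempts + 1):
        reply = send_request()
        # A loading reply looks like the error shown earlier in this thread:
        # {"error": "... is currently loading", "estimated_time": 25.24}
        loading = (
            isinstance(reply, dict)
            and "error" in reply
            and "estimated_time" in reply
        )
        if not loading:
            return reply
        if attempt < max_attempts:
            sleep(float(reply["estimated_time"]))
    raise TimeoutError(f"model still loading after {max_attempts} attempts")

# Demo with fake replies (no network, no real waiting).
fake_replies = iter([
    {"error": "Model is currently loading", "estimated_time": 25.24294062},
    {"text": "amo tlen"},  # placeholder success reply
])
print(query_until_loaded(lambda: next(fake_replies), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` would pause for the estimated loading time between attempts.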

Hi @tyoc213, thanks for building this model! One note though: the language code in the description is not correct. This data is from the municipality of Cuetzalan, and the Nahuatl spoken there is associated with the language code azz (Highland Puebla Nahuatl), not ncj (Northern Puebla Nahuatl). This is an understandable confusion because the naming is quite ambiguous, but I just wanted to point that out.

Thanks, will update it! I thought it should be ncj because of the dataset I took, “Audio corpus of Sierra Nororiental and Sierra Norte de Puebla Nahuat(l)”, so I took it as north. I wonder if it should be ncj+azz? (I was also unsure whether to just call it nah.)

How have your tests been going? The latest model, from the previous week, should be around 40% WER.

The naming of the different variants is confusing, but since this data is from the municipality of Cuetzalan (see the “Deposit-Nahuatl-…docx” file), it should definitely be azz. You can read about the details of the different Nahuatl varieties spoken in the Sierra de Puebla here: A View from the Sierra: The Highland Puebla Area in Nahua Dialectology (Sasaki Mitsuya)

There is some audio included from the municipality of Tepetzintla, which is called “Nahuatl de la sierra oeste de Puebla” or ‘Nahuatl de Zacatlan-Ahuacatlan-Tepetzintla’ (code: nhi), but there aren’t any transcriptions for it.

I am still in the process of evaluating the model on my data, but I will let you know! I’m also working on a DeepSpeech ASR model with this same openslr data to compare.

Thanks for working on this, it is very helpful!!


@tyoc213 the model definitely performs better than my expectations! Do you know how to decode using a language model with this API?