Hi there, I’m working on training a wav2vec2 model for Nahuatl (one of the best-known native languages in MX). I used a base dataset under a CC share-alike, non-commercial license (hope that is OK?).
For the moment I’m taking the wandb example as a base. This is my first time trying to fine-tune a model for ASR, so thanks for the base… hopefully something useful comes out of this (I don’t know if I will get something “usable” at the end).
I have also requested it on the Weights & Biases - Hugging Face xls.
Now that I have something working: Nahuatl is an agglutinative (“binding”) language and it has long vowels, so there is o and ooooo (double duration, and it can change the meaning of a word). To express this some people write o:… and in some transcriptions it was not written at all, because Europeans didn’t know long vowels and couldn’t detect them, so those transcriptions don’t have it.
Do you know if your language has similar things? If so, how do you handle them?
long vowel annotation (mark them as é, or how do you write it??)
does the binding of words affect this model at all?
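On the annotation side, one possible approach is to pick a single convention for the training transcripts and convert everything to it. Below is a rough Python sketch, assuming long vowels are written either with a trailing colon (o:) or with a macron (ō). The function names and the choice of macron as the target form are only illustrative, not something the dataset or wav2vec2 prescribes, and the vowel set in the regex may need adjusting for a given orthography.

```python
import re
import unicodedata

MACRON = "\u0304"  # combining macron, as in "ō" (assumed target notation)

# Assumption: vowel length is marked by a colon right after the vowel ("o:").
_COLON_LONG = re.compile(r"([aeiouAEIOU]):")


def colon_to_macron(text):
    """Rewrite colon-marked long vowels ("o:") as macron vowels ("ō")."""
    marked = _COLON_LONG.sub(lambda m: m.group(1) + MACRON, text)
    # NFC composes vowel + combining macron into a single code point.
    return unicodedata.normalize("NFC", marked)


def strip_length(text):
    """Drop vowel-length marks entirely (both "o:" and "ō" become "o")."""
    decomposed = unicodedata.normalize("NFD", text)
    without_macron = decomposed.replace(MACRON, "")
    without_colon = _COLON_LONG.sub(r"\1", without_macron)
    return unicodedata.normalize("NFC", without_colon)


if __name__ == "__main__":
    # Illustrative strings only, not taken from the corpus.
    print(colon_to_macron("a:tl"))
    print(strip_length("ātl"), strip_length("a:tl"))
```

If part of the corpus has no length marks at all, `strip_length` on every transcript at least makes the labels consistent, at the cost of losing the distinction. Converting everything with `colon_to_macron` keeps it, but only where the transcribers wrote it down.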
My first 2 tests, once uploaded, showed me this:
I said “amo tlen”, which means “you are welcome”… I didn’t say the rest… I wonder what that is?
The second one didn’t go that well: "Tlazohcamati" (thanks) -> "'kilawtsokama tihss"
Hi @tyoc213 , thanks for building this model! One note though, the language code in the description is not correct. This data is from the municipality of Cuetzalan, and the Nahuatl spoken there is associated with the language code azz (Highland Puebla Nahuatl), not ncj (Northern Puebla Nahuatl). This is an understandable confusion because the naming is quite ambiguous, but I just wanted to point that out.
Thanks, will update it! I thought it should be ncj because of the dataset I took from openslr.org, “Audio corpus of Sierra Nororiental and Sierra Norte de Puebla Nahuat(l)”, so I took it as north. I wonder if it should be ncj+azz? (I was also unsure whether to just call it nah.)
How have your tests gone? The latest model from last week should be around 40% WER.
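In case it helps when comparing numbers, here is a minimal sketch of how a WER figure can be computed by hand: word-level edit distance divided by the number of reference words. The helper names are hypothetical, and evaluation toolkits may normalize text (case, punctuation) differently, so the results are not guaranteed to match theirs exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: same idea, computed over characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # The second test from earlier in the thread: one reference word,
    # two predicted words, so WER goes above 100%.
    print(wer("Tlazohcamati", "'kilawtsokama tihss"))
```

For a language where words bind together into long units, a single wrong character makes the whole word count as an error, so looking at `cer` next to `wer` may give a fairer picture.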
There is some audio included from the municipality of Tepetzintla, which is called “Nahuatl de la sierra oeste de Puebla” or ‘Nahuatl de Zacatlan-Ahuacatlan-Tepetzintla’ (code: nhi), but there aren’t any transcriptions for it.
I am still in the process of evaluating the model on my data, but I will let you know! I’m also working on a DeepSpeech ASR model with this same openslr data to compare.
I think you only need to upload the output model that can predict and can be loaded, and it will mostly work. As you can see, the model only takes a sound file as input and outputs the predicted string, and the API will mostly just load it as is.
I don’t know if there are any other extra requirements, but I will try to look into it soon.