I am working on a thesis project to build a neural machine translation system for the Myaamia language. The translation is unidirectional: we only need to go from English to Myaamia, not from Myaamia to English. AFAIK, Myaamia, being a lesser-known, low-resource language, is not covered by any of the pretrained models on Hugging Face.
I tried some of the models available on Hugging Face to see if I could get any results, but they were not good at all: I achieved a BLEU score of 0.18 (out of 100) with the T5 model. Since I am relatively new to Hugging Face and transformers, I would really appreciate it if anyone could point me in the right direction, e.g., which model I should use and how I should format my data to get the best results.
Here are the statistics for the data I have so far:
- Total: 61,559
- Train: 49,247 (80%)
- Test: 6,156 (10%)
- Validation: 6,156 (10%)
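In case it helps to see what I mean by formatting, here is a minimal sketch of how I am preparing the 80/10/10 splits. The nested `"translation"` JSONL layout is the one used by Hugging Face translation datasets, and `"mia"` is just a placeholder language code I picked for Myaamia:

```python
import json
import random

def split_pairs(pairs, train_frac=0.8, test_frac=0.1, seed=0):
    """Shuffle parallel sentence pairs and split them 80/10/10
    into train / test / validation."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_test],
            pairs[n_train + n_test:])

def to_jsonl(pairs, path):
    """Write pairs in the nested 'translation' format that
    Hugging Face translation datasets expect, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for en, mia in pairs:
            record = {"translation": {"en": en, "mia": mia}}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Example with dummy data standing in for the real parallel corpus
pairs = [(f"english sentence {i}", f"myaamia sentence {i}") for i in range(100)]
train, test, val = split_pairs(pairs)
```

The resulting `train.jsonl` / `test.jsonl` / `val.jsonl` files can then be loaded with `datasets.load_dataset("json", data_files=...)`, but I am not sure this layout is the best choice for fine-tuning, which is part of my question.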