Translation task for a low-resource language

I am working on a model that will translate from English into a language for which there is very little parallel data. My plan is to pretrain a model and then fine-tune it on the translation task. I read the “Attention Is All You Need” paper and concluded that it does not use pretraining, which seems necessary when data is scarce. Does anyone know of papers, articles, or other resources that would help me learn more about this topic? Feel free to give as many suggestions as possible.

Maybe you should try this model: facebook/m2m100_418M on Hugging Face. M2M100 is a multilingual seq2seq model pretrained for many-to-many translation across 100 languages, so it could serve as the pretrained starting point you describe, which you then fine-tune on your parallel data.
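
A minimal sketch of trying it out with the transformers library (this assumes transformers and sentencepiece are installed; "fr" as the target language is just a placeholder, swap in your language's code if M2M100 covers it):

```python
# Zero-shot translation with the pretrained M2M100 checkpoint.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

# Tell the tokenizer which source language the input is in.
tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")

# forced_bos_token_id makes the decoder start generating in the
# chosen target language ("fr" here is only an example).
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("fr"),
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

If the output is reasonable for your language, the next step would be fine-tuning this checkpoint on whatever English-to-target parallel data you do have, instead of pretraining from scratch.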