How to create word embeddings for non-English languages using BERT-like models?

Hi all! This is my first topic here, so apologies in advance for any mistakes.

I am currently working on creating custom word embeddings for an Indian language, Marathi. They will later be used to build an NMT (neural machine translation) model for translating between Marathi and English.

How can I do this with the `transformers` library? Also, what data cleaning steps are required for the Marathi corpus?
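For reference, here is roughly what I had in mind: a minimal sketch that pulls per-token embeddings from a BERT-like model and mean-pools them into a sentence vector. I am using `bert-base-multilingual-cased` purely as a placeholder checkpoint (a Marathi-specific model may well be a better choice), and `mean_pool` is my own helper, not part of the library.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; any BERT-like model that covers Marathi should work.
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts

sentences = ["मी मराठी शिकत आहे."]  # "I am learning Marathi."
inputs = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

token_embeddings = out.last_hidden_state  # one vector per subword token
sentence_embedding = mean_pool(token_embeddings, inputs["attention_mask"])
print(token_embeddings.shape, sentence_embedding.shape)
```

Note that these are contextual subword embeddings, so getting one vector per *word* would need some extra aggregation over the subword pieces. Is this the right general approach, or is there a better way for this use case?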