Hi all! This is my first topic here, so apologies if I make any mistakes.
I am currently working on creating custom word embeddings for an Indian language, Marathi. They will later be used to build an NMT model for translation between Marathi and English.
How can I do this using transformers? Also, what data cleaning is required?