How to train Translation Language Modeling (TLM) with transformers/examples/language-modeling/run_mlm.py?

Hello, I have a question and would appreciate your help. I want to train with the Translation Language Modeling (TLM) objective from XLM (paper: Cross-lingual Language Model Pretraining). TLM is very similar to Masked Language Modeling (MLM); the only difference is the form of the input data. If I want to use run_mlm.py to train with the TLM objective, can I just change how the training data is composed, without modifying the source code of transformers/examples/language-modeling/run_mlm.py? Is this feasible? :grimacing: :yum:

For example, for masked language modeling (MLM), each row of my training data is a sentence in a single language, as shown below:

( Row 1 ) polonium 's isotopes tend to decay with alpha or beta decay ( en ) .
( Row 2 ) 231 and penetrated the armour of the Panzer IV behind it ( en ) .
( Row 3 ) die Isotope von Polonium neigen dazu , mit dem Alpha- oder Beta-Zerfall zu zerfallen ( de ) .
( Row 4 ) 231 und durchbrach die Rüstung des Panzers IV hinter ihm ( de ) .
…

For translation language modeling (TLM), my training data is built by combining two parallel corpora: the sentences above are spliced together in parallel pairs, joined by the separator [/s] [/s], as shown below:

( Row 1 ) polonium 's isotopes tend to decay with alpha or beta decay ( en ) . [/s] [/s] die Isotope von Polonium neigen dazu , mit dem Alpha- oder Beta-Zerfall zu zerfallen ( de ) .
( Row 2 ) 231 and penetrated the armour of the Panzer IV behind it ( en ) . [/s] [/s] 231 und durchbrach die Rüstung des Panzers IV hinter ihm ( de ) .
…
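
For reference, here is a minimal sketch of how such a combined file could be built from two aligned files. This is my own data-prep helper, not part of run_mlm.py; the file names train.en, train.de, and train.tlm.txt are hypothetical, and the [/s] [/s] separator is the one from the examples above.

```python
# Minimal sketch of building the TLM-style training file (not part of run_mlm.py).
# "train.en", "train.de", and "train.tlm.txt" are hypothetical file names;
# "[/s] [/s]" is the separator used in the examples in this post.
SEPARATOR = " [/s] [/s] "

with open("train.en", encoding="utf-8") as f_en, \
        open("train.de", encoding="utf-8") as f_de, \
        open("train.tlm.txt", "w", encoding="utf-8") as f_out:
    for en_line, de_line in zip(f_en, f_de):
        en_line, de_line = en_line.strip(), de_line.strip()
        if not en_line or not de_line:
            continue  # skip empty or misaligned rows
        # One row per parallel pair: English sentence, separator, German sentence.
        f_out.write(en_line + SEPARATOR + de_line + "\n")
```

The resulting train.tlm.txt could then be passed to run_mlm.py as --train_file (with --line_by_line so each pair stays one training example), but I am not sure whether the separator will be treated the way the XLM paper intends, which is the point of my question.
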

If I only change the training data into such a combination of parallel corpora before running transformers/examples/language-modeling/run_mlm.py, will that achieve the effect of training with the TLM objective?

Looking forward to your help, thank you very much! :wink:

I have the same confusion. Could you tell me if you solved this issue?

I think it’s almost the same. The only gap should be in how the random masking is applied: if the mask lands on the special separator tokens, it would have some effect. But I think this could even be a positive effect, as the model could learn to distinguish the differences between the languages.
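
If you want to check this concretely, here is a minimal sketch (my own test, not part of run_mlm.py; the tokenizer name xlm-roberta-base and the example row are just illustrative choices) that runs one concatenated row through DataCollatorForLanguageModeling and prints which positions get selected for masking. Note that if the separator string in your file (e.g. [/s]) is not a special token of your tokenizer, it is tokenized like ordinary text and can be selected for masking like any other word piece.

```python
# Minimal sketch to inspect which positions the MLM collator selects for masking.
# "xlm-roberta-base" and the example row are only illustrative choices.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

row = ("polonium 's isotopes tend to decay with alpha or beta decay ( en ) . "
       "[/s] [/s] die Isotope von Polonium neigen dazu , mit dem Alpha- oder "
       "Beta-Zerfall zu zerfallen ( de ) .")

encoding = tokenizer(row, return_special_tokens_mask=True)
batch = collator([encoding])  # applies the random 15% masking

# Positions with labels != -100 are the ones selected for the MLM objective.
masked = (batch["labels"][0] != -100).nonzero(as_tuple=True)[0]
print("tokens after masking:",
      tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))
print("selected positions:", masked.tolist())
print("original tokens at those positions:",
      tokenizer.convert_ids_to_tokens(batch["labels"][0][masked].tolist()))
```

Running this a few times should show whether the subwords of the separator ever appear among the selected positions for your tokenizer.
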