Force mBART to generate tokens in target language during backtranslation

sergisdeutsch · March 22, 2021, 10:35am

Hi there,

When fine-tuning mBART for translation using on-the-fly backtranslation, the paper states that for the first 1000 steps they force the model to generate tokens only in the target language (to avoid simply copying the source text). Specifically, they “mask out the output probability of predicting tokens which appear less than 1% in the target monolingual corpus”.

Any idea how to do this using the Hugging Face library?

Thank you!
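
One possible way to approach the masking described above, as a minimal sketch rather than the paper's actual implementation: count how often each token id occurs in the tokenized target-language corpus, keep the ids above the threshold, and set the logits of every other id to negative infinity. The helper names `allowed_token_ids` and `mask_logits` are hypothetical, and reading "less than 1%" as "occurs in fewer than 1% of the corpus sentences" is an assumption, since the quoted sentence does not say how the percentage is computed.

```python
import math
from collections import Counter


def allowed_token_ids(tokenized_corpus, min_fraction=0.01, always_keep=()):
    """Return the sorted token ids that occur in at least `min_fraction`
    of the sentences in `tokenized_corpus` (a list of lists of token ids).

    Assumption: "appears less than 1%" is read as sentence-level frequency.
    `always_keep` is for special ids (e.g. EOS, the language code token)
    that must stay available even if they are rare in the corpus.
    """
    sentence_counts = Counter()
    for ids in tokenized_corpus:
        # Count each token at most once per sentence.
        sentence_counts.update(set(ids))
    n_sentences = len(tokenized_corpus)
    keep = {
        tok for tok, count in sentence_counts.items()
        if count / n_sentences >= min_fraction
    }
    keep.update(always_keep)
    return sorted(keep)


def mask_logits(logits, allowed):
    """Set the logit of every token id not in `allowed` to -inf,
    so that it gets zero probability after the softmax."""
    allowed = set(allowed)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


# Toy example: 4 "sentences" of token ids, threshold raised to 50%
# only so that the effect is visible on such a small corpus.
corpus = [[5, 6, 7], [5, 6], [5, 8], [5, 6, 9]]
allowed = allowed_token_ids(corpus, min_fraction=0.5, always_keep=(2,))
print(allowed)  # [2, 5, 6]
print(mask_logits([0.1] * 8, allowed))
```

With Transformers, a list like `allowed` could then be passed to `model.generate(...)` through its `prefix_allowed_tokens_fn` argument, for example `prefix_allowed_tokens_fn=lambda batch_id, input_ids: allowed`, which restricts every decoding step to those ids. A custom `LogitsProcessor` would be another option. Either way, the restriction would need to be switched off after the first 1000 steps in your own training loop, and I have not checked that this reproduces the paper's results.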
