Hi, I’m fine-tuning the google/mt5-base model for many-to-many translation. The issue is that I can't find any language tags in the model config (facebook/mbart-large-50 includes them). Normally, when generating, I would pass forced_bos_token_id (e.g. with mbart-large-50) to force the model to generate in the target language. Are there any predefined language codes for mT5, or do I have to add extra special tokens myself?
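For context, here is what forcing the first token amounts to. At the first decoding step after the decoder start token, every logit except the chosen token's is masked out, so a language tag added as a new special token could be forced in the same way. This is a minimal pure-Python sketch under stated assumptions. The vocabulary, the `<2en>`/`<2de>` tag names, the `add_special_tokens` helper and the `toy_scores` stand-in for a trained model are all hypothetical, and nothing here comes from mT5's actual vocabulary. The one detail taken from T5/mT5 is that the pad id (0) is used as the decoder start token.

```python
from typing import Callable, Dict, List, Optional

NEG_INF = float("-inf")

# Hypothetical base vocabulary; ids must be contiguous 0..n-1.
# As in T5/mT5, the pad id (0) doubles as the decoder start token.
BASE_VOCAB: Dict[str, int] = {"<pad>": 0, "</s>": 1, "hello": 2, "hallo": 3}


def add_special_tokens(vocab: Dict[str, int], tokens: List[str]) -> Dict[str, int]:
    """Hypothetical helper: append unseen tokens at the end, one new id each."""
    extended = dict(vocab)
    for tok in tokens:
        if tok not in extended:
            extended[tok] = len(extended)
    return extended


# Hypothetical target-language tags; not part of any real mT5 vocabulary.
vocab = add_special_tokens(BASE_VOCAB, ["<2en>", "<2de>"])


def toy_scores(seq: List[int]) -> List[float]:
    """Stand-in for a trained decoder: the word it emits depends on the preceding tag."""
    scores = [0.0] * len(vocab)
    last = seq[-1]
    if last in (vocab["<pad>"], vocab["<2en>"]):
        scores[vocab["hello"]] = 1.0
    elif last == vocab["<2de>"]:
        scores[vocab["hallo"]] = 1.0
    else:
        scores[vocab["</s>"]] = 1.0
    return scores


def greedy_decode(
    score_fn: Callable[[List[int]], List[float]],
    decoder_start_id: int,
    eos_id: int,
    max_new_tokens: int,
    forced_bos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding that can force the first generated token."""
    seq = [decoder_start_id]
    for _ in range(max_new_tokens):
        logits = score_fn(seq)
        if forced_bos_id is not None and len(seq) == 1:
            # First step after the start token: every other id gets -inf.
            logits = [0.0 if i == forced_bos_id else NEG_INF for i in range(len(logits))]
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        seq.append(next_id)
        if next_id == eos_id:
            break
    return seq


start, eos = vocab["<pad>"], vocab["</s>"]
print(greedy_decode(toy_scores, start, eos, 5))                                # [0, 2, 1]
print(greedy_decode(toy_scores, start, eos, 5, forced_bos_id=vocab["<2de>"]))  # [0, 5, 3, 1]
```

In transformers, the analogous real steps would be `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. The new tag embeddings start out untrained, so the model only learns to associate them with a target language during fine-tuning.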