I am trying to fine-tune a translation model, but I want to experiment with different tokenizers, which means I will not be using the same tokenizer for both languages. How should I set up the preprocessing function, the data collator, and the Seq2Seq training in that case?
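One way to sketch the preprocessing step (assuming the Hugging Face `transformers`/`datasets`-style APIs, since the question mentions seq2seq training; the `"source"`/`"target"` column names and the `max_length` value are placeholders for illustration) is to tokenize each side with its own tokenizer and put the target ids under the `"labels"` key:

```python
# Minimal sketch, assuming two already-loaded tokenizers (e.g. via
# AutoTokenizer.from_pretrained) and a dataset with "source"/"target"
# columns -- both names are assumptions, not fixed by the question.

def preprocess_function(examples, src_tokenizer, tgt_tokenizer, max_length=128):
    """Tokenize sources and targets with their own tokenizers."""
    # Encoder inputs come from the source-language tokenizer.
    model_inputs = src_tokenizer(
        examples["source"], max_length=max_length, truncation=True
    )
    # Decoder targets come from the target-language tokenizer.
    labels = tgt_tokenizer(
        examples["target"], max_length=max_length, truncation=True
    )
    # Seq2Seq trainers expect the target token ids under "labels".
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

For collation, note that `DataCollatorForSeq2Seq` pads the encoder inputs with the tokenizer you pass it, while labels are padded with `label_pad_token_id` (default `-100`) regardless, so the source tokenizer is the one the collator needs. Also note that most stock seq2seq checkpoints assume a single shared vocabulary; using two tokenizers generally requires a model whose encoder and decoder have separate embeddings (for example, an `EncoderDecoderModel` combining pretrained encoder and decoder checkpoints), and each embedding size must match the corresponding tokenizer's vocabulary size.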