I am using a pretrained MarianMT machine translation model for English to German. I also have a large set of high-quality English-to-German sentence pairs that I would like to use to improve the model, which was trained on the OPUS corpus, without making it forget the OPUS training data. Is there a way to do that? Thanks.
Also on StackOverflow
You could fine-tune it further on your own corpus. I think that if your dataset is high quality, fine-tuning should improve the results.
You can use the finetune.py script from here for fine-tuning Marian.
If forgetting does turn out to be a problem, you could do your fine-tuning with a mixture of your new data and the OPUS data.
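As a rough illustration of the mixing idea, here is a minimal sketch in plain Python. The function name `mix_corpora` and the `opus_fraction` parameter are made up for this example and are not part of finetune.py or any library; it assumes both corpora are already loaded as lists of (English, German) tuples, and the right mixing ratio is something you would have to tune yourself.

```python
import random


def mix_corpora(new_pairs, opus_pairs, opus_fraction=0.5, seed=0):
    """Combine all new pairs with a random sample of OPUS pairs.

    opus_fraction is the share of the final mixture that should come
    from OPUS (hypothetical knob, 0 <= opus_fraction < 1).
    """
    if not 0.0 <= opus_fraction < 1.0:
        raise ValueError("opus_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # How many OPUS pairs are needed so they form opus_fraction of the mix.
    n_opus = round(len(new_pairs) * opus_fraction / (1.0 - opus_fraction))
    # Cannot sample more OPUS pairs than are available.
    n_opus = min(n_opus, len(opus_pairs))

    # Sample without replacement, then shuffle so batches contain both sources.
    replay = rng.sample(list(opus_pairs), n_opus)
    mixed = list(new_pairs) + replay
    rng.shuffle(mixed)
    return mixed
```

You would then write the mixed list out in whatever format your fine-tuning script expects. Using a fixed seed keeps the sampled OPUS subset reproducible between runs.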
Sorry, can you re-share the link? It doesn’t work.