How to join separate strings to translate them together for better speed?

I have a bunch of separate strings to translate between languages using MarianMTModel, specifically Helsinki-NLP/opus-mt-pl-en. The strings are independent of each other and don’t have to be full sentences: one can be a single word, another a phrase, and another several sentences.

If I translate each element as its own batch, and there can be hundreds of them, the translations are accurate and good overall, but the whole process takes too long. So I try to join as many elements as I can into a single input, keeping the cumulative length below the model’s max_length. I join them with the symbol <T>, because afterwards I need to split the output again so that I can handle each element separately. So these two:

  1. text one
  2. text two

become "text one <T> text two", which then goes into the model.
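To make the joining step concrete, here is a minimal sketch of what I mean, not my exact code. The helper names (pack, join_group, split_translation) are made up for illustration, and the length function is an assumption: the example counts whitespace-separated words as a stand-in, whereas in practice it would be the tokenizer's token count.

```python
from typing import Callable, List

SEP = " <T> "  # separator from the post; not part of the model's vocabulary


def pack(texts: List[str], max_len: int, length: Callable[[str], int]) -> List[List[str]]:
    """Greedily group consecutive texts so each joined chunk stays within max_len.

    `length` is a hypothetical callback; with a Hugging Face tokenizer it could be
    something like `lambda s: len(tokenizer(s)["input_ids"])` (assumption).
    A single text longer than max_len still ends up in a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for text in texts:
        candidate = SEP.join(current + [text])
        if current and length(candidate) > max_len:
            groups.append(current)  # close the full group and start a new one
            current = [text]
        else:
            current.append(text)
    if current:
        groups.append(current)
    return groups


def join_group(group: List[str]) -> str:
    """Build the single string that is sent to the model."""
    return SEP.join(group)


def split_translation(translated: str, expected: int) -> List[str]:
    """Split the model output back into elements; fail if the separator was lost."""
    parts = [p.strip() for p in translated.split(SEP.strip())]
    if len(parts) != expected:
        raise ValueError(f"expected {expected} parts, got {len(parts)}")
    return parts


# Stand-in length function: counts whitespace-separated words (assumption).
word_len = lambda s: len(s.split())
groups = pack(["text one", "text two", "a", "b c d"], max_len=5, length=word_len)
joined = join_group(groups[0])
```

The check in split_translation matters because the model is free to drop or rewrite <T>, in which case the output can no longer be mapped back to the original elements.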

Well, this doesn’t work particularly well for the shorter elements; some of the translations are terrible. I’m aware that <T> isn’t in the model’s vocabulary. I tried adding new tokens to the vocabulary, but that requires retraining the model, which I’m not able to do.

Has anyone run into this problem, or does anyone have ideas on how to solve it? Maybe I’ve been doing something wrong when adding the special symbols.