Rare buggy translations when using Helsinki-NLP models

Hi!

I’ve been translating sentences with the pretrained Helsinki-NLP models Helsinki-NLP/opus-mt-nl-en and Helsinki-NLP/opus-mt-en-nl.

In rare cases, the generated translation ends with a long run of full stops. The run often starts at the position of an unknown token in the source sentence, but not always. Some affected sentences contain no unknown tokens at all, and many other sentences that contain the same unknown tokens translate correctly. The problem usually starts before the end of the source sentence, in which case nothing after the full stops gets translated.
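To estimate how often this symptom occurs across a whole set of translations, here is a minimal sketch that flags outputs ending in a long run of full stops. The helper names and the cut-off of 10 consecutive full stops are my own assumptions, not anything from the models or the library; the cut-off is only meant to separate the runaway runs from ordinary ellipses.

```python
def trailing_full_stop_run(translation: str) -> int:
    """Count the full stops at the very end of a translation, ignoring trailing whitespace."""
    stripped = translation.rstrip()
    return len(stripped) - len(stripped.rstrip("."))


def looks_degenerate(translation: str, min_run: int = 10) -> bool:
    """Flag translations ending in at least `min_run` consecutive full stops.

    `min_run` is an arbitrary cut-off (an assumption): a short ending such as
    "No...." has a run of 4 and is not flagged.
    """
    return trailing_full_stop_run(translation) >= min_run


def find_degenerate(pairs, min_run: int = 10):
    """Return the (source, translation) pairs whose translation looks degenerate."""
    return [(src, tgt) for src, tgt in pairs if looks_degenerate(tgt, min_run)]
```

Running `find_degenerate` over all (source, translation) pairs gives the affected sentences. Those can then be inspected for unknown tokens or compared against slightly edited inputs.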

I’m working with a dataset that is not public, so I can’t post many examples, but here are some sentence fragments and their generated translations:

input

"The consultation paper introduced the need to establish « colleges » of supervisors"

output

'Het raadplegingsdocument introduceerde de noodzaak om een.............................................................................................................................................................................................'

input

'Inlichtingenformulier (*) nr. . . ., zoals bedoeld'

output

'Information document (*) No........................................................................'

In the first example, « is an unknown token, and nothing after it is translated. The second example contains no unknown tokens at all.
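A minimal sketch for reproducing a translation and listing which source pieces the tokenizer maps to its unknown-token id. It assumes `transformers`, `sentencepiece` and PyTorch are installed. `MarianTokenizer` and `MarianMTModel` are the transformers classes for these opus-mt checkpoints, while `find_unknown_pieces` and `translate` are hypothetical helper names.

```python
def find_unknown_pieces(tokenizer, text):
    """Return the SentencePiece pieces of `text` that map to the unknown-token id."""
    pieces = tokenizer.tokenize(text)
    return [p for p in pieces if tokenizer.convert_tokens_to_ids(p) == tokenizer.unk_token_id]


def translate(texts, model_name="Helsinki-NLP/opus-mt-en-nl", **generate_kwargs):
    """Translate a list of sentences and report the unknown pieces of each input.

    Extra keyword arguments are forwarded unchanged to `model.generate`.
    """
    # Imported here so that find_unknown_pieces can be used without loading a model.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch, **generate_kwargs)
    outputs = tokenizer.batch_decode(generated, skip_special_tokens=True)
    unknowns = [find_unknown_pieces(tokenizer, t) for t in texts]
    return outputs, unknowns
```

For example, `translate(["The consultation paper introduced the need to establish « colleges » of supervisors"])` returns the translation together with the unknown pieces found in the input. The Dutch example would need `model_name="Helsinki-NLP/opus-mt-nl-en"`. Because extra keyword arguments go straight to `model.generate`, decoding settings such as `num_beams` or `repetition_penalty` can be compared on the failing sentences. I haven't checked whether any of them avoids the full-stop runs.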

Interestingly, changing the input slightly can lead to a correct translation. For the second example, the input

'Inlichtingenformulier (*) nr. . . ., om iets leuks te doen'

produces a correct translation:

"Information document (*) No.... to do something nice"

In the first example, replacing the unknown tokens « and » with " also leads to a correct translation. However, the same unknown tokens don’t trigger the issue in other sentences.
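Based only on that observation, one possible preprocessing workaround is to replace the guillemets with straight double quotes before translating. This is a sketch under the assumption that « and » are the problematic characters in a given input. It doesn't address the underlying model behaviour, and inputs like the second example, which has no unknown tokens, can still fail. `normalize_quotes` is a hypothetical helper name.

```python
# Map the guillemets (unknown to the tokenizer in the first example) to straight quotes.
GUILLEMETS_TO_QUOTES = str.maketrans({"«": '"', "»": '"'})


def normalize_quotes(text: str) -> str:
    """Replace « and » with " so the input matches the variant that translated correctly."""
    return text.translate(GUILLEMETS_TO_QUOTES)
```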

Have any of you encountered something similar, and do you know how to solve it? Alternatively, is there a better place to raise this issue? Thanks in advance!