MarianMT fine-tuning produces "▁" in results

Good morning,

I’m trying to fine-tune a MarianMT model on parallel sentences (French-English and German-English at the moment).
The pretrained model does not produce any “strange” symbols in its translations, but after a few training iterations the fine-tuned model starts to output this symbol → ▁ (it’s not the standard underscore).

The translations look something like this:
He also played three days later against Bolivia. -> Il▁joue aussi▁trois jours plus▁tard▁contre la Bolivie.
Prenons un peu de recul et demandons-nous, pourquoi enseigne-t-on les maths? -> Take a little back and ask▁ourselves,▁Why are they teaching math?

The same character appears with the German-English pair. Has anyone experienced the same? Am I missing something?

Thank you.

I’ve run into this recently as well. I believe it’s a SentencePiece artifact (“▁”, U+2581, is the character SentencePiece uses to mark a word boundary), and I’ve just been converting it to a space in post-processing. If anyone can shed light on why this artifact shows up when fine-tuning, I’d be keen to learn more.
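In case it helps anyone else, here is a minimal sketch of that post-processing workaround in plain Python. The function name `strip_sentencepiece_markers` is just something I made up for illustration, and it assumes the only problem is stray “▁” (U+2581) characters left in the decoded string. It treats the symptom rather than the cause.

```python
import re

# "▁" (U+2581, LOWER ONE EIGHTH BLOCK) is the SentencePiece word-boundary marker.
SPM_WORD_BOUNDARY = "\u2581"


def strip_sentencepiece_markers(text: str) -> str:
    """Replace leftover SentencePiece markers with spaces and tidy whitespace.

    Hypothetical helper: it only cleans up already-decoded output and does not
    address why the markers survive decoding after fine-tuning.
    """
    # Turn every marker into a regular space.
    text = text.replace(SPM_WORD_BOUNDARY, " ")
    # Collapse runs of whitespace created by the replacement and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = "Il\u2581joue aussi\u2581trois jours plus\u2581tard\u2581contre la Bolivie."
    print(strip_sentencepiece_markers(sample))
```

Applied to the first example from the original post, this gives back a normally spaced sentence. Note that it also collapses any intentional double spaces, which is usually fine for translation output.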
