Seq2seq predicts decently, but letter by letter instead of words

I’m running an inference snippet based on a seq2seq transformer model from the Hugging Face hub.

The code is essentially the same as that example (aside from the custom dataset), which means the pre- and post-processing steps are exactly the same.

The tokenizer is the model’s own (first link above), and the model is the original one fine-tuned on my custom dataset (before running the predictions, of course).
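
For context, the inference loop is the standard Hugging Face seq2seq pattern; a minimal sketch of what I’m running (the checkpoint names are placeholders for the model linked above and my fine-tuned copy):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholders: the original checkpoint's tokenizer, my fine-tuned weights
tokenizer = AutoTokenizer.from_pretrained("original/seq2seq-checkpoint")
model = AutoModelForSeq2SeqLM.from_pretrained("./my-finetuned-model")

text = "Some input document about an XSS vulnerability..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64)

# skip_special_tokens=False so we see the raw tags in the output
print(tokenizer.batch_decode(output_ids, skip_special_tokens=False))
```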

What is perplexing: on running predictions I get the output below, which is not bad if we remove the special tokens (the `<...>` tags) and join the letters back into words (a crude cleanup is sketched after the outputs below); e.g. “xss injection malicious” is a pretty good keyphrase. Question: why the letter-by-letter output and the interleaved special tokens? I’m missing something fundamental here.

["<s><s><s>x", "s", "s<category>,", "i", "n", "j", "e", "c", "t", " ", "m", "a", "l", "i<category>c", "i<header>o", "u", "s<infill> ", "c<infill>o", "d", "e<header> ", "i<infill>n", "t<category>o", " <category>l", "o", "n<category>g", "e<category>r", " <infill>s", "u<category>p", "p", "o<category>rt", "e<infill>d", " <seealso>", "s<header>", "s<seealso> ", "f", "i<seealso>", "l<infill>e", " ", "v", "a<infill>", "m<infill>", ""]

["<s><s><s>x", "s", "s<category>,", "c", "r", "o", "s<infill>s", " ", "s<header>i", "t", "e", " <category>s", "c<category>r", "i", "p", "t<header>i<category>n", "g", ",", "i<category>m", "pr", "r<category>o", "p<category>e", "r<infill> ", "u", "s<seealso>e", "d", " <infill>i", "n", "p<infill>u", "t<category> ", "v", "a", "l", "i<infill>d", "a<infill>t", "i<header>", "n<category>", "a<seealso>", "m", "e<category>m<category>", "o<present>"]

There seems to be something skewed with the output (logits) of the model fine-tuned on my data (the existing Hugging Face model) even before the inference (.predict()) step: they already come out like this. I have to go check the .train() step and the fine-tuned model it produces.
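
One quick sanity check is to decode the same generated ids with and without special-token skipping, to see whether the skew comes from generation itself or from decoding; a sketch (`pred_ids` is assumed to be the raw ids returned by the predict step):

```python
# pred_ids: raw generated token ids, e.g. from trainer.predict(...).predictions
raw = tokenizer.batch_decode(pred_ids, skip_special_tokens=False)
clean = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
print(raw[0])    # tags interleaved with single characters
print(clean[0])  # if this is still letter-spaced, generation itself is skewed
```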

So it turns out the custom dataset I used was not properly “huggified” and was not valid JSON: during string processing, spaces were inserted between the letters, which produced a model trained on ‘spaced’ words.
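
For anyone hitting the same thing, a cheap guard is to eyeball a few samples for the letter-spacing symptom right after loading, before any training; a minimal sketch (the file path and field name are hypothetical):

```python
import json

with open("dataset.json") as f:  # hypothetical path to the training data
    data = json.load(f)

for sample in data[:5]:
    text = sample["target"]  # hypothetical field holding the keyphrases
    words = text.split()
    # A stream of mostly single-character "words" is the tell-tale of the bad export
    if words and sum(len(w) == 1 for w in words) / len(words) > 0.5:
        print("looks letter-spaced:", text[:80])
```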