The screenshot shows BLOOM's output for the prompt “Please unscramble the letters into a word, and write that word:\nr e!c.i p r o.c a/l =”. (Bloom Book - a Hugging Face Space by bigscience, 2022-06-16, the first sentence)
It makes sense to me that the model generates ‘The word is “RECIPROCAL”.\nThe’ after the prompt.
But what I can’t understand is why the token ‘word’ is the best choice for the next token after “Please unscramble the letters into a word, and write that word:\nr e!c.i p r o.c a/l = r e!c.i p r o.c a/l\nThe word is “RECIPROCAL”.\nThe”.
This seems unreasonable because, in the training dataset, I believe this kind of repetition is rare. Obviously, continuations that repeat the phrase after a second ‘word’ like this should barely occur in the training data.
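To make the observation concrete, here is a minimal sketch of how one could inspect the model's next-token distribution with the transformers library. I'm assuming the smaller bigscience/bloom-560m checkpoint as a stand-in (the Space runs the full 176B model, so the exact ranking may differ); the point is only to show how to check whether ‘word’ really gets the highest probability after the context above.

```python
# Minimal sketch: inspect the top next-token candidates for the given context.
# Assumption: bigscience/bloom-560m as a small stand-in for the full BLOOM model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = (
    "Please unscramble the letters into a word, and write that word:\n"
    "r e!c.i p r o.c a/l = r e!c.i p r o.c a/l\n"
    'The word is "RECIPROCAL".\nThe'
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.4f}")
```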
So, my question is: even if repetition of tokens (words or sentences) is rare in the training data, why does the model act as if repetition is the best choice?