Compute log probabilities of any sequence provided

Before replying to the other points, I want to highlight the following:

:warning: Please do not use the method in this comment (cc @Seohyeong) to get scores for an arbitrary sequence.

Text generation is auto-regressive: each token is predicted conditioned on the tokens that came before it. The `scores` field in the output contains the logits for all tokens in the vocabulary at each generation step, so yes, you can read off a score for any token. But that score is only valid if the preceding tokens are exactly the same as the ones the scores were computed from! In this colab you will see that if we change the model inputs (i.e. the source of the scores), the values for a few selected tokens change.

:point_right: See here for an example on how to compute the token-level scores for any sequence :slight_smile:
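
To make the point above concrete, here is a minimal, library-free sketch of scoring an arbitrary sequence by feeding the model its own prefix at every step. Everything in it is an assumption for illustration: `toy_next_token_logits` is a hypothetical stand-in for a real causal LM forward pass, and `VOCAB_SIZE`, the `bos` id and the token ids are made up. It is not the code from the linked example.

```python
import math

VOCAB_SIZE = 5  # hypothetical tiny vocabulary


def toy_next_token_logits(prefix):
    """Hypothetical stand-in for a causal LM: logits for the next token,
    computed deterministically from the whole prefix."""
    h = sum((i + 1) * tok for i, tok in enumerate(prefix)) + 1
    return [((h * (v + 2)) % 7) / 2.0 for v in range(VOCAB_SIZE)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def sequence_token_logprobs(tokens, bos=0):
    """Log probability of each token in `tokens`, each one conditioned on
    the tokens of this same sequence that precede it."""
    prefix = [bos]
    logprobs = []
    for tok in tokens:
        step_logprobs = log_softmax(toy_next_token_logits(prefix))
        logprobs.append(step_logprobs[tok])
        prefix.append(tok)  # condition the next step on the actual token
    return logprobs


# Two sequences that share their last two tokens but differ in the first one.
lp_a = sequence_token_logprobs([1, 3, 2])
lp_b = sequence_token_logprobs([4, 3, 2])

print("token log probs A:", lp_a, "sequence log prob:", sum(lp_a))
print("token log probs B:", lp_b, "sequence log prob:", sum(lp_b))
```

The second token is `3` in both sequences, yet its log probability differs because the prefix differs. That is why scores taken from one generation cannot be reused for another sequence. With a real causal model the per-prefix loop is typically replaced by a single forward pass over the full sequence, since causal masking already gives each position the logits conditioned on its own prefix.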
