@BramVanroy thank you for getting back to me!!
The comment `# shape: (batch size, sequence length, hidden dimension e.g. 768)` is incorrect. Sorry about that, that's my bad. One of the fields, the hidden states, has that shape, but the logits have shape `(batch size, sequence length, vocab size e.g. 50000)`.
You're right that I should've been clearer about my use case. What I want is, during training, to log pairs of (the sentence that goes into the model, the sentence that the model generates). Since the model outputs logits over the vocabulary, there isn't immediately a notion of "sentences that the model generates."
Two approaches might be to take the logits, convert them into a probability distribution, and then (1) take the argmax or (2) sample from that distribution. Maybe others have thought of much cleverer approaches; my question is really about what HuggingFace recommends.
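For concreteness, here is a rough sketch of those two options in PyTorch. The `vocab` list and `decode` helper are toy stand-ins I made up for illustration; in practice the logits would come from the model's output and decoding would go through the tokenizer's `batch_decode`:

```python
import torch

# Hypothetical toy vocabulary; a real run would use the model's tokenizer.
vocab = ["<pad>", "hello", "world", "the", "cat", "sat"]
batch_size, seq_len, vocab_size = 2, 4, len(vocab)

# Stand-in for outputs.logits, shape: (batch size, sequence length, vocab size)
logits = torch.randn(batch_size, seq_len, vocab_size)

# (1) Greedy: argmax over the vocabulary dimension.
greedy_ids = logits.argmax(dim=-1)  # shape: (batch size, sequence length)

# (2) Sampling: softmax to a distribution, then draw one token per position.
probs = torch.softmax(logits, dim=-1)
sampled_ids = torch.multinomial(probs.view(-1, vocab_size), num_samples=1)
sampled_ids = sampled_ids.view(batch_size, seq_len)

def decode(ids: torch.Tensor) -> list[str]:
    # Stand-in for tokenizer.batch_decode(ids, skip_special_tokens=True).
    return [" ".join(vocab[t] for t in row) for row in ids.tolist()]

greedy_sentences = decode(greedy_ids)
sampled_sentences = decode(sampled_ids)
```

Note this decodes each position independently from a single forward pass, which is not the same as autoregressive generation, so the logged "generated sentences" during teacher-forced training should be read with that caveat.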
Perhaps you could also confirm that the `.generate()` method isn't relevant for converting the model's output logits into string sentences during training?