How to convert model output logits into string sentences during training to check what the model is outputting?

I’m training GPT-2, constructed in the following manner:

                configuration = GPT2Config(
                architecture = AutoModelForCausalLM.from_config(

When I pass an input sequence into the model, like so, I can access its output logits:

            # Output type: BaseModelOutputWithPastAndCrossAttentions
            architecture_output = self.architecture(

How does one convert the logits into string sentences? I discovered the .generate() method, but this seems to generate new outputs.

I suppose I could convert the logits to a distribution, sample, convert to token ids and then use tokenizer.decode() but that seems too manual. What’s the right way to do this in HuggingFace?

It seems to me that these are the outputs of the base model. To get token predictions, you need the output of the LM head (typically a linear projection from hidden_dim to vocab_size). If you have logits of shape (bs, seqlen, vocab_size), you can apply a softmax over the last dimension, take the top-1 token at each position, and decode the resulting ids with the tokenizer. This is less manual work than you might expect, but you do need the outputs of the LM head.
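A minimal sketch of the softmax, top-1, decode steps described above. The toy vocabulary and random logits are hypothetical stand-ins for a real tokenizer and real LM head output, so the block runs without downloading anything:

```python
import torch

# Hypothetical stand-ins: a toy vocabulary and random logits in place of a real
# tokenizer and real LM head output of shape (batch, seq_len, vocab_size).
vocab = ["<pad>", "the", "cat", "sat", "on", "mat"]
torch.manual_seed(0)
logits = torch.randn(2, 4, len(vocab))

# Softmax over the vocabulary dimension, then the top-1 token per position.
# (Softmax preserves ordering, so argmax on the raw logits picks the same ids.)
probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)  # shape: (batch, seq_len)

# With a real tokenizer this step would be tokenizer.batch_decode(pred_ids).
sentences = [" ".join(vocab[i] for i in row.tolist()) for row in pred_ids]
print(sentences)
```

Note that for a causal LM the logits at position t are the prediction for the token at position t+1, so the decoded string is a teacher-forced next-token guess at each step, shifted by one relative to the input, rather than a freely generated sentence.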

As you said, you’d typically generate text with .generate(), so I’m not sure I understand your use case.

@BramVanroy thank you for getting back to me!!

The comment # shape: (batch size, sequence length, hidden dimension e.g. 768) is incorrect. Sorry about that, that’s my bad. One of the fields, the hidden states, has that shape, but the logits have shape (batch size, sequence length, vocab size e.g. 50000).

You’re right that I should’ve been clearer about my use case. During training, I want to log pairs of (the sentences that go into the model, the sentences that the model generates). The model outputs logits over tokens, though, so there’s no immediate notion of “sentences that the model generates.”

Two approaches might be to take the logits, convert them to a distribution and (1) take argmax or (2) sample from the distribution. Maybe others have thought of much more clever approaches. My question is intended to learn what HuggingFace recommends.
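The two approaches could be sketched as follows. This is only an illustration on hypothetical random logits, not a HuggingFace-recommended recipe; the resulting id tensors would then go through the tokenizer's decode step:

```python
import torch

torch.manual_seed(0)
# Hypothetical logits standing in for model output: (batch, seq_len, vocab_size).
batch, seq_len, vocab_size = 2, 5, 10
logits = torch.randn(batch, seq_len, vocab_size)

# (1) Argmax: the single most likely token at each position.
greedy_ids = logits.argmax(dim=-1)

# (2) Sampling: draw one token per position from the softmax distribution.
probs = torch.softmax(logits, dim=-1)
flat_probs = probs.reshape(-1, vocab_size)  # torch.multinomial expects 1-D or 2-D input
sampled_ids = torch.multinomial(flat_probs, num_samples=1).reshape(batch, seq_len)

print(greedy_ids)
print(sampled_ids)
```

Argmax gives a deterministic log that is easy to compare across training steps, while sampling shows more of the distribution's spread at the cost of run-to-run noise.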

Perhaps you could also confirm that the .generate() method isn’t relevant for converting the model’s output logits into string sentences during training?

@BramVanroy I removed the incorrect comment. Sorry again about that.
