I figured it out… I think.
last_hidden_state is NOT the logits tensor. If it were, its shape would be (batch_size, sequence_length, vocab_size) rather than (batch_size, sequence_length, hidden_size). My best guess is that it is the output of the last transformer block BEFORE the unembedding layer (lm_head) is applied, which is what produces the logits. However, the model I was originally using (T5Model) does not expose the unembedding layer, so I switched over to T5ForConditionalGeneration.
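A minimal sketch of the shape difference, assuming the Hugging Face transformers API and the t5-small checkpoint (d_model = 512, vocab_size = 32128); the decoder text here is just a placeholder input:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc = tok("translate English to German: Hello", return_tensors="pt")
dec = tok("Hallo", return_tensors="pt")

out = model(
    input_ids=enc.input_ids,
    decoder_input_ids=dec.input_ids,
    output_hidden_states=True,
)

# Final decoder hidden state: (batch, seq_len, hidden_size) -- 512 for t5-small
hidden = out.decoder_hidden_states[-1]
# Logits, i.e. hidden state after the unembedding (lm_head):
# (batch, seq_len, vocab_size) -- 32128 for t5-small
logits = out.logits
print(hidden.shape, logits.shape)
```

With T5ForConditionalGeneration the unembedding is also available directly as model.lm_head, which T5Model does not have.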