Extracting Logits From T5 Output

Hello, I am trying to fine-tune the T5 model by hand in PyTorch. I was hoping to extract the logits from the model output; however, the output is of class Seq2SeqModelOutput and does not contain a logits parameter.

What it has is: last_hidden_state, past_key_values, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions.

Is there any way for me to extract the logits from one of these attributes?

Here is the code that I am running:

import torch
from transformers import T5Tokenizer, T5Model

# Model and tokenizer setup (t5-small is just an example checkpoint)
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5Model.from_pretrained('t5-small')

dummy_test = 'Hello World'
input_ids = tokenizer(dummy_test, truncation=False).input_ids

input = input_ids
input_decoded = tokenizer.decode(input, truncation=False)
label = input_ids[1:]  # shift by one token to build the target
label_decoded = tokenizer.decode(label, truncation=False)

input_tensored = tokenizer(input_decoded, return_tensors='pt', padding=True, truncation=True).input_ids
label_tensored = tokenizer(label_decoded, return_tensors='pt', padding=True, truncation=True).input_ids

outputs = model(input_ids=input_tensored, decoder_input_ids=input_tensored)

last_hidden_state is the logits tensor

Hmm, could you help me reason through the shape of this tensor? I would expect it to be of shape (batch_size, context_window_size, vocab_size), but the documentation says that it has shape (batch_size, sequence_length, hidden_size).

I figured it out… I think.

last_hidden_state is NOT the logits tensor. If it were, it would have shape (batch_size, sequence_length, vocab_size), not (batch_size, sequence_length, hidden_size). My best guess is that it is the output of the last transformer block BEFORE we apply the unembedding layer, which would give us the logits. However, the model I was originally using (T5Model) did not have an option to extract the unembedding layer. So I switched over to T5ForConditionalGeneration.
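That shape reasoning can be sketched in a few lines of plain PyTorch. The dimensions below are stand-ins, not a real T5 config, and `unembed` is a randomly initialized linear layer playing the role of the unembedding:

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden_size, vocab_size = 1, 7, 512, 32128

# Stand-in for the decoder output: shape (batch, seq, hidden)
last_hidden_state = torch.randn(batch_size, seq_len, hidden_size)

# Stand-in for the unembedding layer: a linear map hidden -> vocab
unembed = nn.Linear(hidden_size, vocab_size, bias=False)

logits = unembed(last_hidden_state)
print(last_hidden_state.shape)  # torch.Size([1, 7, 512])
print(logits.shape)             # torch.Size([1, 7, 32128])
```

The last dimension is the only one that changes: the unembedding projects each position's hidden vector onto a score per vocabulary token.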

Sorry, outputs.last_hidden_state is the output from the decoder. See here for details.

Hello! Can I ask how to extract the unembedding layer from T5ForConditionalGeneration?
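For what it's worth, in the Hugging Face implementation the unembedding sits on T5ForConditionalGeneration as the lm_head attribute (a bias-free Linear from d_model to vocab_size), and with that class the logits are already computed for you as outputs.logits. A sketch using a tiny, randomly initialized config so nothing needs to be downloaded (the sizes are illustrative only):

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Tiny random-weight model; sizes are illustrative, not a real checkpoint
config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_decoder_layers=2, num_heads=4)
model = T5ForConditionalGeneration(config)

# The unembedding layer: a bias-free Linear from d_model to vocab_size
print(model.lm_head)  # Linear(in_features=32, out_features=100, bias=False)

ids = torch.randint(0, config.vocab_size, (1, 5))
outputs = model(input_ids=ids, decoder_input_ids=ids)

# With this class, the logits come precomputed in the output
print(outputs.logits.shape)  # torch.Size([1, 5, 100])
```

One caveat if you apply lm_head by hand: when the word embeddings are tied (the T5 default), the model rescales the decoder's hidden states by d_model ** -0.5 before the projection, so outputs.logits will not exactly equal lm_head(last_hidden_state) without that scaling.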