Hello, I am trying to fine-tune the T5 model by hand through PyTorch. I was hoping to extract the logits from the model output; however, the output is of class Seq2SeqModelOutput and does not contain a logits attribute.
What it has is: last_hidden_state, past_key_values, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions.
Is there any way for me to extract the logits from one of these attributes?
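In case it helps, here is a minimal sketch of what I am seeing (t5-small is just an example checkpoint; the decoder input below is arbitrary, since the point is only to inspect the output class):

```python
# Minimal repro: T5Model's forward returns a Seq2SeqModelOutput,
# which has no logits field.
from transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5Model.from_pretrained("t5-small")

enc = tokenizer("translate English to German: Hello", return_tensors="pt")
dec = tokenizer("Hallo", return_tensors="pt")

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    decoder_input_ids=dec.input_ids,
)
print(type(outputs).__name__)  # Seq2SeqModelOutput
print(outputs.keys())          # last_hidden_state, past_key_values, ... -- no logits
```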
Hmm, could you help me reason through the shape of the last_hidden_state tensor? I would expect it to be of shape (batch_size, context_window_size, vocab_size), but the documentation says it has shape (batch_size, sequence_length, hidden_size).
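Continuing the snippet above, the shape can be checked directly against the model config (the numbers in the comments assume the t5-small checkpoint):

```python
# The last dimension matches d_model, not vocab_size, so this tensor
# holds hidden states rather than logits.
print(outputs.last_hidden_state.shape)  # torch.Size([1, decoder_len, 512])
print(model.config.d_model)             # 512
print(model.config.vocab_size)          # 32128
```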
last_hidden_state is NOT the logits tensor. If it were, its last dimension would be vocab_size rather than hidden_size. My best guess is that it is the output of the final transformer block BEFORE the unembedding layer is applied, which is what would produce the logits. However, the model I was originally using (T5Model) is the bare encoder-decoder and does not expose the unembedding layer, so I switched over to T5ForConditionalGeneration, which adds the language-modeling head on top.
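For anyone who finds this later, here is a sketch of what ended up working for me, again assuming the t5-small checkpoint. T5ForConditionalGeneration returns a Seq2SeqLMOutput whose logits field has the expected (batch_size, sequence_length, vocab_size) shape, and the unembedding layer is exposed as model.lm_head. One caveat I noticed in the transformers source: when the input and output embeddings are tied, T5 rescales the decoder hidden state by d_model ** -0.5 before applying the head.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()  # disable dropout so the manual recomputation below matches

enc = tokenizer("translate English to German: Hello", return_tensors="pt")
labels = tokenizer("Hallo", return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model(
        input_ids=enc.input_ids,
        attention_mask=enc.attention_mask,
        labels=labels,               # also gives outputs.loss for fine-tuning
        output_hidden_states=True,
    )

print(outputs.logits.shape)  # (batch_size, target_seq_len, vocab_size)

# Recomputing the logits by hand from the final decoder hidden state:
hidden = outputs.decoder_hidden_states[-1]
if model.config.tie_word_embeddings:
    # T5 rescales before the LM head when embeddings are tied
    hidden = hidden * (model.config.d_model ** -0.5)
manual_logits = model.lm_head(hidden)
print(torch.allclose(manual_logits, outputs.logits, atol=1e-4))  # True
```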