I figured it out… I think.
last_hidden_state is NOT the logits tensor. If it were, its shape would be (batch_size, sequence_length, vocab_size) rather than (batch_size, sequence_length, hidden_size). My best guess is that it is the output of the last transformer block BEFORE the unembedding layer (lm_head) is applied, which is what produces the logits. However, the model I was originally using (T5Model) does not expose the unembedding layer, so I switched over to T5ForConditionalGeneration.
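A minimal sketch of the shape difference, assuming the Hugging Face transformers API and the t5-small checkpoint (d_model = 512, vocab_size = 32128); the decoder text here is just a placeholder input:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc = tok("translate English to German: Hello", return_tensors="pt")
dec = tok("Hallo", return_tensors="pt")

out = model(
    input_ids=enc.input_ids,
    decoder_input_ids=dec.input_ids,
    output_hidden_states=True,
)

# Final decoder hidden state: (batch, seq_len, hidden_size) -- 512 for t5-small
hidden = out.decoder_hidden_states[-1]
# Logits, i.e. hidden state after the unembedding (lm_head):
# (batch, seq_len, vocab_size) -- 32128 for t5-small
logits = out.logits
print(hidden.shape, logits.shape)
```

With T5ForConditionalGeneration the unembedding is also available directly as model.lm_head, which T5Model does not have.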