I often have a bit of trouble figuring out all the model outputs, even though I thought I mostly understood them after playing with Hugging Face for a few months now.
I came across this example in Lewis’ book again and I am not quite sure how this selects the last token:
```python
output = model(input_ids=input_ids)
# Select logits of the first batch and the last token and apply softmax
next_token_logits = output.logits[0, -1, :]
```
It’s a CausalLM model, and from how I understood it, this should give the last hidden state of the first batch?
Wouldn’t the last token be `output.logits[0, -1, -1]`?
How am I understanding tokens and/or the output wrong?
Many thanks in advance for your help!
edit: Ahh, I overlooked that hidden states are an optional output, so I guess this is just the logits: first batch, last token in the sequence, and all tokens in the vocab?
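For anyone reading along, here is a minimal sketch of the indexing question using a dummy tensor in place of a real model's logits (the shapes are hypothetical, chosen to match GPT-2's vocab size):

```python
import torch

# Stand-in for output.logits from a CausalLM forward pass:
# shape is (batch_size, sequence_length, vocab_size)
batch_size, seq_len, vocab_size = 1, 5, 50257
logits = torch.randn(batch_size, seq_len, vocab_size)

# [0, -1, :] -> the full vocab distribution at the LAST position
# of the first batch item; this is what next-token selection needs
next_token_logits = logits[0, -1, :]
print(next_token_logits.shape)  # torch.Size([50257])

# [0, -1, -1] -> a single scalar: the logit of the last vocabulary
# id at the last position, not "the last token" of the sequence
scalar = logits[0, -1, -1]
print(scalar.shape)  # torch.Size([])

# softmax over the vocab axis turns the logits into probabilities
probs = torch.softmax(next_token_logits, dim=-1)
```

So `-1` in the second axis already picks the last token of the sequence; the third axis indexes the vocabulary, which you want to keep whole.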