[Announcement] Generation: Get probabilities for generated output

@joaogante I finally have some more time to work on this issue. If we want to compute token log probs for AutoModelForSeq2SeqLM (I used Flan-T5), do the pairings between the probabilities and the tokens have to be shifted as well? As I understand it, the shifting happens internally in this case, when the labels are shifted right to build the decoder input, right? This is the biggest selling point of a token log probs API, because one has to get these pairings right for every architecture. I don’t get high logits for obvious words in our test sentences, so I suspect the code I provided is still incorrect. Any ideas about what I am doing wrong?
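
To make the question concrete, here is a toy sketch of the alignment I am assuming. It uses plain Python rather than transformers, and the helper names (shift_right, seq2seq_token_logprobs, causal_token_logprobs), the 4-token vocabulary and the hand-built logits are all made up for illustration. The assumption is that for an encoder-decoder model called with labels, the decoder input is the labels shifted right, so logits[i] already scores labels[i], whereas for a decoder-only model logits[i] scores input_ids[i + 1].

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def shift_right(labels, decoder_start_token_id):
    """Hypothetical helper: build decoder inputs by prepending the start token."""
    return [decoder_start_token_id] + labels[:-1]


def seq2seq_token_logprobs(logits, labels):
    """Encoder-decoder pairing (assumed): logits[i] scores labels[i], no extra shift."""
    return [log_softmax(row)[tok] for row, tok in zip(logits, labels)]


def causal_token_logprobs(logits, input_ids):
    """Decoder-only pairing: logits[i] scores input_ids[i + 1]."""
    return [log_softmax(row)[tok] for row, tok in zip(logits[:-1], input_ids[1:])]


# Toy setup: vocabulary of 4 ids, id 0 is the decoder start token, id 1 is EOS.
labels = [2, 3, 1]
decoder_input_ids = shift_right(labels, decoder_start_token_id=0)

# Fake "perfect model" logits: at step i the highest logit sits on labels[i].
logits = [[0.0] * 4 for _ in labels]
for i, tok in enumerate(labels):
    logits[i][tok] = 5.0

aligned = seq2seq_token_logprobs(logits, labels)    # pairing without a shift
misaligned = causal_token_logprobs(logits, labels)  # decoder-only shift applied by mistake

print("decoder_input_ids:", decoder_input_ids)
print("aligned log probs:", [round(x, 3) for x in aligned])
print("misaligned log probs:", [round(x, 3) for x in misaligned])
```

If this assumption holds, applying the decoder-only shift to seq2seq logits pairs every token with the wrong step, which would produce exactly the symptom I see: low scores for obvious words.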