Let’s say I have a T5ForConditionalGeneration. Now I feed it a sequence (say, "An apple a day keeps the doctor"). Now, if I want to get the probabilities for two words, say, "day" and "week", how do I do it?
T5 consists of an encoder and a decoder. You give the input text to the encoder, and the outputs come from the decoder. If you call model(input_ids=your_tokenized_input_text, decoder_input_ids=start_token), the logits in the output are a tensor of shape [1, 1, 50127] (I couldn't remember the correct number of tokens; the last dimension is the vocabulary size). After a softmax, these are the probabilities of every token in the vocabulary being the next token. You can choose the highest-probability token as the next token.
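To make the first step concrete for the original question, here is a minimal sketch, not a definitive recipe. The helper names next_token_probs and demo are made up for illustration. The demo also makes several assumptions that are not in the posts above: it uses the "t5-small" checkpoint, it rewrites the sentence with a sentinel token (<extra_id_0>) in place of the word being scored, and it only scores the first sub-word piece of each candidate.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration


def next_token_probs(model, input_ids, decoder_input_ids):
    """Return a [batch, vocab_size] tensor of next-token probabilities."""
    model.eval()
    with torch.no_grad():
        logits = model(input_ids=input_ids,
                       decoder_input_ids=decoder_input_ids).logits
    # logits has shape [batch, decoder_len, vocab_size]; keep the last position
    # and turn the raw scores into probabilities with a softmax.
    return torch.softmax(logits[:, -1, :], dim=-1)


def demo():
    # Assumption: "t5-small" and the sentinel-style prompt are illustrative choices.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    text = "An apple a <extra_id_0> keeps the doctor away."
    input_ids = tokenizer(text, return_tensors="pt").input_ids

    # The decoder input is the start token (0) followed by the sentinel
    # whose content we want the model to fill in.
    sentinel_id = tokenizer.convert_tokens_to_ids("<extra_id_0>")
    decoder_input_ids = torch.tensor([[0, sentinel_id]])

    probs = next_token_probs(model, input_ids, decoder_input_ids)
    for word in ["day", "week"]:
        # Only the first sub-word piece is scored; a word that splits into
        # several pieces would need one extra decoder step per piece.
        first_piece = tokenizer(word, add_special_tokens=False).input_ids[0]
        print(word, probs[0, first_piece].item())
```

Calling demo() downloads the checkpoint and prints one probability per candidate word. next_token_probs itself works with any T5ForConditionalGeneration and any decoder prefix.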
Then append it to start_token, so your decoder input becomes something like [0, 7827]. Here, 0 is the start token and 7827 is the next token (a made-up number, just to give an example).
Then, to predict the following token, call model(input_ids=your_tokenized_input_text, decoder_input_ids=start_token+next_token), where start_token+next_token means the two concatenated, e.g. [0, 7827].
This gives a tensor of shape [1, 2, 50127]. We only want the last position (the second token), so I take predicted_token[:, -1, :], apply argmax to get the highest-probability token, and append that token to decoder_input_ids. Repeat this again and again; if the predicted token is 1, which is the stop (end-of-sequence) token, break the loop and finish. That is essentially how model.generate works with greedy decoding.
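The loop described above can be sketched roughly as follows. This is a simplified illustration of the idea, not the actual implementation of model.generate, which also handles caching, batching, and other decoding strategies. The function name greedy_decode and the max_new_tokens cap are made up for this example. It assumes a batch size of 1, start token 0, and stop token 1, as in the description.

```python
import torch
from transformers import T5ForConditionalGeneration


def greedy_decode(model, input_ids, start_token_id=0, eos_token_id=1,
                  max_new_tokens=20):
    """Greedy decoding, one token per forward pass (batch size 1 assumed)."""
    model.eval()
    # The decoder input starts with just the start token: shape [1, 1].
    decoder_input_ids = torch.tensor([[start_token_id]])
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids=input_ids,
                           decoder_input_ids=decoder_input_ids).logits
            # Keep only the last position and pick the highest-scoring token.
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            # Append the new token to the decoder input for the next step.
            decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
            # Stop once the model predicts the end-of-sequence token.
            if next_token.item() == eos_token_id:
                break
    return decoder_input_ids
```

With a tokenizer at hand, the returned ids could be turned back into text with something like tokenizer.decode(ids[0], skip_special_tokens=True). Re-running the whole decoder at every step is simple but slow, so this version is best read as an explanation rather than something to use in practice.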
This answer really helped me. Thank you!