I am using GPT-2 as the text generator for a video captioning model, so instead of feeding GPT-2 token ids, I pass the video embeddings directly via the inputs_embeds parameter.
Now, during inference, I'm trying to use GPT-2's .generate() function to get the predicted sentences as output, but it seems to accept only token ids as input. Is there a way to pass it the embeddings directly?
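For context, here is a minimal sketch of what I'm doing at train time. It uses a tiny randomly initialized GPT-2 config so it runs standalone (in my actual model I load the pretrained checkpoint), and the random tensor stands in for my video encoder's output:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random config so this sketch runs without any downloads;
# in the real model I use the pretrained GPT-2 weights.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config)
model.eval()

# Placeholder for my video encoder output: (batch, num_frames, hidden_size).
video_embeds = torch.randn(1, 5, config.n_embd)

# Forward pass works fine with embeddings instead of token ids.
with torch.no_grad():
    out = model(inputs_embeds=video_embeds)

print(out.logits.shape)  # one logit row per input embedding
```

This forward call accepts inputs_embeds without complaint; it's only .generate() where I can't find the equivalent.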