Using inputs_embeds as input for GPT2 generation_utils

Hi everyone 🙂

I am currently building a system called ClipCap (CLIP Captions) at LAION. It takes a CLIP embedding as input and performs the reverse-DALL-E task of captioning the image. The system builds inputs_embeds from the CLIP embedding, which makes inference a bit of a pain to handle.

Is it possible to pass inputs_embeds to the generate() method? If not, could someone point me to any resources that would help me recreate these methods from scratch so that they accept inputs_embeds as the input?

Many thanks in advance,

Hi, have you found a solution for this? I have the same problem here. Thanks!
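
One possible workaround, in case it helps: GPT2LMHeadModel's forward pass accepts inputs_embeds directly, so a small greedy decoding loop can be written by hand instead of going through generate(). The following is only a minimal sketch under some assumptions, not ClipCap's actual inference code: the function name greedy_decode_from_embeds is made up for illustration, the tiny randomly initialised GPT-2 and the random prefix tensor are stand-ins for a trained model and a projected CLIP embedding, and the loop recomputes the full sequence at every step (no key/value cache) to keep it short. More recent transformers releases may also accept inputs_embeds in generate() for some decoder-only models, so it is worth checking the documentation for your installed version.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel


@torch.no_grad()
def greedy_decode_from_embeds(model, prefix_embeds, max_new_tokens=20, eos_token_id=None):
    """Greedy decoding that starts from embeddings instead of token ids.

    prefix_embeds: float tensor of shape (batch, prefix_len, n_embd).
    Returns a LongTensor of shape (batch, n_generated) holding the new token ids.
    (Hypothetical helper, not part of the transformers API.)
    """
    model.eval()
    wte = model.get_input_embeddings()  # token embedding table of the model
    embeds = prefix_embeds
    generated = []
    for _ in range(max_new_tokens):
        # Run the whole sequence of embeddings through the model (no KV cache).
        logits = model(inputs_embeds=embeds).logits
        # Pick the most likely next token from the last position.
        next_token = logits[:, -1, :].argmax(dim=-1)
        generated.append(next_token)
        if eos_token_id is not None and bool((next_token == eos_token_id).all()):
            break
        # Embed the chosen token and append it to the running sequence.
        next_embed = wte(next_token).unsqueeze(1)
        embeds = torch.cat([embeds, next_embed], dim=1)
    return torch.stack(generated, dim=1)


# Assumption: a tiny, randomly initialised GPT-2 stands in for a trained model,
# so nothing is downloaded and the generated ids are meaningless.
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32, n_layer=2, n_head=2,
                    bos_token_id=0, eos_token_id=0)
model = GPT2LMHeadModel(config)

# Assumption: a random tensor stands in for the prefix built from a CLIP embedding.
prefix = torch.randn(1, 10, config.n_embd)
tokens = greedy_decode_from_embeds(model, prefix, max_new_tokens=5)
print(tokens.shape)
```

With a real checkpoint, the returned ids would be passed to the tokenizer's decode() to get the caption text. Sampling or beam search would need their own loops; greedy search is just the simplest case to reproduce by hand.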