Decoding a CLIP embedding

I can transform a text prompt into CLIP embeddings with:

prompt -> tokenizer -> tokens -> CLIPTextModel.from_pretrained -> embeddings

I would like to decode an embedding to a prompt:

embeddings -> ??? -> tokens -> tokenizer -> prompt

How do I convert CLIP embeddings into tokens?

I figured out that CLIPTextModel is a lossy encoder, so there is no direct way to decode an embedding back into tokens.
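
To make the "lossy" point concrete, here is a minimal sketch using a toy stand-in encoder, not CLIP itself. Everything in it is hypothetical and for illustration only: the vocabulary, the vectors in TOKEN_VECTORS, and the functions toy_encode and decode_by_search. The real CLIPTextModel is a transformer and does not mean-pool tokens, so the toy only shares the property that matters here: different prompts can map to the same embedding, which rules out an exact inverse. One possible workaround, assuming you can enumerate candidate prompts, is to encode each candidate and keep the one closest to the target embedding.

```python
import math

# Hypothetical toy vocabulary with made-up 3-d vectors (NOT real CLIP weights).
TOKEN_VECTORS = {
    "a": (0.1, 0.0, 0.2),
    "photo": (0.9, 0.1, 0.0),
    "of": (0.0, 0.1, 0.1),
    "cat": (0.2, 0.9, 0.1),
    "dog": (0.1, 0.2, 0.9),
}


def toy_encode(prompt):
    """Toy stand-in for a text encoder: mean-pool the token vectors.

    Mean pooling discards word order, so the mapping is many-to-one (lossy).
    """
    vectors = [TOKEN_VECTORS[token] for token in prompt.lower().split()]
    count = len(vectors)
    return tuple(sum(v[i] for v in vectors) / count for i in range(3))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decode_by_search(target, candidates):
    """Approximate decoding: return the candidate whose embedding is closest to target."""
    return max(candidates, key=lambda p: cosine(toy_encode(p), target))


# Two different prompts collapse to (numerically) the same embedding,
# so no function can recover the original prompt from the embedding alone.
first = toy_encode("a photo of a dog")
second = toy_encode("a dog of a photo")
same_embedding = all(math.isclose(x, y) for x, y in zip(first, second))

# Search over a hypothetical candidate list instead of inverting the encoder.
candidates = ["a photo of a cat", "a photo of a dog", "a cat", "a dog"]
best = decode_by_search(first, candidates)
```

With a real model, toy_encode would be replaced by the tokenizer plus CLIPTextModel forward pass. The search only recovers a prompt that is in the candidate list, so the result is an approximation, not a true decode.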
