Obtain patch embeddings with CLIP

Hello,

I want to get the embedding for each patch of an image in the projected space (dim = 512, not 768). How can I achieve this? Should I just multiply the patch embeddings by the projection matrix, or is there a normalization step that I have to do as well?

Thank you
