Converting CLIPModel to VisionTextDualEncoderModel

Hi HF team, and thanks for your amazing work! For my research I would like to use the CLIPModel “openai/clip-vit-base-patch32” through the VisionTextDualEncoderModel class. I’ve tried the from_pretrained() method, but it doesn’t support the clip_text_model class of the text tower.
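For reference, here is roughly what I attempted, using the from_vision_text_pretrained() method documented for this class. The comments reflect my understanding of where it breaks, which may well be wrong:

```python
from transformers import VisionTextDualEncoderModel

# Attempt: reuse the same CLIP checkpoint for both towers.
# My understanding is that the vision tower is handled (CLIPVisionModel
# is supported explicitly), but the text side does not accept the
# clip_text_model config of the CLIP text tower, so this fails.
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "openai/clip-vit-base-patch32",
    "openai/clip-vit-base-patch32",
)
```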

Do you have any suggestions on where to start with writing such a conversion script?
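My rough idea so far is to instantiate the two towers separately and copy CLIP’s trained projection heads over. Please treat this as a sketch, not a working solution: the constructor and attribute names below are taken from the docs, but whether the projections and pooled outputs actually line up is my assumption.

```python
import torch
from transformers import (
    CLIPModel,
    CLIPTextModel,
    CLIPVisionModel,
    VisionTextDualEncoderModel,
)

name = "openai/clip-vit-base-patch32"

# Load the two towers directly from the CLIP checkpoint.
vision_model = CLIPVisionModel.from_pretrained(name)
text_model = CLIPTextModel.from_pretrained(name)

# Build the dual encoder around the pre-instantiated towers.
dual_encoder = VisionTextDualEncoderModel(
    vision_model=vision_model,
    text_model=text_model,
)

# VisionTextDualEncoderModel initializes fresh projection layers,
# so copy CLIP's trained projections and temperature over.
# Assumption on my part: the shapes match (projection_dim=512 for
# this checkpoint) and the pooled outputs are comparable.
with torch.no_grad():
    clip = CLIPModel.from_pretrained(name)
    dual_encoder.visual_projection.load_state_dict(clip.visual_projection.state_dict())
    dual_encoder.text_projection.load_state_dict(clip.text_projection.state_dict())
    dual_encoder.logit_scale.copy_(clip.logit_scale)
```

I haven’t verified that the forward pass works end to end with a CLIPTextModel as the text tower, which is part of why I’m asking.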

cc @valhalla, who might know about this since he worked on adding both CLIP and VisionTextDualEncoder to the Transformers library.