Increasing pretrained CLIP max possible text sequence length

Hey! :wave: I’m quite new to this platform and I’ve been using it to work on my MSc thesis. I’m wondering whether it is possible to use a pretrained CLIP model but change its maximum text sequence length. In particular, the pretrained model I’m using is CLIP-RSICD, and I want to increase the maximum sequence length it can receive as input, which is currently 77. Ideally I want to keep the current model weights and simply add randomly initialised weights for the new entries in the enlarged weight matrices. I’ve looked through the CLIP documentation and couldn’t find anything that allows keeping the pretrained weights while changing the configuration, but as I said, I’m a beginner. Any help would be greatly appreciated!

Did you ever figure out a solution to this? I’m running into the same issue with max_position_embeddings.

You can use a Long-CLIP model to solve this issue.
Its text encoder uses max_position_embeddings = 248, well above the 77 positions of the original CLIP.
The following code may help you.

from transformers import CLIPConfig, CLIPModel, CLIPProcessor

# Long-CLIP checkpoint whose text encoder accepts up to 248 tokens
model_id = "zer0int/LongCLIP-GmP-ViT-L-14"
config = CLIPConfig.from_pretrained(model_id)
config.text_config.max_position_embeddings = 248

model = CLIPModel.from_pretrained(model_id, config=config)
# The model config is only needed by CLIPModel; the processor loads from the model ID alone
processor = CLIPProcessor.from_pretrained(model_id)

This may help with your problem.
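
If you would rather keep your own checkpoint, as in the original question, one option is to enlarge the position-embedding table yourself: copy the pretrained rows and randomly initialise the new ones. Below is a minimal sketch of that idea, assuming PyTorch. The helper name extend_position_embedding and the std=0.02 initialisation are my own illustrative choices, not part of transformers.

```python
import torch
from torch import nn


def extend_position_embedding(old: nn.Embedding, new_len: int, std: float = 0.02) -> nn.Embedding:
    """Hypothetical helper: return a longer embedding table.

    Rows [0, old_len) are copied from `old`; the remaining rows are drawn
    from a normal distribution (std is an assumed value, tune as needed).
    """
    old_len, dim = old.weight.shape
    if new_len < old_len:
        raise ValueError("new_len must be >= the current number of positions")

    new = nn.Embedding(new_len, dim)
    with torch.no_grad():
        # Random init for every row, then overwrite the first rows with the pretrained ones
        new.weight.normal_(mean=0.0, std=std)
        new.weight[:old_len] = old.weight
    # Match the device and dtype of the original table
    return new.to(device=old.weight.device, dtype=old.weight.dtype)


# Toy example with CLIP-like sizes: 77 positions extended to 248
old_table = nn.Embedding(77, 512)
new_table = extend_position_embedding(old_table, 248)
```

To apply this to a Hugging Face CLIPModel you would, as far as I know, replace model.text_model.embeddings.position_embedding with the new table, rebuild the position_ids buffer to cover the new length, and set config.text_config.max_position_embeddings accordingly. Those attribute names are from the transformers versions I have looked at, so please check them against yours. The new rows are untrained, so the model will most likely need fine-tuning on longer captions before the extra positions become useful.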
