Token indices sequence length is longer than the specified maximum sequence length for this model

I am trying to use the pre-trained model "openai/clip-vit-large-patch14" to generate text embeddings (reference code below) and am running into the following error:

Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77).
The size of tensor a (84) must match the size of tensor b (77) at non-singleton dimension 1

From the error message, I understand that my input text is longer than the maximum sequence length the pre-trained model can handle (i.e., 77 tokens). I looked at a few threads that suggest truncating the data manually, but I want to check whether there is a parameter we can configure so that the truncation is handled automatically.


from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import requests

url = ""
image = Image.open(requests.get(url, stream=True).raw)

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

inputs = processor(text=<some large text>, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
text_embeds = outputs['text_embeds']


It looks like we can pass truncation=True to the processor. The processor forwards it to the tokenizer, so the text is cut down to the model's maximum length (77 tokens) before it reaches the model.

from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import requests

url = ""
image = Image.open(requests.get(url, stream=True).raw)
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

inputs = processor(text=<some large text>, images=image, return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs)
text_embeds = outputs['text_embeds']
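
One thing to keep in mind: truncation simply drops everything past the limit, so the resulting embedding only reflects the beginning of the text. The snippet below is a simplified, library-free sketch of that effect, not the tokenizer's actual implementation. The function name truncate_token_ids, the token ids, and the assumption that exactly two special tokens (start and end) count toward the 77-token limit are all illustrative choices on my part.

```python
from typing import List

# Assumed values for illustration only; the real tokenizer defines its own.
MAX_LENGTH = 77   # maximum sequence length reported in the error message
BOS_ID = 49406    # hypothetical start-of-text token id
EOS_ID = 49407    # hypothetical end-of-text token id


def truncate_token_ids(content_ids: List[int], max_length: int = MAX_LENGTH) -> List[int]:
    """Keep at most max_length ids in total, including the two special tokens."""
    budget = max_length - 2  # room left for content after the start and end tokens
    return [BOS_ID] + content_ids[:budget] + [EOS_ID]


# 82 content tokens + 2 special tokens = 84, the length from the error message.
long_content = list(range(1000, 1082))
untruncated = [BOS_ID] + long_content + [EOS_ID]
truncated = truncate_token_ids(long_content)

print(len(untruncated), len(truncated))  # prints: 84 77
```

In this sketch, 7 of the 82 content tokens are discarded. If the end of the text matters for your use case, shortening or splitting the text yourself before calling the processor is probably a better fit than relying on truncation alone.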