I am trying to use the pre-trained model "openai/clip-vit-large-patch14" to generate text embeddings (reference code attached below) and am running into the following errors:
```
Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77).
The size of tensor a (84) must match the size of tensor b (77) at non-singleton dimension 1
```
From the error message I understand that the sequence length of my input text exceeds what the pre-trained model can handle (i.e., 77 tokens). I looked into a few threads that suggest truncating the data manually, but I want to check whether there is a parameter we can configure so the model (or processor) handles the truncation on its own.
```python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

inputs = processor(text=<some large text>, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
text_embeds = outputs["text_embeds"]
```
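For context, this is the kind of truncation I am hoping the processor can perform internally: cap the token sequence at the model's maximum of 77 while keeping the end-of-text marker at the end. A minimal sketch of that behavior (the `truncate_ids` helper and the EOS id are illustrative, not the actual CLIP tokenizer API):

```python
def truncate_ids(token_ids, max_length=77, eos_id=49407):
    """Cap a token id sequence at max_length, preserving the EOS token.

    Illustrative only: mimics what tokenizer-side truncation would do,
    not the real CLIP tokenizer implementation.
    """
    # If the sequence already fits, leave it unchanged.
    if len(token_ids) <= max_length:
        return token_ids
    # Otherwise keep the first max_length - 1 tokens and re-append EOS
    # so the sequence still ends with the end-of-text marker.
    return token_ids[:max_length - 1] + [eos_id]

ids = list(range(84))        # stand-in for an 84-token sequence
out = truncate_ids(ids)
print(len(out))              # 77
```

If the processor/tokenizer exposed a switch for this, the call site would stay unchanged apart from one extra argument.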