Why do Pipelines allow more than 512 tokens?


I noticed that some pipelines accept inputs that exceed the 512-token limit.

For instance, trying to use a string with more than 35 000 tokens:

import requests

url = 'https://de.wikipedia.org/wiki/Gesch%C3%A4ftsbericht'
r = requests.get(url)
doc = r.text # more than 35 000 tokens!
question = "Wie teuer ist ein Geschäftsbericht?"

This works perfectly fine with the pipeline, apparently without any truncation (the answer is at the end of the string, so it would otherwise have been cut off):

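from transformers import pipeline

# Same checkpoint as in the direct-loading example further down.
qa_pipeline = pipeline(
    "question-answering",
    model="deepset/gelectra-base-germanquad",
)

qa_pipeline({
    "context": doc,
    "question": question,
})

>>> [out]: 'Über 100 0000', score: 0.43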

This does not work if the model is loaded directly:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("deepset/gelectra-base-germanquad")

model = AutoModelForQuestionAnswering.from_pretrained("deepset/gelectra-base-germanquad")
encoding = tokenizer(question, doc, return_tensors="pt", max_length=10000)
outputs = model(**encoding)

>>> [out]: ...
'RuntimeError: The size of tensor a (35664) must match the size of tensor b (512) at non-singleton dimension 1'

I know that BERT is limited to 512 tokens and that longer inputs need to be truncated. But why does that not apply to the pipeline?

Hey @Oweys! Do you have an explanation for this? I was wondering the same thing: I fine-tuned a token classification model with a limit of 1024 tokens, BUT the pipeline object processes and detects tokens across the entire document!
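
For the question-answering case, one likely explanation is that the pipeline never feeds the whole document to the model at once. As far as I know, it splits the context into overlapping windows (controlled by its `max_seq_len` and `doc_stride` parameters, which I believe default to 384 and 128), runs the model on each window, and keeps the best-scoring answer. The snippet below is only a plain-Python sketch of that windowing idea, not the library's actual code: the helper name `split_into_windows` is made up for illustration, the integer list stands in for real token ids, and the question tokens that the real pipeline prepends to every window are ignored.

```python
def split_into_windows(token_ids, max_len=384, stride=128):
    """Split a long token sequence into overlapping windows.

    Hypothetical helper for illustration only. Consecutive windows
    share `stride` tokens, so an answer that straddles a window
    boundary still appears whole in at least one window.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


# Stand-in for the 35664 token ids from the error message above.
tokens = list(range(35_664))
windows = split_into_windows(tokens)

# Every window fits within the model limit, and together they cover the input.
print(len(windows), max(len(w) for w in windows))
```

Each window stays well below 512 tokens, which would explain why the pipeline does not hit the size mismatch. When calling the tokenizer directly, something similar should be achievable by passing `truncation="only_second"`, `max_length`, `stride`, and `return_overflowing_tokens=True`, then running the model on each resulting window and comparing the scores.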