Hey,
I noticed that some pipelines allow the 512-token limitation to be exceeded.
For instance, using a string with more than 35 000 tokens:
import requests

url = 'https://de.wikipedia.org/wiki/Gesch%C3%A4ftsbericht'
r = requests.get(url)
doc = r.text  # more than 35 000 tokens!
question = "Wie teuer ist ein Geschäftsbericht?"  # "How expensive is an annual report?"
This works perfectly fine with the pipeline, without any truncation (the answer sits at the end of the string, so truncation would have cut it off):
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="deepset/gelectra-base-germanquad",
    tokenizer="deepset/gelectra-base-germanquad",
)
qa_pipeline({
    "context": doc,
    "question": question,
})
>>> [out]: {'answer': 'Über 100 0000', 'score': 0.43}
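The pipeline also accepts windowing-related arguments, which I assume are connected to this behaviour (a sketch; max_seq_len and doc_stride are the parameter names from the QuestionAnsweringPipeline docs):
qa_pipeline(
    {"context": doc, "question": question},
    max_seq_len=384,  # tokens per window
    doc_stride=128,   # overlap between consecutive windows
)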
This does not work if the model is loaded directly:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("deepset/gelectra-base-germanquad")
model = AutoModelForQuestionAnswering.from_pretrained("deepset/gelectra-base-germanquad")

# Without truncation=True, the max_length argument has no effect here,
# so the full ~35 000-token sequence reaches the model.
encoding = tokenizer(question, doc, return_tensors="pt", max_length=10000)
with torch.no_grad():
    outputs = model(**encoding)
>>> [out]: ...
'RuntimeError: The size of tensor a (35664) must match the size of tensor b (512) at non-singleton dimension 1'
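For completeness, the direct call does run once I truncate explicitly (a minimal sketch; of course this cuts off the end of the document, where the answer is):
encoding = tokenizer(
    question,
    doc,
    return_tensors="pt",
    truncation="only_second",  # truncate only the context, never the question
    max_length=512,
)
with torch.no_grad():
    outputs = model(**encoding)  # no size mismatch, but the answer span is truncated away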
I know that BERT-style models are limited to 512 tokens and that longer inputs need to be truncated. But why does that limit not apply to the pipeline?
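My guess is that the pipeline internally splits the context into overlapping windows and runs the model once per window, roughly like the sketch below (stride and return_overflowing_tokens are tokenizer features I am assuming it uses; I have not checked the pipeline source):
encoding = tokenizer(
    question,
    doc,
    return_tensors="pt",
    truncation="only_second",        # only the context gets windowed
    max_length=512,
    stride=128,                      # overlap between consecutive windows
    return_overflowing_tokens=True,  # one batch row per window
    padding="max_length",
)
encoding.pop("overflow_to_sample_mapping")  # bookkeeping, not a model input
with torch.no_grad():
    outputs = model(**encoding)  # start/end logits for every window
Presumably the best-scoring span across all windows is then returned, which would explain why an answer at the very end of the document is still found. Is that what happens?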