How does the pipeline deal with too long sequences?

I’ve been playing around with Text2Text Generation for generative Q&A, using FLAN-T5. For example:

from transformers import pipeline

generative_qa_t5 = pipeline(task="text2text-generation", model="google/flan-t5-base")

full_article = ... # Full "Attention is all you need" paper (around 10k tokens)
question = "What's a transformer?"
input_text = f"{full_article} Given this context, please answer the following question. {question}"

print(generative_qa_t5(input_text))
# Outputs: "a model architecture relying entirely on self-attention to compute representations of its input and"

If I pass inputs that are longer than the 512-token maximum sequence length, I get a warning. Surprisingly, the generated answers are still quite good!
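
To see where the warning comes from, I compared the tokenized length of my prompt with the model's reported maximum (a small sketch; input_text is the prompt built above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Tokenizing without truncation keeps every token; exceeding model_max_length (512 here)
# is what triggers the length warning
token_ids = tokenizer(input_text, truncation=False)["input_ids"]
print(len(token_ids), tokenizer.model_max_length)  # roughly 10k vs. 512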

What happens under the hood within the pipeline to handle this overly long sequence?

I tried splitting the full text into several chunks of at most 512 tokens and using them as separate contexts. The resulting answers were much worse than what was generated above. Is there similar, maybe more sophisticated, windowing logic implemented in the pipeline?
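
This is approximately what I did (a simplified sketch using the tokenizer and pipeline from above; chunked_answers is just an illustrative helper, and the 512-token window does not account for the extra question/prompt tokens):

def chunked_answers(article, question, window=512):
    # Naive windowing: cut the article into fixed-size blocks of token ids,
    # decode each block back to text, and ask the question against every block separately
    ids = tokenizer(article, truncation=False)["input_ids"]
    answers = []
    for start in range(0, len(ids), window):
        chunk = tokenizer.decode(ids[start:start + window], skip_special_tokens=True)
        prompt = f"{chunk} Given this context, please answer the following question. {question}"
        answers.append(generative_qa_t5(prompt)[0]["generated_text"])
    return answers

print(chunked_answers(full_article, question))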


It seems that _parse_and_tokenize() is the only pre-processing function in the pipeline. By default, it passes the input to the tokenizer without truncating it.

Do I see it correctly that _parse_and_tokenize() does not implement any special handling for overly long sequences? It just passes the full sequence, without truncation by default, to the tokenizer. I suppose the tokenizer doesn’t care about the sequence length, but how can the model handle such an overly long tokenized sequence?

I still don’t understand how or why passing a sequence that is too long works without crashing. I get a warning, but still a sensible answer.
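
As far as I can tell, what the pipeline ends up doing is roughly the following (my own sketch, not the actual pipeline code): the untruncated ids go straight into generate(), and the call still produces an answer despite the length warning.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Tokenize the full ~10k-token prompt without truncation, then generate from the raw ids
inputs = tokenizer(input_text, truncation=False, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))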


Looking at the code in the pipeline, it does seem that way… However, the pipeline wires together everything that is needed, so it is quite difficult to follow the code accurately, and there may be something I have missed.
It seems like truncation or some similar step has to happen somewhere, so perhaps the default settings of that model’s tokenizer are already handling it sensibly…
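
For example, a quick check like this (just a sketch with a dummy input) should show whether the tokenizer truncates on its own:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")

sample = "word " * 2000  # clearly more than 512 tokens once tokenized
ids = tok(sample)["input_ids"]

# If the tokenizer truncated by default, len(ids) would be capped at tok.model_max_length (512);
# if not, the full untruncated length comes back and truncation must happen elsewhere (or not at all)
print(len(ids), tok.model_max_length)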