How does the pipeline deal with too long sequences?

I’ve been playing around with Text2Text Generation for generative Q&A, using FLAN-T5. For example:

from transformers import pipeline

generative_qa_t5 = pipeline(task="text2text-generation", model="google/flan-t5-base")

full_article = ... # Full "Attention is all you need" paper (around 10k tokens)
question = "What's a transformer?"
input_text = f"{full_article} Given this context, please answer the following question. {question}"

print(generative_qa_t5(input_text))
# Outputs: "a model architecture relying entirely on self-attention to compute representations of its input and"

If I pass inputs that are longer than the 512-token maximum sequence length, I get a warning. Surprisingly, the generated answers are still quite good!
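
To see where the warning comes from, I compared the tokenized length of my prompt with the model's reported maximum (a small sketch; input_text is the prompt built above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Tokenizing without truncation keeps every token; exceeding model_max_length (512 here)
# is what triggers the length warning
token_ids = tokenizer(input_text, truncation=False)["input_ids"]
print(len(token_ids), tokenizer.model_max_length)  # roughly 10k vs. 512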

What happens under the hood within the pipeline to handle this overly long sequence?

I tried splitting the full text into several chunks of at most 512 tokens and using them as separate contexts. The resulting answers were much worse than what was generated above. Is there similar, maybe more sophisticated, windowing logic implemented in the pipeline?
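
This is approximately what I did (a simplified sketch using the tokenizer and pipeline from above; chunked_answers is just an illustrative helper, and the 512-token window does not account for the extra question/prompt tokens):

def chunked_answers(article, question, window=512):
    # Naive windowing: cut the article into fixed-size blocks of token ids,
    # decode each block back to text, and ask the question against every block separately
    ids = tokenizer(article, truncation=False)["input_ids"]
    answers = []
    for start in range(0, len(ids), window):
        chunk = tokenizer.decode(ids[start:start + window], skip_special_tokens=True)
        prompt = f"{chunk} Given this context, please answer the following question. {question}"
        answers.append(generative_qa_t5(prompt)[0]["generated_text"])
    return answers

print(chunked_answers(full_article, question))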


It seems that _parse_and_tokenize() is the only pre-processing function in the pipeline. By default, it passes the input to the tokenizer without truncating it.

Do I see it correctly that _parse_and_tokenize() does not implement any special handling for overly long sequences? It just passes the full sequence, without truncation by default, to the tokenizer. I suppose the tokenizer doesn’t care about the sequence length, but how can the model handle such an overly long tokenized sequence?

I still don’t understand how or why passing a sequence that is too long works without crashing. I get a warning, but still a sensible answer.
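
As far as I can tell, what the pipeline ends up doing is roughly the following (my own sketch, not the actual pipeline code): the untruncated ids go straight into generate(), and the call still produces an answer despite the length warning.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Tokenize the full ~10k-token prompt without truncation, then generate from the raw ids
inputs = tokenizer(input_text, truncation=False, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))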


Looking at the code in the pipeline, it does seem that way… However, the pipeline wires together everything that is needed, so it is quite difficult to follow the code accurately, and there may be something I have missed.
It seems like truncation or some similar step has to happen somewhere, so perhaps the default settings of that model’s tokenizer are already handling it sensibly…
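
For example, a quick check like this (just a sketch with a dummy input) should show whether the tokenizer truncates on its own:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")

sample = "word " * 2000  # clearly more than 512 tokens once tokenized
ids = tok(sample)["input_ids"]

# If the tokenizer truncated by default, len(ids) would be capped at tok.model_max_length (512);
# if not, the full untruncated length comes back and truncation must happen elsewhere (or not at all)
print(len(ids), tok.model_max_length)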