Handle long generation in text generation pipeline

jerrymeng100 · June 16, 2023, 9:32pm

Looking at the documentation on this, I'm not exactly sure how the 'hole' configuration works. Say I have a max context of 2048 tokens and my input is 2000 tokens. After it generates the first 48 tokens of the response, if it still wants to keep going, does it start chopping off the first tokens of the prompt one by one as it generates each new token? Or how does it work? I want to use this as a catch-all for long prompts, long responses, or both, but I want to make sure it isn't chopping off prompt tokens before it needs to.
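
To make the question concrete, here is a minimal sketch of the alternative to token-by-token chopping: trimming the prompt once, from the left, before generation starts, so that the prompt plus the requested number of new tokens fits in the context. My understanding is that this is roughly what the pipeline does for 'hole', but treat that as an assumption and check it against the transformers version you have installed. `hole_truncate` and its parameter names are hypothetical and not part of the library.

```python
def hole_truncate(prompt_ids, max_new_tokens, model_max_length):
    """Hypothetical helper: trim the prompt once, up front, from the left.

    Assumption: the budget for new tokens is known before generation begins,
    so no further prompt tokens are dropped while tokens are being generated.
    """
    keep = model_max_length - max_new_tokens
    if keep <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    if len(prompt_ids) + max_new_tokens <= model_max_length:
        # Everything fits, so the prompt is left untouched.
        return list(prompt_ids)
    # Keep only the most recent `keep` tokens of the prompt.
    return list(prompt_ids[-keep:])


prompt = list(range(2000))  # stand-in for 2000 token ids

# 2000 + 48 == 2048, so nothing is dropped.
fits = hole_truncate(prompt, max_new_tokens=48, model_max_length=2048)

# 2000 + 256 > 2048, so only the last 2048 - 256 = 1792 prompt tokens are kept.
trimmed = hole_truncate(prompt, max_new_tokens=256, model_max_length=2048)
```

Under this sketch, the 2000-token prompt with a 48-token budget loses nothing, and generation simply stops at the budget rather than eating into the prompt. A larger budget costs prompt tokens immediately, all at once.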
