Handle long generation in text generation pipeline

Looking at the documentation on this, I'm not exactly sure how the `handle_long_generation="hole"` option works. Say my model has a max context of 2048 tokens and my input is 2000 tokens. After it generates the first 48 tokens of the response and still wants to keep going, does it start chopping off tokens from the start of the prompt one by one as each new token is generated, or does it work some other way? I want to use this as a catch-all for long prompts, long responses, or both, but I want to make sure it isn't doing something like chopping off prompt tokens before it actually needs to.
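
For reference, here's roughly how I'm planning to call it, in case that changes the answer. The model name and prompt length are just placeholders (my real model has a larger context window), and the comments reflect my current understanding rather than anything confirmed:

```python
from transformers import pipeline

# Placeholder model; my actual model just has a bigger context window.
generator = pipeline("text-generation", model="gpt2")

# Imagine this tokenizes to roughly 2000 tokens, close to the max context.
long_prompt = "some long document text " * 400

out = generator(
    long_prompt,
    max_new_tokens=100,
    # My understanding: "hole" truncates from the left of the prompt when
    # prompt tokens + max_new_tokens won't fit in the context window.
    handle_long_generation="hole",
)
print(out[0]["generated_text"])
```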