Reusing cached context to generate multiple sequences?

I wonder if it’s possible to make generate() reuse cached context to produce multiple sequences without recomputing the context for every call. This matters especially for large models that can only run with a batch size of 1, where I can’t use num_return_sequences, since that effectively increases the batch size.

The current hack is to recompute the context from scratch every time, which seems very wasteful.
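
For anyone landing here later: recent versions of Hugging Face transformers let you pass a prefilled cache (e.g. a DynamicCache) into generate() directly. A minimal sketch of that pattern, not the author’s own implementation; the model name, prompt, and generation settings are placeholders:

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_name = "gpt2"  # placeholder; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

# Prefill the KV cache with all prompt tokens except the last one, so
# generate() still has one uncached token to process on its first step.
with torch.no_grad():
    prompt_cache = model(
        input_ids=inputs.input_ids[:, :-1],
        past_key_values=DynamicCache(),
    ).past_key_values

# Each call starts from a fresh copy of the prefilled cache, so the
# prompt forward pass is never repeated.
for _ in range(3):
    out = model.generate(
        **inputs,
        past_key_values=copy.deepcopy(prompt_cache),
        do_sample=True,  # sampling, so the sequences actually differ
        max_new_tokens=20,
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The deepcopy matters: generate() appends new keys/values to whatever cache it is given, so each sampled sequence needs its own copy of the prompt cache.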

Never mind. I implemented it myself.