Reusing cached context to generate multiple sequences?

I wonder if it’s possible to make generate() reuse cached context to produce multiple sequences without recomputing the context for every call. This matters especially for large models that can only run with a batch size of 1, where I can’t use num_return_sequences, since that effectively increases the batch size.

The current hack is to recompute the context from scratch every time, which seems very wasteful.
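
For anyone landing here later: recent versions of Hugging Face transformers let you pass a prefilled cache (e.g. a DynamicCache) into generate() directly. A minimal sketch of that pattern, not the author’s own implementation; the model name, prompt, and generation settings are placeholders:

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_name = "gpt2"  # placeholder; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

# Prefill the KV cache with all prompt tokens except the last one, so
# generate() still has one uncached token to process on its first step.
with torch.no_grad():
    prompt_cache = model(
        input_ids=inputs.input_ids[:, :-1],
        past_key_values=DynamicCache(),
    ).past_key_values

# Each call starts from a fresh copy of the prefilled cache, so the
# prompt forward pass is never repeated.
for _ in range(3):
    out = model.generate(
        **inputs,
        past_key_values=copy.deepcopy(prompt_cache),
        do_sample=True,  # sampling, so the sequences actually differ
        max_new_tokens=20,
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The deepcopy matters: generate() appends new keys/values to whatever cache it is given, so each sampled sequence needs its own copy of the prompt cache.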

Never mind. I implemented it myself.