Why does model.generate appear to encode multiple times?

Hi community, I have a question about the implementation of model.generate. From my understanding, an encoder-decoder model first encodes the input context and then autoregressively decodes the output, which means it should encode only once and decode multiple times. This blog post also supports my point of view. However, the detailed implementation of the high-level model.generate API looks strange: at this line, input_ids is appended with the newly decoded output, which is then sent to the entire encoder-decoder model. Is this on purpose? Thanks!

I think I got it. For an encoder-decoder model, the algorithm performs the encoding first and saves the result to model_kwargs; meanwhile, input_ids only ever holds the decoder's output, so the cached encoder outputs are reused at every decoding step rather than recomputed.
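To make the mechanism concrete, here is a minimal toy sketch of that pattern (not the actual Transformers code; encode, decode_step, and generate are hypothetical stand-ins): the encoder runs exactly once, its output is cached in model_kwargs, and input_ids grows with the decoder-side tokens only.

```python
def encode(source_tokens):
    # Stand-in encoder: produce one "hidden state" per source token.
    return [t * 2 for t in source_tokens]

def decode_step(encoder_outputs, decoder_input_ids):
    # Stand-in decoder: the next token depends on the cached encoder
    # outputs and everything decoded so far.
    return (sum(encoder_outputs) + len(decoder_input_ids)) % 10

def generate(source_tokens, max_new_tokens=3, bos_token=0):
    model_kwargs = {}
    # The encoder runs exactly once; its result is cached in model_kwargs.
    model_kwargs["encoder_outputs"] = encode(source_tokens)
    # input_ids holds only the decoder-side sequence, starting from BOS.
    input_ids = [bos_token]
    for _ in range(max_new_tokens):
        next_token = decode_step(model_kwargs["encoder_outputs"], input_ids)
        input_ids.append(next_token)  # grow the decoder sequence
    return input_ids

print(generate([1, 2, 3]))  # → [0, 3, 4, 5]
```

So appending to input_ids does not re-trigger encoding; it only extends the decoder's input, while the cached encoder_outputs in model_kwargs are passed along unchanged at each step.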