What does num_return_sequences > num_beams mean?

Hi!

I’m currently using the MarianMT models to generate translations from English to another language. My goal is to generate 900 translations from a single English sentence.

I was reading this other blog post (How to generate text: using different decoding methods for language generation with Transformers) and it mentions:

"In transformers, we simply set the parameter num_return_sequences to the number of highest scoring beams that should be returned. Make sure though that num_return_sequences <= num_beams!"

Currently, I’m able to generate 900 translations with a beam size of 3, and I was wondering why this is possible. If I use a beam size of 900, I run into memory issues. I’m curious to know what allows me to generate 900 sequences even though my beam size is much lower.
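
To make the quoted constraint concrete, here is a minimal toy sketch of plain beam search in pure Python. It is not how transformers implements generation. The three-token vocabulary, its fixed probabilities, and the function names `beam_search` and `top_sequences` are all made up for illustration. It only shows why, in textbook beam search, no more than num_beams hypotheses are available to return.

```python
import math

# Hypothetical stand-in for a model: a fixed next-token distribution
# that ignores the prefix. A real translation model conditions on it.
VOCAB = {"a": 0.5, "b": 0.3, "c": 0.2}


def beam_search(num_beams, num_steps):
    """Return the surviving (sequence, log-probability) pairs, best first."""
    beams = [((), 0.0)]  # start from a single empty hypothesis
    for _ in range(num_steps):
        # Extend every surviving hypothesis by every token.
        candidates = [
            (seq + (tok,), score + math.log(p))
            for seq, score in beams
            for tok, p in VOCAB.items()
        ]
        # Pruning step: only the num_beams best candidates survive.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


def top_sequences(num_beams, num_return_sequences, num_steps=4):
    """Return the best hypotheses, refusing to return more than exist."""
    beams = beam_search(num_beams, num_steps)
    if num_return_sequences > len(beams):
        raise ValueError("cannot return more sequences than surviving beams")
    return beams[:num_return_sequences]


if __name__ == "__main__":
    for seq, score in top_sequences(num_beams=3, num_return_sequences=3):
        print("".join(seq), round(score, 3))
```

In this toy setting, asking for 900 sequences with 3 beams raises an error, because only 3 hypotheses survive pruning. Since it does not reproduce what `generate` does internally, it does not by itself explain the behaviour described above.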

I appreciate the help!