Hello,
I had a quick question about properly generating text on multiple GPUs.
Right now I am generating text by splitting GPT-J-6B across a few nodes.
Usually I set a fixed seed for my jobs. When I do that, the generated output text from each job is identical, so it only makes sense to save one of them. I believe that is done in the last step of this DeepSpeed example: DeepSpeed for GPT-Neo-2.7B
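To make the fixed-seed case concrete, here is a toy sketch. It uses Python's standard `random` module as a stand-in for the model's sampler, and the names `sample_tokens` and `generate_with_shared_seed` are hypothetical, not part of DeepSpeed or transformers:

```python
import random

# Toy vocabulary; a stand-in for a real tokenizer's vocabulary.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]


def sample_tokens(seed, n_tokens=5):
    """Hypothetical stand-in for sampling-based generation on one rank."""
    rng = random.Random(seed)  # independent RNG seeded for this job
    return [rng.choice(VOCAB) for _ in range(n_tokens)]


def generate_with_shared_seed(world_size, seed):
    # Every rank seeds its RNG identically, so every rank samples the same text.
    return [sample_tokens(seed) for _rank in range(world_size)]


outputs = generate_with_shared_seed(world_size=4, seed=42)

# All ranks agree, so saving only rank 0's output loses nothing.
assert all(out == outputs[0] for out in outputs)
saved = outputs[0]
print(" ".join(saved))
```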
But when I don’t set an explicit seed, each job gets a different RNG state. Thanks to this, I instead get as many unique text outputs as there are GPUs.
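For contrast, a toy sketch of the unseeded case. It again uses Python's `random` module as a stand-in for the sampler, and it assumes a hypothetical per-rank seed of `base_seed + rank` so the run is reproducible; in the real setup each process's RNG state would come from the framework's default seeding, not from this formula. It models each job as fully independent and says nothing about what splitting the model or `synced_gpus=True` might require:

```python
import random

# Toy vocabulary; a stand-in for a real tokenizer's vocabulary.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]


def sample_tokens(seed, n_tokens=5):
    """Hypothetical stand-in for sampling-based generation on one rank."""
    rng = random.Random(seed)
    return [rng.choice(VOCAB) for _ in range(n_tokens)]


def generate_with_per_rank_seeds(world_size, base_seed):
    # Hypothetical per-rank seeding: offsetting the seed by the rank gives each
    # rank its own RNG stream, so the sampled texts will generally differ.
    return {rank: sample_tokens(base_seed + rank) for rank in range(world_size)}


outputs = generate_with_per_rank_seeds(world_size=4, base_seed=1000)

# Keep every rank's output instead of only rank 0's.
for rank, tokens in outputs.items():
    print(f"rank {rank}: {' '.join(tokens)}")
```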
Is not setting a shared seed a valid way to “cheat” and increase text generation throughput? Or should the seed be set for another reason (e.g. because something in synced_gpus=True or DeepSpeed needs it)?
Apologies if this has been asked before; I could not find it by searching keywords.
Thank you!