Setting seed within model.generate()

Hi! I would like to use a text-generation pipeline (which calls model.generate() under the hood) to generate n_generations responses per prompt, as shown here:

def generate_responses(batch, n_generations):
    # pipeline_instance is a module-level "text-generation" pipeline;
    # it is only read here, so no `global` statement is needed

    # Apply the pipeline with n_generations sampled outputs per input
    batch_outputs = pipeline_instance(
        batch["prompt"],
        num_return_sequences=n_generations,
        do_sample=True,  # several distinct sequences need sampling (or beam search)
        return_full_text=False,
        batch_size=len(batch["prompt"]),
        pad_token_id=pipeline_instance.tokenizer.eos_token_id,
        max_new_tokens=256,
    )

    # Structure the output to match the dataset format:
    # one list of generated strings per prompt
    batch["generated_text"] = [
        [output["generated_text"] for output in outputs] for outputs in batch_outputs
    ]
    return batch

A constraint I would like to apply to my generation is that the seed for the N-th generated response is set to N-1 (e.g., for the fifth generated response, the seed would be 4). I would like to encourage diversity this way, as inspired by this paper.

The problem is that I couldn’t find a way to set a different seed for each returned sequence through model.generate(). Any tips on how to do this properly?
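One possible workaround, sketched here under assumptions rather than as an official API: instead of asking for all n_generations sequences in a single call, loop over the generation index, call `transformers.set_seed(k)` (which seeds Python's `random`, NumPy and PyTorch) before each call, and request one sequence per call. The function name `generate_responses_seeded` and the optional `seed_fn` parameter are hypothetical, added only so the seeding step can be swapped out or tested.

```python
from typing import Any, Callable, Dict, List, Optional


def generate_responses_seeded(
    batch: Dict[str, list],
    n_generations: int,
    pipe: Any,
    seed_fn: Optional[Callable[[int], None]] = None,
) -> Dict[str, list]:
    """Hypothetical helper: generate n_generations responses per prompt,
    re-seeding so that the N-th response (1-based) uses seed N - 1."""
    if seed_fn is None:
        # transformers.set_seed seeds random, NumPy and torch (CPU and CUDA)
        from transformers import set_seed as seed_fn

    prompts = batch["prompt"]
    per_prompt: List[List[str]] = [[] for _ in prompts]

    for k in range(n_generations):
        seed_fn(k)  # response number k + 1 is generated with seed k
        outputs = pipe(
            prompts,
            num_return_sequences=1,  # one sequence per call, so each gets its own seed
            do_sample=True,
            return_full_text=False,
            batch_size=len(prompts),
            pad_token_id=pipe.tokenizer.eos_token_id,
            max_new_tokens=256,
        )
        # For a list of prompts, the pipeline returns one list of dicts per prompt
        for i, out in enumerate(outputs):
            per_prompt[i].append(out[0]["generated_text"])

    batch["generated_text"] = per_prompt
    return batch
```

Note that this makes n_generations pipeline calls instead of one, so it is slower. Also, a single seed drives the whole batch, so a given response is only reproducible with the same batch composition, batch size and hardware; whether that matches the paper's exact setup is an assumption.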

In case this isn’t possible, I may resort to simpler ways of encouraging variation, such as temperature or sampling settings.
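For the fallback, a minimal sketch of passing sampling settings through the pipeline in a single call might look like the following. The default values `temperature=1.0` and `top_p=0.95` are illustrative assumptions, not tuned recommendations, and `generate_responses_sampled` is a hypothetical name.

```python
from typing import Any, Dict


def generate_responses_sampled(
    batch: Dict[str, list],
    n_generations: int,
    pipe: Any,
    temperature: float = 1.0,
    top_p: float = 0.95,
) -> Dict[str, list]:
    """Hypothetical helper: draw n_generations samples per prompt in one call,
    relying on sampling settings (not per-sequence seeds) for diversity."""
    prompts = batch["prompt"]
    batch_outputs = pipe(
        prompts,
        do_sample=True,           # sample instead of greedy decoding
        temperature=temperature,  # > 1.0 flattens the distribution, < 1.0 sharpens it
        top_p=top_p,              # nucleus sampling: smallest token set with mass >= top_p
        num_return_sequences=n_generations,
        return_full_text=False,
        batch_size=len(prompts),
        pad_token_id=pipe.tokenizer.eos_token_id,
        max_new_tokens=256,
    )
    batch["generated_text"] = [
        [out["generated_text"] for out in outputs] for outputs in batch_outputs
    ]
    return batch
```

Raising the temperature or top_p generally increases variety at some cost in coherence, so these values would need tuning per model.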
