Setting seed within model.generate()

timdadum · November 11, 2024, 2:19pm

Hi! I would like to use the model.generate() method through a pipeline to generate n_generations responses per prompt, as shown here:

def generate_responses(batch, n_generations):
    global pipeline_instance
    
    # Apply the pipeline with n_generations outputs per input.
    # do_sample=True: multiple distinct sequences need sampling (or beam search).
    batch_outputs = pipeline_instance(
        batch["prompt"], num_return_sequences=n_generations, do_sample=True,
        return_full_text=False, batch_size=len(batch["prompt"]),
        pad_token_id=pipeline_instance.tokenizer.eos_token_id, max_new_tokens=256
    )
    
    # Structure the output to match dataset format: list of dicts with "generated_text" field
    batch["generated_text"] = [
        [output["generated_text"] for output in outputs] for outputs in batch_outputs
    ]
    return batch

A constraint I would like to apply to my generation is that, for the N-th generation, the seed is set to N-1 (e.g., for the fifth generated response, the seed would equal 4). I would like to encourage diversity this way, as inspired by this paper.

The problem is that I couldn’t find a way to do so through model.generate(). Any tips on how to do this properly?
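For reference, here is a minimal sketch of one possible workaround, not a built-in generate() option: call the generator once per response and reseed before each call, so the N-th response uses seed N-1. The helper name generate_with_seeds and the stand-in toy_generate are hypothetical; with Transformers, transformers.set_seed could be passed as seed_fn, as outlined in the trailing comment (that wiring is an assumption and is not run here).

```python
import random
from typing import Callable, List


def generate_with_seeds(
    generate_once: Callable[[str], str],
    prompt: str,
    n_generations: int,
    seed_fn: Callable[[int], None] = random.seed,
) -> List[str]:
    """Call generate_once n_generations times, reseeding before each call."""
    outputs = []
    for n in range(n_generations):
        seed_fn(n)  # the (n+1)-th response is produced with seed n
        outputs.append(generate_once(prompt))
    return outputs


def toy_generate(prompt: str) -> str:
    # Hypothetical stand-in for a sampler; draws from Python's global RNG.
    return f"{prompt}-{random.randint(0, 10**9)}"


# Possible wiring with a text-generation pipeline (assumption, not run here):
#   from transformers import set_seed
#   generate_with_seeds(
#       lambda p: pipeline_instance(
#           p, do_sample=True, num_return_sequences=1,
#           return_full_text=False, max_new_tokens=256,
#       )[0]["generated_text"],
#       prompt, n_generations, seed_fn=set_seed,
#   )
```

The trade-off is that each response needs its own forward pass, so the batching benefit of num_return_sequences is lost.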

In case this isn’t possible, I may resort to simpler ways of encouraging variation, such as temperature or sampling settings.
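To illustrate why temperature affects variation, here is a small pure-Python sketch of temperature-scaled softmax over made-up logits. The function name and the example logits are hypothetical; in Transformers the corresponding knobs are the do_sample, temperature, top_k and top_p generation arguments.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: peaked distribution
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flatter distribution
```

A higher temperature moves probability mass toward less likely tokens, which is what makes repeated samples differ more.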
