Prevent repeat tokens in GPT2LMHeadModel text generation with max_new_tokens=1

Hi! I’m currently exploring some of the transformers library’s capabilities and have a question about the model.generate() method.

I’m using an implementation along these lines:

    output_sequences = model.generate(
        input_ids=input_ids,
        top_k=40,
        top_p=0.9,
        max_new_tokens=1,
        do_sample=True,
        num_return_sequences=25,
        return_dict_in_generate=True,
        output_scores=True
    )

    predictions = [
        dict(
            w=tokenizer.decode(output_sequences.sequences[i][-1]),
            p=...,  # score calculation elided
        )
        for i in range(output_sequences.sequences.shape[0])
    ]
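
For completeness, here’s roughly what the full setup looks like end to end (using the stock gpt2 checkpoint; the p value here is just the softmax of the returned scores for the sampled token, standing in for my actual score calculation, which isn’t important for the question):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    input_prompt = "Hello there, how are"
    input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids

    output_sequences = model.generate(
        input_ids=input_ids,
        top_k=40,
        top_p=0.9,
        max_new_tokens=1,
        do_sample=True,
        num_return_sequences=25,
        return_dict_in_generate=True,
        output_scores=True,
    )

    # scores is a tuple with one entry per generated step; with max_new_tokens=1
    # there is a single entry of shape (num_return_sequences, vocab_size).
    probs = torch.softmax(output_sequences.scores[0], dim=-1)

    predictions = [
        dict(
            w=tokenizer.decode(output_sequences.sequences[i][-1]),
            p=probs[i, output_sequences.sequences[i][-1]].item(),
        )
        for i in range(output_sequences.sequences.shape[0])
    ]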

So, given some prompt, I get a response like this:

input_prompt = "Hello there, how are"
#...Tokenize inputs...
#...generate...
predictions = [
    { w: " you", p: some_score1 }
    { w: " things", p: some_score2 }
    { w: " you", p: some_score3 }
    #...etc
]

The only issue is that I get repeated tokens back, as shown above (" you" comes back twice). Is there any way to ensure the tokens I receive from the generation are unique across the returned sequences? I did not have this issue when I implemented this with beam search, but that method is too slow for my application.
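
For reference, the closest I’ve gotten to guaranteed-unique candidates is to skip model.generate() and take the top-k of the next-token distribution from a single forward pass, roughly as sketched below (just an illustration, not my actual code). It returns distinct tokens by construction, but it drops the top-p/sampling behaviour, so I’d still prefer a generate()-based solution if one exists:

    import torch

    # Minimal sketch (not my actual code): unique next-token candidates
    # from one forward pass instead of sampling with generate().
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[:, -1, :]  # logits for the next position
    probs = torch.softmax(next_token_logits, dim=-1).squeeze(0)  # (vocab_size,)

    top_probs, top_ids = torch.topk(probs, k=25)  # 25 distinct token ids

    predictions = [
        dict(w=tokenizer.decode(token_id), p=prob)
        for token_id, prob in zip(top_ids.tolist(), top_probs.tolist())
    ]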