Hi! I’m currently exploring some of the transformers library’s capabilities and had a question about the model.generate() method.
I’m using an implementation like this:
output_sequences = model.generate(
    input_ids=input_ids,
    top_k=40,
    top_p=0.9,
    max_new_tokens=1,
    do_sample=True,
    num_return_sequences=25,
    return_dict_in_generate=True,
    output_scores=True,
)
predictions = [
    dict(
        w=tokenizer.decode(output_sequences.sequences[i][-1]),
        p=...score calculation...
    )
    for i in range(output_sequences.sequences.shape[0])
]
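For completeness, here’s a minimal sketch of the score calculation I’m using, assuming it’s reasonable to softmax the processed logits that output_scores=True puts in output_sequences.scores (a tuple with one tensor per generated step, so with max_new_tokens=1 there’s exactly one, of shape (num_return_sequences, vocab_size)):

import torch

# Processed logits for the single generated step; these already
# reflect the top_k/top_p filtering applied during sampling.
step_logits = output_sequences.scores[0]
probs = torch.softmax(step_logits, dim=-1)

predictions = [
    dict(
        w=tokenizer.decode(output_sequences.sequences[i][-1]),
        # probability the sampler assigned to the token it actually drew
        p=probs[i, output_sequences.sequences[i][-1]].item(),
    )
    for i in range(output_sequences.sequences.shape[0])
]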
So that, given a prompt, I get a response like this:
input_prompt = "Hello there, how are"
#...Tokenize inputs...
#...generate...
predictions = [
{ w: " you", p: some_score1 }
{ w: " things", p: some_score2 }
{ w: " you", p: some_score3 }
#...etc
]
The only issue is that I get repeated tokens back, as shown above. Is there any way to ensure the generation returns unique tokens? I didn’t have this issue when I implemented this with beam search, but that approach is too slow for my application. For reference, I’ve sketched an alternative I’ve been considering below.
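The sketch skips generate() entirely and takes the top-k of the next-token distribution from a single forward pass, which yields unique tokens by construction. (The k=25 is just my placeholder, mirroring num_return_sequences=25 above.)

import torch

# One forward pass over the prompt; no sampling involved.
with torch.no_grad():
    logits = model(input_ids).logits  # (batch, seq_len, vocab_size)

# Distribution over the next token, taken from the last prompt position.
next_token_probs = torch.softmax(logits[0, -1, :], dim=-1)

# torch.topk returns distinct indices, so the candidate tokens are unique.
top_probs, top_ids = torch.topk(next_token_probs, k=25)

predictions = [
    {"w": tokenizer.decode(token_id), "p": prob.item()}
    for prob, token_id in zip(top_probs, top_ids)
]

I’d still prefer a generate()-native way to do this, if one exists.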