Hi everyone. This is a snippet of my code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_name = 'facebook/opt-2.7b'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
text = "Define the color of apple: "
for i in range(100):
    encoded_input = tokenizer(text, return_tensors='pt').to(device)
    output = model(**encoded_input)
    v_attns, max_idx = output.logits.max(dim=2)  # returns a tuple of (max_vals, indices), one per position
    next_token = max_idx.cpu().numpy()[0]
    text += tokenizer.decode(next_token)[-1]
print(text)
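One detail worth noting about the indexing: `max(dim=2)` returns the argmax for *every* position in the sequence, not just the last one, so `max_idx[0]` holds one predicted token per input position. A minimal sketch with toy tensors (the shapes are assumptions mirroring a causal LM's logits of shape `(batch, seq_len, vocab)`):

```python
import torch

# Toy logits: batch=1, seq_len=4, vocab=10 (assumed shapes, not a real model)
torch.manual_seed(0)
logits = torch.randn(1, 4, 10)

# max over dim=2 yields the greedy prediction at EVERY position
vals, idx = logits.max(dim=2)
print(idx.shape)  # torch.Size([1, 4]): one index per input position

# For next-token prediction, only the last position's argmax is needed
next_token = logits[0, -1].argmax().item()
print(next_token == idx[0, -1].item())  # True
```

So decoding `max_idx[0]` in one go and taking the last character is only an approximation; indexing the last position directly avoids issues with multi-character tokens.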
I am basically doing a greedy forward pass for next-token prediction. Is it possible to predict more than one token in a single forward pass? Also, is my approach above equivalent to using the generate method with greedy decoding? I have noticed that I don't get similar results for some models.
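For reference, `generate` with `do_sample=False` (and the default `num_beams=1`) performs exactly this greedy argmax loop, provided the argmax is taken over the last position's logits. A small self-contained check with a randomly initialized toy model (the config values here are arbitrary assumptions, chosen only so the script runs quickly without downloading weights):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
# Tiny random model; eos_token_id=None so generation never stops early
config = GPT2Config(n_layer=2, n_head=2, n_embd=32, vocab_size=100, eos_token_id=None)
model = GPT2LMHeadModel(config).eval()

input_ids = torch.tensor([[1, 2, 3]])

# Manual greedy loop: argmax of the LAST position's logits at each step
ids = input_ids.clone()
with torch.no_grad():
    for _ in range(5):
        logits = model(ids).logits
        next_id = logits[0, -1].argmax().item()
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)

# generate with do_sample=False is greedy decoding; use_cache=False makes
# each step a full forward pass, numerically matching the loop above
gen = model.generate(input_ids, max_new_tokens=5, do_sample=False,
                     use_cache=False, pad_token_id=0)
print(torch.equal(ids, gen))  # True
```

If the two disagree for a real model, the usual culprit is comparing the last *character* of a decoded string rather than the last position's token id.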
Thank you again!