Hi everyone. This is a snippet of my code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_name = 'facebook/opt-2.7b'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
text = "Define the color of apple: "
for i in range(100):
    encoded_input = tokenizer(text, return_tensors='pt').to(device)
    output = model(**encoded_input)
    v_attns, max_idx = output.logits.max(dim=2)  # returns a tuple of (max_vals, indices), one per position
    next_token = max_idx.cpu().numpy()[0]
    text += tokenizer.decode(next_token)[-1]
print(text)
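One detail worth noting about the indexing: `max(dim=2)` returns the argmax for *every* position in the sequence, not just the last one, so `max_idx[0]` holds one predicted token per input position. A minimal sketch with toy tensors (the shapes are assumptions mirroring a causal LM's logits of shape `(batch, seq_len, vocab)`):

```python
import torch

# Toy logits: batch=1, seq_len=4, vocab=10 (assumed shapes, not a real model)
torch.manual_seed(0)
logits = torch.randn(1, 4, 10)

# max over dim=2 yields the greedy prediction at EVERY position
vals, idx = logits.max(dim=2)
print(idx.shape)  # torch.Size([1, 4]): one index per input position

# For next-token prediction, only the last position's argmax is needed
next_token = logits[0, -1].argmax().item()
print(next_token == idx[0, -1].item())  # True
```

So decoding `max_idx[0]` in one go and taking the last character is only an approximation; indexing the last position directly avoids issues with multi-character tokens.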
I am basically doing a greedy forward pass for next-token prediction. Is it possible to predict more than one token in a single forward pass? Also, is my approach above equivalent to using the generate method with greedy decoding? I have noticed that I don't get similar results for some models.
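For reference, `generate` with `do_sample=False` (and the default `num_beams=1`) performs exactly this greedy argmax loop, provided the argmax is taken over the last position's logits. A small self-contained check with a randomly initialized toy model (the config values here are arbitrary assumptions, chosen only so the script runs quickly without downloading weights):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
# Tiny random model; eos_token_id=None so generation never stops early
config = GPT2Config(n_layer=2, n_head=2, n_embd=32, vocab_size=100, eos_token_id=None)
model = GPT2LMHeadModel(config).eval()

input_ids = torch.tensor([[1, 2, 3]])

# Manual greedy loop: argmax of the LAST position's logits at each step
ids = input_ids.clone()
with torch.no_grad():
    for _ in range(5):
        logits = model(ids).logits
        next_id = logits[0, -1].argmax().item()
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)

# generate with do_sample=False is greedy decoding; use_cache=False makes
# each step a full forward pass, numerically matching the loop above
gen = model.generate(input_ids, max_new_tokens=5, do_sample=False,
                     use_cache=False, pad_token_id=0)
print(torch.equal(ids, gen))  # True
```

If the two disagree for a real model, the usual culprit is comparing the last *character* of a decoded string rather than the last position's token id.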
Thank you again!