Different model.generate() predictions between batched and unbatched/padded token inputs

Hi,

I’m using the facebook/opt class of models for text completion.
When I pass padded, truncated, batched tokenized inputs to model.generate(), the outputs differ from what I get when the same prompt is tokenized and passed on its own (unbatched, no padding).

I’ve double-checked that the padding tokens and attention masks for the shorter inputs are set correctly.
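A minimal sketch of what I mean by that check, reusing the tokenizer and `input_prompts` from the snippets below (names are illustrative, and this assumes none of the prompts themselves contain the literal pad token):

    import torch

    enc = tokenizer(input_prompts, padding=True, truncation=True, return_tensors="pt")
    # The attention mask should simply mirror "is this position a pad token or not".
    expected_mask = (enc.input_ids != tokenizer.pad_token_id).long()
    assert torch.equal(enc.attention_mask, expected_mask)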

The tokenizer and model are instantiated as follows:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-1.3b", torch_dtype=torch.float16, device_map="auto",
    )

Tokenization and the generate call look like this:

    all_tokenized_prompts = tokenizer(
        input_prompts, padding=True, truncation=True, return_tensors="pt"
    )
    model.generate(
        all_tokenized_prompts.input_ids,
        attention_mask=all_tokenized_prompts.attention_mask,
        do_sample=False,
    )

This produces different output tokens whenever the input to the tokenizer contains prompts of different lengths. The output doesn’t just contain extra padding tokens; the model actually produces drastically different predictions for the same prompt. I’ve checked that the batch indices match and that the input prompt is identical.
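Roughly the kind of check I mean (variable names here are illustrative, not from my actual script): the non-pad tokens of a prompt taken from the padded batch should be exactly the tokens of the same prompt encoded on its own.

    i = 3  # index of one of the shorter (padded) prompts
    batch = tokenizer(input_prompts, padding=True, truncation=True, return_tensors="pt")
    single = tokenizer(input_prompts[i:i + 1], padding=True, truncation=True, return_tensors="pt")

    # Drop the pad positions via the attention masks and compare the remaining ids.
    batch_ids = batch.input_ids[i][batch.attention_mask[i].bool()]
    single_ids = single.input_ids[0][single.attention_mask[0].bool()]
    assert batch_ids.tolist() == single_ids.tolist()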

Any idea what’s going on here?

I see identical issues for all OPT models from 350M to 66B, and the same behaviour when using GPT-J or GPT-NeoX-20B.

Could it be that these models are not applying the attention mask properly? I have verified that the attention_mask is at least being used: if I change the attention mask, the outputs change.
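That sanity check was roughly the following (again, the names are illustrative and it reuses the tokenizer, model, device, and prompt list from this post): deliberately corrupting part of the mask visibly changes the greedy generation, so the mask is at least being consulted.

    import torch

    enc = tokenizer(input_prompts, padding=True, truncation=True, return_tensors="pt").to(device)
    out_a = model.generate(enc.input_ids, attention_mask=enc.attention_mask,
                           max_new_tokens=15, do_sample=False)

    # Mask out the first half of every prompt and generate again.
    broken_mask = enc.attention_mask.clone()
    broken_mask[:, : broken_mask.shape[1] // 2] = 0
    out_b = model.generate(enc.input_ids, attention_mask=broken_mask,
                           max_new_tokens=15, do_sample=False)

    # If the mask is consulted at all, the two generations should differ.
    print(torch.equal(out_a, out_b))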

Here’s a full code example to reproduce the issue:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Six prompts that share the same few-shot prefix and differ only in the final description.
    prompt_prefix = (
        "Succinctly predict the high-level task that the two sentences describe:\n"
        "Description: First, turn around and walk to the cupboard under the sink. Then, open the right door and put the plunger inside the cupboard.\nTask: Put the plunger in the cupboard.\n"
        "Description: First, put the forks on the table. Then, put the napkins on the table.\nTask: Set the table.\n"
        "Description: First, turn to the right and pick up the cup. Then, pour the cup's contents into your mouth.\nTask: Drink water.\n"
        "Description: First, open the gas tank. Then, pour in the gasoline.\nTask: Fill gas tank with gasoline.\n"
    )
    final_descriptions = [
        "Description: First, put coffee in the coffee machine. Then, press the brew button on the coffee machine.\nTask:",
        "Description: First, scoop the dog poop. Then, put the scooped poop in the trash.\nTask:",
        "Description: First, take the wrapper off the lollipop. Then, lick the lollipop.\nTask:",
        "Description: First, open the microwave door. Then, take the food out of the microwave.\nTask:",
        "Description: First, grab the fork. Then, drive the car.\nTask:",
        "Description: First, eat the food. Then, grab the keys off the table countertop.\nTask:",
    ]
    modified_prompts = [prompt_prefix + description for description in final_descriptions]

    device = 0
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-350m", torch_dtype=torch.float16, device_map="auto",
    )

    batch = tokenizer(modified_prompts, padding=True, truncation=True, return_tensors="pt")
    single = tokenizer(modified_prompts[3:4], padding=True, truncation=True, return_tensors="pt")

    # Prompt 3 taken from the padded batch of all six prompts
    print(tokenizer.batch_decode(model.generate(batch.input_ids[3:4].to(device), attention_mask=batch.attention_mask[3:4].to(device), max_new_tokens=15, do_sample=False), skip_special_tokens=True)[0][670:])

    # The same prompt tokenized on its own (no padding)
    print(tokenizer.batch_decode(model.generate(single.input_ids.to(device), attention_mask=single.attention_mask.to(device), max_new_tokens=15, do_sample=False), skip_special_tokens=True)[0][670:])

Results in:

'Task: Open the microwave door.\nDescription: First, turn to the left and'

'Task:: Put the food in the microwave.\nDescription: First, turn to'

The two predictions differ substantially, with the one from the individually tokenized (unpadded) prompt being the better completion.


I also have this problem. I discovered it while using BLIP-2 and trying to batch inputs: only the longest example (the one with no padding) comes out roughly normal, while the others come out as garbage. This happened with OPT-2.7B and OPT-6.7B.

I’m having the same problem training a GPT-2 classifier. I trained it on right-padded sequences, but when I tried left padding it took a performance hit. Setting the attention mask to 0 does not completely eliminate the effect of the padding tokens, but exactly how they still affect the output is not clear to me, so I am not sure how to fix it. If you solved it, please share with the rest of us.
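In case it helps to compare setups, this is roughly how I toggle between the two padding modes for the GPT-2 tokenizer (illustrative sketch; GPT-2 ships without a pad token, so I assign one first):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    tok.pad_token = tok.eos_token   # GPT-2 has no pad token out of the box
    tok.padding_side = "left"       # "right" is the default, which is what I trained with

    enc = tok(["a short prompt", "a noticeably longer prompt than the first one"],
              padding=True, return_tensors="pt")
    print(enc.input_ids)        # pad ids (eos here) sit on the chosen side
    print(enc.attention_mask)   # zeros line up with the pad positions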