The output of T5 is not consistent on multiple sequences

LearnToGrow · May 10, 2022, 1:24pm

I am using T5 to summarize multiple sequences as a batch. I want to reproduce the output of model.generate(input_ids) by calling the forward function (model(**inputs)). I know that forward() and generate() work completely differently, see this. To make them behave the same way, I take some sequences, call model.generate() on them to produce the corresponding outputs, and get pairs of (text, summary). Calling the forward function on these pairs one at a time reproduces the same outputs. However, when I call the forward function on a batch of sequences, the output is not the same. What did I miss?
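When `labels` are passed to the forward function, the decoder is fed the labels shifted right by one position (teacher forcing) instead of its own previous predictions. A simplified sketch in plain Python, assuming T5's defaults of `decoder_start_token_id = 0` and `pad_token_id = 0`; `shift_right` is a hypothetical helper that only mirrors the idea, not the library's actual code, and the token ids are made up.

```python
from typing import List

# Assumed T5 defaults: the decoder starts from the pad token (id 0).
DECODER_START_TOKEN_ID = 0
PAD_TOKEN_ID = 0
IGNORE_INDEX = -100  # label value ignored by the loss


def shift_right(labels: List[int]) -> List[int]:
    """Hypothetical helper: build teacher-forced decoder inputs from labels.

    The decoder input at position t is the gold label at position t - 1,
    so each prediction is conditioned on the reference prefix, not on the
    model's own earlier outputs (which is what generate() uses).
    """
    shifted = [DECODER_START_TOKEN_ID] + labels[:-1]
    # -100 is not a valid token id, so padded label positions become pad tokens.
    return [PAD_TOKEN_ID if tok == IGNORE_INDEX else tok for tok in shifted]


# Made-up token ids: a 3-token summary padded to length 5.
labels = [37, 825, 1, IGNORE_INDEX, IGNORE_INDEX]
print(shift_right(labels))  # [0, 37, 825, 1, 0]
```

Because of this, `logits.argmax(-1)` at position t is the model's guess for token t given the reference prefix, so a teacher-forced forward pass can only match generate() when the labels are themselves the generated summary.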

import torch
import torch.nn.functional as F

# sequences
seq1 = "summarize: Calling the model (which means the forward method) uses the labels for teacher forcing. This means inputs to the decoder are the labels shifted by one"
output1 = "calling the model uses the labels for teacher forcing. inputs to the decoder"

seq2 = "summarize: When you call the generate method, the model is used in the autoregressive fashion"
output2 = "the model is used in the auto-aggressive fashion."

seq3 = "summarize: However, selecting the token is a hard decision, and the gradient cannot be propagated through this decision"
output3 = "the token is a hard decision, and the gradient cannot be propagated through this decision"

input_sequences = [seq1, seq2, seq3]
output_seq = [output1, output2, output3]

# encoding input and attention mask
encoding = tokenizer(
    input_sequences,
    padding="longest",
    max_length=128,
    truncation=True,
    return_tensors="pt",
)

input_ids, attention_mask = encoding.input_ids.to("cuda"), encoding.attention_mask.to("cuda")

# labels
target_encoding = tokenizer(
    output_seq, padding="longest", max_length=128, truncation=True
)
labels = target_encoding.input_ids
labels = torch.tensor(labels).to("cuda")
labels[labels == tokenizer.pad_token_id] = -100

# Call the models
logits = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).logits

# Apply softmax() and batch_decode()

X = logits
X = F.softmax(X, dim=-1)
ids = X.argmax(dim=-1)
y = tokenizer.batch_decode(sequences=ids, skip_special_tokens=True)

# results: batch_size=3

['call the model uses the labels for teacher forcing  inputs to the decoder are',
 'the model is used in the auto-aggressive fashion  the the the',
 'the token is a hard decision, and the gradient cannot be propagated through this decision ']

# results: batch_size=1, i.e. one sequence at a time

['call the model uses the labels for teacher forcing  inputs to the decoder are']

['the model is used in the auto-aggressive fashion ']

['the token is a hard decision, and the gradient cannot be propagated through this decision ']
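A plausible source of the extra `the the the` for seq2 in the batched result (an assumption, not confirmed in this thread): the labels are padded to the longest summary and the padded positions are set to `-100`, but the decoder still produces logits at those positions, and `argmax` turns them into real words. With batch size 1 there are no padded label positions, so nothing extra appears. A minimal sketch of dropping those positions before decoding, in plain Python with made-up token ids; `drop_padded_positions` is a hypothetical helper.

```python
from typing import List

IGNORE_INDEX = -100  # value used to mask padded label positions


def drop_padded_positions(
    pred_ids: List[List[int]], labels: List[List[int]]
) -> List[List[int]]:
    """Keep only the predicted ids at positions whose label is a real token.

    Under teacher forcing, logit position t lines up with label t, so
    positions labelled -100 are padding and their argmax is not meaningful.
    """
    return [
        [p for p, lab in zip(preds, labs) if lab != IGNORE_INDEX]
        for preds, labs in zip(pred_ids, labels)
    ]


# Made-up ids for a batch of two: the second summary is shorter and padded.
pred_ids = [[37, 825, 19, 4, 1], [37, 2, 1, 8, 8]]
labels = [[37, 825, 19, 4, 1], [37, 2, 1, IGNORE_INDEX, IGNORE_INDEX]]
print(drop_padded_positions(pred_ids, labels))  # [[37, 825, 19, 4, 1], [37, 2, 1]]
```

With the tensors from the code above, a similar effect could be had with `ids.masked_fill(labels == -100, tokenizer.pad_token_id)` before `tokenizer.batch_decode(..., skip_special_tokens=True)`, since the pad token is a special token and gets skipped.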

LearnToGrow · May 11, 2022, 3:30pm

I wanted to edit my question, but that doesn't seem to be possible, so I'm adding these lines here:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
model.resize_token_embeddings(len(tokenizer))
model.to("cuda")
model.eval()
