T5 Model Generate and Model Outputs Vastly Different

hockeybro12 · August 17, 2022, 1:46am

Hello,

I have fine-tuned a T5 model for generation; it learns well and the loss decreases a lot. When I evaluate the model, I’m happy with the results from model.forward, but when I call model.generate, the model performs poorly no matter what decoding parameters I set. In fact, once the model has overfitted, model.generate produces something pretty similar every time. I’m unsure what I’m doing wrong, and I’ve looked at lots of topics. As far as I can tell, I don’t even have to shift my inputs right for the decoder as long as I’m not passing decoder inputs, so I’m very unsure what the problem is.

Also, how do people normally decide what decoding method to use? Is it just based on what “looks good” on the validation set?

Here is my tokenizing code:

encoding = tokenizer(
    [task_prefix + sequence for sequence in input_sequences],
    padding="max_length",
    max_length=max_source_length,
    truncation=True,
    return_tensors="pt",
)
target_tensor_input, t5_masks = encoding.input_ids, encoding.attention_mask

target_encoding = tokenizer(
    output_sequences, padding="longest", max_length=max_target_length, truncation=True
)
labels = target_encoding.input_ids

# replace padding token ids in the labels with -100 so they are ignored by the loss
t5_outputs = torch.tensor(labels)
t5_outputs[t5_outputs == tokenizer.pad_token_id] = -100

Here is a minimalist version of my model training code:

for target_tensor_input, t5_outputs, t5_masks in tqdm(train_dataloader):
    loss = model(
        input_ids=target_tensor_input.cuda(),
        attention_mask=t5_masks.cuda(),
        labels=t5_outputs.cuda(),
    ).loss

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

My evaluation code is similar:

with torch.no_grad():
    for target_tensor_input, t5_outputs, t5_masks in tqdm(train_dataloader):
        # the output here varies
        batch_outputs_generate = model.generate(input_ids=target_tensor_input.cuda(), min_length=2, max_length=max_target_length, do_sample=True, num_beams=8)
        batch_outputs_generate = tokenizer.batch_decode(batch_outputs_generate, skip_special_tokens=True)

        # the output here is good
        batch_outputs_forward = model(input_ids=target_tensor_input.cuda(), attention_mask=t5_masks.cuda(), labels=t5_outputs.cuda())
        batch_outputs_forward_output = tokenizer.batch_decode(torch.argmax(batch_outputs_forward.logits, dim=2).tolist(), skip_special_tokens=True)
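
One difference between the two calls above may explain part of the gap, though nothing in this thread confirms it is the cause. When labels are passed, the forward pass feeds the decoder the gold tokens shifted right (teacher forcing), so the argmax at each position is conditioned on the correct prefix. model.generate instead conditions each step on the model's own earlier predictions. The toy sketch below uses a hypothetical lookup-table "model" in plain Python (not T5; all names are made up for illustration) to show how a single wrong prediction stays isolated under teacher forcing but carries forward in free-running decoding:

```python
# Hypothetical toy "model": predicts the next token from the previous token only.
# It is deliberately wrong in one place: after token 2 it predicts 9 instead of 3.
NEXT_TOKEN = {0: 1, 1: 2, 2: 9, 3: 4, 4: 5, 9: 9}
START = 0  # stand-in for a decoder start token


def predict_next(prev_token):
    """Return the toy model's prediction for the token following prev_token."""
    return NEXT_TOKEN[prev_token]


def teacher_forced_predictions(target):
    """Predict every position while feeding the gold prefix (target shifted right)."""
    decoder_inputs = [START] + target[:-1]
    return [predict_next(tok) for tok in decoder_inputs]


def greedy_generate(length):
    """Predict token by token, feeding each prediction back in as the next input."""
    generated = []
    prev = START
    for _ in range(length):
        prev = predict_next(prev)
        generated.append(prev)
    return generated


def token_accuracy(predicted, target):
    """Fraction of positions where the prediction matches the target."""
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


target = [1, 2, 3, 4, 5]
forced = teacher_forced_predictions(target)  # one isolated mistake
free = greedy_generate(len(target))          # the mistake carries forward

print("teacher forced:", forced, token_accuracy(forced, target))
print("free running:  ", free, token_accuracy(free, target))
```

In this toy setup the teacher-forced predictions look mostly right even though the free-running output goes wrong after the first mistake, so comparing argmax over forward logits against generate output is not a like-for-like comparison.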

Thanks!

hockeybro12 · September 11, 2022, 10:12pm

Hello everyone, any update here? I’m still confused about this topic.
