Understanding T5 with custom embeddings

Hello everyone,

I am passing a custom input embedding via inputs_embeds to a T5ForConditionalGeneration model like this:

encoder_outputs = model.encoder(
    inputs_embeds=input_embedding_batch,
    attention_mask=attention_mask
)

This gives me an encoder output (last_hidden_state) of shape [5, 122, 1024].
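
A self-contained way to reproduce that shape with dummy embeddings (the checkpoint and dimensions here are just illustrative, not my actual setup):

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-large")  # d_model = 1024

# Dummy stand-in for the custom embedding batch: [batch, seq_len, d_model]
input_embedding_batch = torch.randn(5, 122, 1024)
attention_mask = torch.ones(5, 122, dtype=torch.long)

encoder_outputs = model.encoder(
    inputs_embeds=input_embedding_batch,
    attention_mask=attention_mask,
)
print(encoder_outputs.last_hidden_state.shape)  # torch.Size([5, 122, 1024])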

Then I pass this to generate my target like this:

target = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_length=max_len,
    min_length=min_len,
    **gen_kwargs_fold2AA
)

Normally (when I pass just the token IDs) the model generates a sequence of the same length as the input, so 122 in my case. Here, however, it always generates a sequence of length max_len and I don't know why. Is the encoder_output still used as a starting sequence, with the decoder just continuing to generate from it?

Thank you for your time and help


This is the output I got in my environment…

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load model and tokenizer
device = "cuda" if torch.cuda.is_available() else "cpu"
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)
tokenizer = T5Tokenizer.from_pretrained("t5-small", legacy=False)

# Prepare a batch of input token IDs and attention mask
text = "translate English to German: The weather is nice today."
batch = tokenizer(text, return_tensors="pt").to(device)
input_ids = batch.input_ids                  # shape: [1, seq_len] (12 tokens here)
attention_mask = batch.attention_mask
max_len = 4
min_len = 2

# Manually compute input embeddings via the encoder's embedding layer
inputs_embeds = model.get_encoder().embed_tokens(input_ids)
print(f"inputs_embeds shape: {inputs_embeds.shape}")
# inputs_embeds shape: [1, seq_len, d_model]

# 1) Generate using precomputed encoder_outputs
encoder_outputs = model.encoder(
    inputs_embeds=inputs_embeds,
    attention_mask=attention_mask
)
print(f"encoder_outputs shape: {encoder_outputs.last_hidden_state.shape}")
# The decoder's prefix is just the single decoder_start_token_id, so the output length is governed by max_length (or an earlier EOS)
generated_ids1 = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_length=max_len,      # caps the total decoder output at max_len tokens
    min_length=min_len,
)
print("\nUsing encoder_outputs:")
print("Requested max_length =", max_len)
print("Generated length    =", generated_ids1.shape[-1])
print(tokenizer.decode(generated_ids1[0], skip_special_tokens=True))

# 2) Generate by passing embeddings directly to generate()
#    generate() runs the encoder internally; the decoder output is still capped by max_length
generated_ids2 = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=attention_mask,
    max_length=max_len,      # same cap; the encoder prompt length does not count toward max_length
    min_length=min_len,
)
print("\nUsing inputs_embeds in generate:")
print("Requested max_length =", max_len)
print("Generated length    =", generated_ids2.shape[-1])
print(tokenizer.decode(generated_ids2[0], skip_special_tokens=True))

"""
inputs_embeds shape: torch.Size([1, 12, 512])
encoder_outputs shape: torch.Size([1, 12, 512])

Using encoder_outputs:
Requested max_length = 4
Generated length    = 4
Das Wetter ist

Using inputs_embeds in generate:
Requested max_length = 4
Generated length    = 4
Das Wetter ist
"""

When you pass encoder_outputs directly, generate() doesn't know your input length or tokens, so it simply generates until it reaches max_length or emits EOS. The decoder isn't continuing your input sequence; it starts fresh and only conditions on the encoder output through cross-attention.
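
For what it's worth, the decoder's only prefix here is the single decoder_start_token_id (the pad token for T5), so the output length depends purely on max_length / max_new_tokens and EOS, never on the encoder sequence length. A minimal sketch continuing from the variables above:

print(model.config.decoder_start_token_id)  # 0, which is also T5's pad token id

# Same encoder_outputs as above, but with an explicit cap on newly generated decoder tokens
generated_ids3 = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_new_tokens=20,  # counts only the new decoder tokens, not the decoder start token
)
print(generated_ids3.shape[-1])  # at most 21 (decoder start token + 20 new tokens)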


Hey,

Thank you very much for your answers, now I understand. So I can just provide the encoder_outputs to model.generate() and set max_length to the seq_len of my encoder output.
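
Concretely, something like this (reusing the variables from my first post; min_len and gen_kwargs_fold2AA are as before):

# Cap the decoder output at the length of the custom embedding sequence (122 here)
seq_len = encoder_outputs.last_hidden_state.shape[1]

target = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_length=seq_len,      # decoder stops after seq_len tokens (or earlier at EOS)
    min_length=min_len,
    **gen_kwargs_fold2AA,
)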
