Hello everyone,
I am passing a custom input embedding via inputs_embeds to a T5ForConditionalGeneration model like this:
encoder_outputs = model.encoder(
    inputs_embeds=input_embedding_batch,
    attention_mask=attention_mask
)
This gives me an encoder output of shape [5, 122, 1024].
Then I pass this to generate my target like this:
target = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_length=max_len,
    min_length=min_len,
    **gen_kwargs_fold2AA
)
Normally (when I pass just the token_ids) the model generates a sequence of the same length, so 122 in my case. However, here it always generates a sequence of length max_len and I don't know why. Is the encoder_output still given as the starting sequence, with the decoder just continuing to generate based on it?
Thank you for your time and help
This is the output I got in my environment…
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
# Load model and tokenizer
device = "cuda" if torch.cuda.is_available() else "cpu"
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)
tokenizer = T5Tokenizer.from_pretrained("t5-small", legacy=False)
# Prepare a batch of input token IDs and attention mask
text = "translate English to German: The weather is nice today."
batch = tokenizer(text, return_tensors="pt").to(device)
input_ids = batch.input_ids  # shape: [1, seq_len]
attention_mask = batch.attention_mask
max_len = 4
min_len = 2
# Manually compute input embeddings via the encoder's embedding layer
inputs_embeds = model.get_encoder().embed_tokens(input_ids)
print(f"inputs_embeds shape: {inputs_embeds.shape}")
# inputs_embeds shape: [1, seq_len, d_model]
# 1) Generate using precomputed encoder_outputs
encoder_outputs = model.encoder(
    inputs_embeds=inputs_embeds,
    attention_mask=attention_mask
)
print(f"encoder_outputs shape: {encoder_outputs.last_hidden_state.shape}")
# Decoder prefix is just the decoder start token, so output length = max_length
# (unless EOS is produced first)
generated_ids1 = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_length=max_len,  # counts decoder tokens only, not the encoder sequence
    min_length=min_len,
)
print("\nUsing encoder_outputs:")
print("Requested max_length =", max_len)
print("Generated length =", generated_ids1.shape[-1])
print(tokenizer.decode(generated_ids1[0], skip_special_tokens=True))
# 2) Generate by passing embeddings directly to generate()
# For an encoder-decoder model like T5, generate() runs the encoder internally;
# the decoder again starts from a single start token, so the behavior matches case 1
generated_ids2 = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=attention_mask,
    max_length=max_len,  # again caps decoder tokens, so generated length = max_len here
    min_length=min_len,
)
print("\nUsing inputs_embeds in generate:")
print("Requested max_length =", max_len)
print("Generated length =", generated_ids2.shape[-1])
print(tokenizer.decode(generated_ids2[0], skip_special_tokens=True))
"""
inputs_embeds shape: torch.Size([1, 12, 512])
encoder_outputs shape: torch.Size([1, 12, 512])
Using encoder_outputs:
Requested max_length = 4
Generated length = 4
Das Wetter ist
Using inputs_embeds in generate:
Requested max_length = 4
Generated length = 4
Das Wetter ist
"""
When you pass encoder_outputs directly, generate() doesn't know your input length or tokens, so it just generates until max_length or EOS. It's not continuing from anything, just starting fresh based on the encoder output.
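A minimal sketch of that behavior, reusing the model, encoder_outputs, and attention_mask from the snippet above: the ids returned by generate() are decoder tokens only, so the encoder sequence length never enters the length budget.

```python
# Sketch (assumes model, encoder_outputs, attention_mask from the code above).
# The returned ids are decoder tokens only; generation stops at the token cap
# or when EOS is produced, regardless of how long the encoder input was.
out = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_new_tokens=10,  # cap on newly generated decoder tokens
)
# 1 decoder start token + up to 10 generated tokens
print("generated length:", out.shape[-1])
```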
Hey,
Thank you very much for your answers. Now I understand. So I can just provide the encoder_outputs to model.generate() and set max_length to the seq_len of my encoder_output.
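If you go that route, a sketch might look like this (reusing the variables from the first post; note that max_length counts decoder tokens including the start token, and the output can still end earlier if EOS is generated):

```python
# Sketch: tie the decoder length budget to the encoder sequence length (122 here).
seq_len = encoder_outputs.last_hidden_state.shape[1]
target = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_length=seq_len,   # decoder token budget, including the decoder start token
    min_length=min_len,
)
```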