Proper way to do conditional generation with T5

I want to perform conditional generation with T5. My question is: does model.generate() actually do conditional generation? Say the desired sequence has three tokens, “A B C”, and B is masked, so the model input is “A _ C”. In this case, when I call model.generate(input_ids(“A _ C”)), does the generation process take into account that C is also given? I have doubts because, according to the Generation docs, generate() works in an auto-regressive fashion, so I assume each prediction only depends on previous tokens, not on later ones?
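For context, T5 is an encoder-decoder model: the decoder does generate auto-regressively, but at every step it cross-attends to the *entire* encoder input, so tokens after the mask (like C above) do condition the output. T5 expresses masking with sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, …): the input replaces each masked span with a sentinel, and the target lists each sentinel followed by the span it replaced, closed by one final sentinel. Below is a minimal sketch of that input/target construction; `span_corrupt` is a hypothetical helper name, not part of any library:

```python
def span_corrupt(tokens, spans):
    """Build a T5-style (input, target) pair from a token list and a list of
    (start, end) half-open index ranges to mask, given in order.
    NOTE: illustrative sketch of the span-corruption format, not library code."""
    inp, tgt = [], []
    prev = 0
    for i, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:s])   # unmasked tokens go into the input...
        inp.append(sentinel)         # ...and the masked span becomes a sentinel
        tgt.append(sentinel)         # the target repeats the sentinel...
        tgt.extend(tokens[s:e])      # ...followed by the tokens it hid
        prev = e
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the last span
    return " ".join(inp), " ".join(tgt)

inp, tgt = span_corrupt(
    ["The", "cute", "dog", "walks", "in", "the", "park"], [(1, 3), (5, 6)]
)
# inp == "The <extra_id_0> walks in <extra_id_1> park"
# tgt == "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>"
```

This is why the generated output below starts with `<extra_id_0>`: generate() is producing the *target* side of this format, one span per sentinel.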

A related question comes from reading the code at Using the T5 model with huggingface's mask-fill pipeline · Issue #3985 · huggingface/transformers · GitHub, which seems to be doing conditional generation (though of course I have questions about it too). In particular, when I run the following code:

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t5_tokenizer = T5Tokenizer.from_pretrained("t5-small")
t5_mlm = T5ForConditionalGeneration.from_pretrained("t5-small").to(DEVICE)

# Input text with two sentinel (mask) tokens
text = "The <extra_id_0> walks in <extra_id_1> park"
input_ids = t5_tokenizer(text, return_tensors="pt").input_ids.to(DEVICE)

# Generate one sequence with beam search
outputs = t5_mlm.generate(input_ids=input_ids,
                          num_beams=200, num_return_sequences=1)
t5_tokenizer.decode(outputs[0], skip_special_tokens=False, clean_up_tokenization_spaces=False)

I get

'<extra_id_0> park offers<extra_id_1> the<extra_id_2> park.'

The part that says " the<extra_id_2> park." seems redundant to me; why is it generated? Thanks!
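One way to make sense of that output: everything between `<extra_id_0>` and `<extra_id_1>` is the model's fill for the first mask, everything between `<extra_id_1>` and `<extra_id_2>` is the fill for the second, and anything after the closing sentinel can be discarded. A small sketch of that parsing step (`parse_spans` is a hypothetical helper name, not a library function):

```python
import re

def parse_spans(decoded, n_masks):
    """Extract the text generated for each sentinel from a decoded T5 output.
    Anything after the closing sentinel is ignored.
    NOTE: illustrative sketch, not part of the transformers API."""
    spans = {}
    for i in range(n_masks):
        # lazily capture up to the next sentinel token (or end of string)
        m = re.search(rf"<extra_id_{i}>(.*?)(?=<extra_id_\d+>|$)", decoded)
        spans[i] = m.group(1).strip() if m else ""
    return spans

parse_spans('<extra_id_0> park offers<extra_id_1> the<extra_id_2> park.', 2)
# → {0: 'park offers', 1: 'the'}
```

Under this reading, " park." after `<extra_id_2>` is trailing material past the closing sentinel and would simply be dropped.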


Hey zekeZZ!
I had problems running your code as posted: the tokenizer is defined as t5_tokenizer, but the call uses tokenizer, which isn't defined.
I also don't understand the input text = "The <extra_id_0> walks in <extra_id_1> park", where <extra_id_1> sits before "park" rather than at the end of the sentence.
This is normally wrapped by the fill-mask pipeline: Mask-fill pipeline for t5 and flan-t5 · Issue #21211 · huggingface/transformers · GitHub
Also have a look at the flan-t5 models, which were trained on a better dataset: google/flan-t5-small · Hugging Face