T5 generate() output doesn't produce <extra_id_0>

Hi,

From the tutorial and my understanding, in unsupervised denoising training the input and the target work like two complementary pieces of a puzzle. Take the example from the tutorial:
‘The <extra_id_0> walks in <extra_id_1> park’
Each masked span in the input, whether it is one token or several, is replaced with a sentinel token. In the target, each sentinel token (‘<extra_id_0>’, ‘<extra_id_1>’) is followed by the one or more tokens it stands for, and a final sentinel closes the sequence.
So the expected generated output is ‘<extra_id_0> cute dog <extra_id_1> the <extra_id_2>’.
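
To make that format concrete, here is a minimal sketch in plain Python that builds the input/target pair for the tutorial example. The helper name `span_corrupt` and the word-level spans are my own assumptions for illustration; the real pre-training works on token ids and picks spans at random.

```python
def span_corrupt(words, spans):
    """Build a T5-style (input, target) pair of strings.

    words: list of words in the original sentence.
    spans: sorted, non-overlapping (start, end) index pairs to mask, end exclusive.
    Hypothetical helper for illustration only, not part of transformers.
    """
    source, target = [], []
    pos = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        source.extend(words[pos:start])  # keep the unmasked words
        source.append(sentinel)          # the masked span becomes one sentinel
        target.append(sentinel)          # the target repeats that sentinel...
        target.extend(words[start:end])  # ...followed by the masked words
        pos = end
    source.extend(words[pos:])           # remaining unmasked words
    target.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return " ".join(source), " ".join(target)


words = "The cute dog walks in the park".split()
source, target = span_corrupt(words, [(1, 3), (5, 6)])
print(source)  # The <extra_id_0> walks in <extra_id_1> park
print(target)  # <extra_id_0> cute dog <extra_id_1> the <extra_id_2>
```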

The problem is that, for one sentence I tested, the generated output doesn’t start with <extra_id_0>; the first sentinel token is <extra_id_1>. After a few trials I found that the space after the final ‘.’ (before ‘RateBook’) seems to cause this. Here is the exact sentence and code I used, so you can reproduce it.

from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Note the space between the final '.' and 'RateBook'.
text = 'Rate leap into darkness 1 points .<extra_id_0> object<extra_id_1> rating<extra_id_2> point . RateBook'
input_ids = tokenizer(text, return_tensors="pt").input_ids
output = model.generate(input_ids=input_ids, num_beams=3, num_return_sequences=3, max_length=input_ids.size(1))

# Print each of the three returned beams.
for i in range(output.size(0)):
    print(tokenizer.decode(output[i]))

#######output#######
Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> rating rating 1<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating 1<extra_id_7> Rate
Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> rating rating 1<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating rating 1<extra_id_7>
Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> Rating<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating rating 1<extra_id_7> Rate<extra_id_8>
###################

But if I remove the space before ‘RateBook’, the output starts with <extra_id_0>.
To test this, replace the input sentence with the one below:
'Rate leap into darkness 1 points .<extra_id_0> object<extra_id_1> rating<extra_id_2> point .RateBook'

The output is as follows:

<extra_id_0>RateBook<extra_id_1> rating<extra_id_2> 1<extra_id_3>RateBook<extra_id_4> 1<extra_id_5>RateBook<extra_id_6>
<extra_id_0>RateBook is an<extra_id_1> of<extra_id_2> 1<extra_id_3>RateBook<extra_id_4> 1<extra_id_5>Rate
<extra_id_0>RateBook is an<extra_id_1> with a<extra_id_2> of 1<extra_id_3>RateBook<extra_id_4> 1
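
In case it helps with comparing more inputs, here is a small sketch of how I check the decoded strings. It only parses the text printed above; `sentinel_ids` and `text_before_first_sentinel` are helper names I made up, not part of transformers.

```python
import re

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def sentinel_ids(decoded):
    """Return the sentinel numbers in the order they appear in a decoded string."""
    return [int(n) for n in SENTINEL.findall(decoded)]


def text_before_first_sentinel(decoded):
    """Return whatever was generated before the first sentinel token."""
    match = SENTINEL.search(decoded)
    return decoded if match is None else decoded[:match.start()]


# First beam of each run, copied from the outputs above.
with_space = ("Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> rating rating 1"
              "<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating 1<extra_id_7> Rate")
without_space = ("<extra_id_0>RateBook<extra_id_1> rating<extra_id_2> 1<extra_id_3>"
                 "RateBook<extra_id_4> 1<extra_id_5>RateBook<extra_id_6>")

for name, decoded in [("with space", with_space), ("without space", without_space)]:
    ids = sentinel_ids(decoded)
    print(name, "first sentinel:", ids[0], "prefix:", repr(text_before_first_sentinel(decoded)))
```

With the space, the first sentinel is 1 and the text ‘Rate’ comes before it; without the space, the first sentinel is 0 and nothing precedes it.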

I would be happy if someone could explain the reason for this behavior.