Hi,
From the tutorial and my understanding, in unsupervised denoising training the input and the target work like two complementary pieces of a puzzle. Take the example from the tutorial:
"The <extra_id_0> walks in <extra_id_1> park"
In the target, the spans that remain visible in the input ("The", "walks in", and "park"), whether they are one token or several, are replaced with sentinel tokens, and the sentinel tokens "<extra_id_0>" and "<extra_id_1>" from the input are filled in with predicted tokens, again one or more.
So the expected generated output is "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>".
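To make that complementary structure concrete, here is a minimal pure-Python sketch of how such an input/target pair could be built. It assumes the original sentence is "The cute dog walks in the park", works on whitespace-separated words rather than real T5 tokens, and `span_corrupt` is a hypothetical helper of my own, not a function from transformers.

```python
def span_corrupt(words, spans):
    """Build a (corrupted input, target) pair in the T5 sentinel format.

    words: list of whitespace-separated words (a stand-in for real tokens).
    spans: sorted, non-overlapping (start, end) index pairs to mask, end exclusive.
    """
    input_parts, target_parts = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        # The input keeps the visible words and drops a sentinel in place of the span.
        input_parts.extend(words[cursor:start])
        input_parts.append(sentinel)
        # The target holds the sentinel followed by the words that were masked.
        target_parts.append(sentinel)
        target_parts.extend(words[start:end])
        cursor = end
    input_parts.extend(words[cursor:])
    # A final sentinel closes the target sequence.
    target_parts.append(f"<extra_id_{len(spans)}>")
    return " ".join(input_parts), " ".join(target_parts)


words = "The cute dog walks in the park".split()
corrupted, target = span_corrupt(words, [(1, 3), (5, 6)])
print(corrupted)  # The <extra_id_0> walks in <extra_id_1> park
print(target)     # <extra_id_0> cute dog <extra_id_1> the <extra_id_2>
```

With this format, I would expect every generated sequence to begin with <extra_id_0>, which is why the output further down surprised me.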
The problem is that I tested a sentence for which the generated output doesn't start with <extra_id_0>; the first sentinel token is <extra_id_1> instead. After a few trials I found that a space after or before a "." causes this. Here is the exact sentence I used, so you can reproduce it.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Input with three sentinel tokens; note the space between "." and "RateBook".
text = "Rate leap into darkness 1 points .<extra_id_0> object<extra_id_1> rating<extra_id_2> point . RateBook"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Beam search with 3 beams, returning all 3 candidate sequences.
output = model.generate(input_ids=input_ids, num_beams=3, num_return_sequences=3, max_length=input_ids.size(1))
for i in range(output.size(0)):
    print(tokenizer.decode(output[i]))
#######output#######
Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> rating rating 1<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating 1<extra_id_7> Rate
Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> rating rating 1<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating rating 1<extra_id_7>
Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> Rating<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating rating 1<extra_id_7> Rate<extra_id_8>
###################
But if I remove the space before "RateBook", the output starts with <extra_id_0>.
To test this, replace the sentence above with the one below:
"Rate leap into darkness 1 points .<extra_id_0> object<extra_id_1> rating<extra_id_2> point .RateBook"
The output is as follows:
<extra_id_0>RateBook<extra_id_1> rating<extra_id_2> 1<extra_id_3>RateBook<extra_id_4> 1<extra_id_5>RateBook<extra_id_6>
<extra_id_0>RateBook is an<extra_id_1> of<extra_id_2> 1<extra_id_3>RateBook<extra_id_4> 1<extra_id_5>Rate
<extra_id_0>RateBook is an<extra_id_1> with a<extra_id_2> of 1<extra_id_3>RateBook<extra_id_4> 1
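To compare the two sets of outputs more easily, a small helper along these lines can split a decoded string at its sentinel tokens and show which sentinel comes first. This is only a sketch: `split_on_sentinels` is a hypothetical helper name of mine, it works on the decoded strings pasted above rather than on token ids, and it keeps only the first occurrence of each sentinel.

```python
import re

# Matches T5 sentinel tokens such as <extra_id_0> and captures the number.
SENTINEL = re.compile(r"<extra_id_(\d+)>")


def split_on_sentinels(decoded):
    """Return (leading_text, {sentinel_id: span_text}) for a decoded output string."""
    # re.split with one capture group yields [text, id, text, id, text, ...].
    pieces = SENTINEL.split(decoded)
    leading = pieces[0].strip()
    spans = {}
    for sentinel_id, text in zip(pieces[1::2], pieces[2::2]):
        # Keep the first span seen for each sentinel id.
        spans.setdefault(int(sentinel_id), text.strip())
    return leading, spans


# First beam from each run above, copied verbatim.
with_space = "Rate<extra_id_1> rating<extra_id_2> 1<extra_id_3> rating rating 1<extra_id_4> 1<extra_id_5> Rate<extra_id_6> rating 1<extra_id_7> Rate"
without_space = "<extra_id_0>RateBook<extra_id_1> rating<extra_id_2> 1<extra_id_3>RateBook<extra_id_4> 1<extra_id_5>RateBook<extra_id_6>"

leading_a, spans_a = split_on_sentinels(with_space)
leading_b, spans_b = split_on_sentinels(without_space)
print(repr(leading_a), sorted(spans_a))
print(repr(leading_b), sorted(spans_b))
```

For the first input the text "Rate" appears before any sentinel and <extra_id_0> is missing entirely, while for the second input nothing precedes <extra_id_0>.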
I would be happy if someone could explain the reason for this behavior.