T5 Generates very short summaries

Hi, I was about to post a topic about T5 seemingly never generating the EOS token, and this popped up as potentially related.

I made sure that the EOS token is actually appended to the end of my training inputs and outputs, and that EOS has a different token id from whitespace (after reading https://github.com/huggingface/transformers/issues/5142).

I tried both beam search and top-k sampling, with both short and long maximum output lengths, but the model never generates EOS; it just continues until it is cut off by the max length. I observed this with both t5-large and my fine-tuned model.
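As I understand it, generate() stops a sequence as soon as it produces model.config.eos_token_id, otherwise at max_length. A minimal sketch of that stopping rule (the step_fn stand-in and T5's usual EOS id of 1 are my assumptions, not code from the library):

```python
def generate_until_eos(step_fn, eos_id=1, max_length=64):
    """Minimal sketch of generate()'s stopping rule: decoding halts as
    soon as eos_id is produced, otherwise when max_length is reached.

    step_fn is a stand-in for the model's next-token sampler; it takes
    the ids generated so far and returns the next token id.
    """
    ids = []
    while len(ids) < max_length:
        next_id = step_fn(ids)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids
```

If the model never puts enough probability mass on id 1, the loop above only ever exits via max_length, which matches what I'm seeing.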

Any insight?

Could you post an example for t5-large so we can take a look?

Hi @valhalla,

Thanks for your reply and time :slight_smile:

Here is the code that loads the model and tokenizer and prints example inputs with their corresponding outputs.

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('t5-large')
special_tokens = [
    '_self_say',
    '_partner_say',
    '_self_act',
    '_partner_act',
    '_self_emote',
    '_partner_emote',
    '_self_name',
    '_self_persona',
    '_setting_name',
    '_setting_desc',
    '_object',
    '_option',
    '_context',
    '_history',
]
tokenizer = T5Tokenizer.from_pretrained('t5-large', extra_ids=0, additional_special_tokens=special_tokens)

# `inputs` is a batch encoded with this tokenizer earlier (not shown)
for input_seq in inputs.input_ids:
    print("Input:", tokenizer.decode(input_seq, skip_special_tokens=False))
    input_seq = torch.LongTensor(input_seq)
    output_ids = model.generate(
        input_seq.unsqueeze(0),
        do_sample=True,
        top_k=50,
        top_p=0.95,
        max_length=64,
        repetition_penalty=2.0,
        length_penalty=0.7,
    ).squeeze()
    print("\nResponse:", tokenizer.decode(output_ids, skip_special_tokens=False))
    print(output_ids, "\n")

Here are two sample inputs/outputs. As you can see from the printed output ids, they do not end with the EOS token. I made sure that the inputs themselves do end with the EOS token (</s>, id 1).

Input: _context _setting_name Watchtower _setting_desc The tower is the largest section of the castle. It contains an observatory for nighttime scouting, but is also used by the wise men to study the stars. _object an alarm horn _self_name court wizard _self_persona I am an advisor of anything magical. I sell spells to those who need them. I am wealthy and hold an important place in political life _option hit soldier _option hug soldier _history 

Response: <extra_id_-100> on Guardaça. Intensiv<extra_id_-99> hit soldier îi măsourî sărdă<extra_id_-98> sănătos<extra_id_-97>ească<extra_id_-96>à<extra_id_-95><extra_id_-94> și<extra_id_-93> îndeb<extra_id_-92><extra_id_-91>Secretary of Magical Affairs<extra_id_-90> Watchtower<extra_id_-89> The Watch tower contains the most elaborate sections of the castle and is also
tensor([    0, 32099,    30, 12899,     9, 11666,     5,     3, 26970, 32098,
         1560, 21982,  1889,    23,  3906,     7,  1211,  3633,   246,    52,
           26,    98, 32097, 31004, 32096,  4927, 32095,    85, 32094, 32093,
          198, 32092,   111,   221,   115, 32091, 32090,   134,    15, 16794,
         1208,    13,  9222,   138, 12078, 32089,  4195,   235,  3321, 32088,
           37,  4195,  7293,  2579,     8,   167, 16224,  6795,    13,     8,
        13243,    11,    19,    92]) 

Input: _context _setting_name Tower _setting_desc The inside tower is made from a combination of wood and brick. It also has some metal to hold all the doors in place. _object a door _object a door _self_name pet dog _self_persona I am mans best friend and I wouldn't have it any other way. I tend to my master and never leave his side. I sleep at his feet and guard the room at night from things that go bump in the night. _option hug knight _option hit knight _history 

Response: <extra_id_-100>,<extra_id_-99> erreichen<extra_id_-98> hit knightăminterească a door<extra_id_-97>îî<extra_id_-96> a doorăm!<extra_id_-95> timpul<extra_id_-94> pentru<extra_id_-93>itățile Tower<extra_id_-92> I am Mans best friend.<extra_id_-91> doud<extra_id_-90> casă facem<extra_id_-89> a doorșteștehlen<extra_id_-88><extra_id_-87> tower is made from different materials. The outside
tensor([    0, 32099,     6, 32098, 10870, 32097,  1560, 29816,    98,  8215,
           52,  4927,     3,     9,  1365, 32096,  3633,  3633, 32095,     3,
            9,  1365,  2398,    55, 32094,  4156, 32093,   191, 32092, 19701,
        10677, 32091,    27,   183,  1140,     7,   200,  1565,     5, 32090,
          103,    76,    26, 32089, 20405, 11912, 32088,     3,     9,  1365,
         7689,  7689, 11286, 32087, 32086,  7293,    19,   263,    45,   315,
         1397,     5,    37,  1067]) 
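For reference, this is the (trivial) check I ran to verify that every input sequence ends with EOS, assuming T5's EOS id of 1:

```python
def ends_with_eos(ids, eos_id=1):
    """Return True if a token id sequence ends with the EOS id (1 for T5)."""
    return len(ids) > 0 and ids[-1] == eos_id
```

Every sequence in inputs.input_ids passes this check, so the missing EOS is only on the generation side.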

Another strange behavior is the presence of these extra_id tokens with negative index values. I passed extra_ids=0 when initializing the tokenizer; I wonder if that has an unexpected side effect.
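My guess at what is happening (based on reading the tokenizer source, so treat the exact formula as an assumption): with extra_ids=0 the tokenizer's vocab has 32,000 entries, but t5-large's output head still produces 32,100 logits, so generate() can freely emit the sentinel ids 32000-32099. When decoding an id beyond its vocab, T5Tokenizer seems to compute the sentinel index as vocab_size - 1 - id, which goes negative once the vocab shrinks to 32,000:

```python
def sentinel_label(token_id, vocab_size=32000):
    """Assumed T5Tokenizer fallback for ids beyond the tokenizer's vocab:
    the sentinel index is vocab_size - 1 - token_id, which turns negative
    when extra_ids=0 reduces vocab_size to 32,000."""
    return f"<extra_id_{vocab_size - 1 - token_id}>"
```

sentinel_label(32099) gives '<extra_id_-100>' and sentinel_label(32098) gives '<extra_id_-99>', which lines up with the ids 32099 and 32098 at the start of the tensors above.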