EncoderDecoder LM output is perfect ... except that the ending is missing or duplicated

danringwald · May 6, 2021, 4:11pm

Hello community,

I have a problem that seems too strange not to be easy to debug, but I just can't find the cause. Could you help me?

I am creating this model:

Inputs are phonemes (e.g. b§ZuR)
The output is a proper sentence (in medical French, e.g. Bonjour)
The model is an EncoderDecoder with a BERT trained from scratch as encoder, and a BertLMHeadModel pretrained from ‘Geotrend/bert-base-fr-cased’ as decoder.
I am fine-tuning on a medical French text corpus, phonetized with a phonemic dictionary to get the inputs.

The input masks are non-causal and only mask the padding tokens
The output masks are the usual pyramidal (causal, lower-triangular) masks
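
To make the two mask types concrete, here is a minimal pure-Python sketch. The helper names `padding_mask` and `causal_mask` and the sample ids are hypothetical and only for illustration; this is not the code used in my training script.

```python
from typing import List


def padding_mask(token_ids: List[int], pad_id: int) -> List[int]:
    """Non-causal mask: 1 for real tokens, 0 for padding tokens."""
    return [0 if t == pad_id else 1 for t in token_ids]


def causal_mask(length: int) -> List[List[int]]:
    """Pyramidal mask: position i may attend to positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(length)] for i in range(length)]


# Hypothetical encoder input, padded with pad_id = 1 (assumed for this example).
encoder_ids = [5, 8, 13, 1, 1]
print(padding_mask(encoder_ids, pad_id=1))

# Each row is one decoder position; the rows grow by one visible token each.
for row in causal_mask(4):
    print(row)
```

The encoder mask depends only on where the padding sits, while the decoder mask depends only on the sequence length.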

Output is obtained with:
model.generate(input_values, decoder_start_token_id=0, eos_token_id=2, pad_token_id=1, num_beams=5, early_stopping=True, max_length=100)[0]
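
For reference, the three special ids passed to generate above can also be used to trim the returned sequence before decoding. The sketch below assumes the output has been converted to a flat list of integer ids; `trim_generated` and the sample ids are hypothetical, not part of the library.

```python
from typing import List, Sequence


def trim_generated(
    ids: Sequence[int],
    start_id: int = 0,  # decoder_start_token_id from the generate call
    eos_id: int = 2,    # eos_token_id from the generate call
    pad_id: int = 1,    # pad_token_id from the generate call
) -> List[int]:
    """Drop the leading start token, cut at the first EOS, remove padding."""
    out = list(ids)
    if out and out[0] == start_id:
        out = out[1:]
    if eos_id in out:
        out = out[: out.index(eos_id)]
    return [t for t in out if t != pad_id]


# Hypothetical generated ids: start token, two content tokens, EOS, padding.
print(trim_generated([0, 17, 42, 2, 1, 1]))
```

With a Hugging Face tokenizer, passing `skip_special_tokens=True` to `decode` is another way to hide special tokens, but it relies on the tokenizer's own special-token definitions rather than on the ids given to generate.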

The training goes very well, and I end up with a rather good cross-entropy loss of 0.17. However, the end of the output is most often missing, and sometimes duplicated or random.

Examples:

Expected target: suspicion de lipome du cordon droit
Prediction: [PAD] suspicion de lipome du cordon droit de lipome du cordon droit droit [SEP] [unused2]

Expected target: pas de phénomènes inflammatoires muqueux significatifs
Prediction: [PAD] pas de phénomène inflammatoire [SEP] [unused2]

Expected target: pincement plus marqué du disque intervertébral , mais inchangé par rapport à l' irm précédente
Prediction: [PAD] pincement plus marqué du disque intervertébral, mais inchangé par rapport à

Expected target: radiographie bassin face debout et hanche droite
Prediction: [PAD] radiographie bassin face debout [SEP] [unused2]

Expected target: homogène de type stéatosique , sans lésion focale notable
Prediction: [PAD] homogène de type stéatosique, sans lésion focale notable, sans lésion focale notable, [SEP] [unused2]

Do you have any idea what could be going wrong?
