Hi, I recently finetuned a SpeechEncoderDecoderModel for speech translation under low-resource conditions. The speech encoder is xls-r-300m and the decoder is mbart-large.
The model converged and showed a nice loss curve that ends at relatively small loss values. However, I noticed a discrepancy when I try to translate an audio clip. If I use prediction = model(input_values=batch["input_values"], input_ids=batch["labels"]["input_ids"]), the results are fine (BLEU of 9.4). But if I use prediction = model.generate(input_values=batch["input_values"], attention_mask=batch["attention_mask"]), the score drops significantly (BLEU of 1.78).
I read that the generate method decodes autoregressively. Is that the reason for such a big drop in performance, or am I doing something wrong?
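To make sure I understand the difference: my mental model is that the forward call above conditions each decoder step on the gold labels (teacher forcing), while generate feeds the model's own previous predictions back in, so a single mistake can derail everything after it. Here is a toy sketch of that intuition (nothing to do with the actual Hugging Face internals; the tokens and the lookup-table "model" are made up):

```python
# Toy illustration of teacher forcing vs. autoregressive decoding.
# The "model" predicts the next token from the previous one via a lookup
# table, with one deliberate error to show how mistakes compound.

NEXT = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "down",
        "down": "</s>"}

def predict(prev_token):
    # Deliberate model error: after "the" it predicts "dog", not "cat".
    if prev_token == "the":
        return "dog"
    return NEXT.get(prev_token, "</s>")

reference = ["the", "cat", "sat", "down", "</s>"]

# Teacher forcing (like model(..., input_ids=labels)): each step is
# conditioned on the *gold* previous token, so the error stays local.
teacher_forced = [predict(prev) for prev in ["<s>"] + reference[:-1]]

# Autoregressive decoding (like model.generate(...)): each step is
# conditioned on the model's *own* previous output, so the first
# mistake propagates to every later step.
autoregressive, tok = [], "<s>"
for _ in range(len(reference)):
    tok = predict(tok)
    autoregressive.append(tok)

print(teacher_forced)   # ['the', 'dog', 'sat', 'down', '</s>']  one error
print(autoregressive)   # ['the', 'dog', '</s>', '</s>', '</s>'] cascade
```

If that intuition is right, the teacher-forced BLEU of 9.4 is an optimistic upper bound, and the generate score reflects the model's real inference-time behavior.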
Thanks!