Warm-started encoder-decoder model built with EncoderDecoderModel always gives an empty string after fine-tuning

System Info

I am trying to train a seq2seq model using the EncoderDecoderModel class and found this blog very helpful. Thanks to @patrickvonplaten for the excellent explanation. Following the blog, I fine-tuned a seq2seq model that uses [BanglaBERT] (an ELECTRA model) as the encoder and [XGLM] as the decoder, on the [BanglaParaphrase] dataset. However, after fine-tuning, the model always generates either an empty string or garbage output. I do not understand where the problem is. Can anyone please help me find the bug in the code?

Thanks.
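
For context, my setup roughly follows the warm-starting recipe from the blog, along these lines (a simplified sketch; the checkpoint names and token-id choices below are illustrative and may differ slightly from my actual training script):

```python
from transformers import EncoderDecoderModel, AutoTokenizer

# Illustrative checkpoint names (assumed; substitute the exact ones used).
encoder_id = "csebuetnlp/banglabert"   # ELECTRA-style Bangla encoder
decoder_id = "facebook/xglm-564M"      # causal LM used as the decoder

enc_tokenizer = AutoTokenizer.from_pretrained(encoder_id)
dec_tokenizer = AutoTokenizer.from_pretrained(decoder_id)

# Warm-start: the decoder config is switched to is_decoder=True and
# add_cross_attention=True so it can attend to the encoder outputs.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)

# Generation-related ids. If decoder_start_token_id / eos / pad are misconfigured,
# generate() can emit EOS immediately, which shows up as an empty prediction.
model.config.decoder_start_token_id = dec_tokenizer.bos_token_id
model.config.eos_token_id = dec_tokenizer.eos_token_id
model.config.pad_token_id = dec_tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size
```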

Expected behavior

Current output from my code:
{'target': 'সিপিও আহত থাকায় যুদ্ধ পরিচালনার দায়িত্ব এসে পড়েছিল সেম্প্রোনিয়াসের কাঁধে।',
'pred_target': ''}

whereas it should be something like this (i.e., the model should generate a Bangla paraphrase of the input sentence):
{'target': 'সিপিও আহত থাকায় যুদ্ধ পরিচালনার দায়িত্ব এসে পড়েছিল সেম্প্রোনিয়াসের কাঁধে।',
'pred_target': 'সিপিও কর্তৃক আহত হয়ে সেমপ্রোনিয়াসের কাঁধে যুদ্ধ পরিচালনার দায়িত্ব আসে।'}
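
For completeness, pred_target is produced with a standard generate() + decode call, roughly like this (continuing from the setup sketch above; the generation arguments and variable names are illustrative):

```python
# Assumed example input; in practice this comes from the BanglaParaphrase test split.
source_sentence = "..."

# Encode the source with the encoder's tokenizer.
inputs = enc_tokenizer(
    source_sentence, return_tensors="pt", truncation=True, max_length=128
)

# Beam-search generation with the warm-started encoder-decoder model.
output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=64,
    num_beams=4,
)

# Decode with the decoder's tokenizer; this is the string shown as pred_target.
pred_target = dec_tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print(pred_target)
```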