Warm-starting encoder-decoder models using EncoderDecoderModel always gives an empty string after fine-tuning

System Info

I am trying to train a seq2seq model using the EncoderDecoderModel class and found this blog very helpful. Thanks to @patrickvonplaten for the excellent explanation. Following the blog, I fine-tuned a seq2seq model that uses [BanglaBERT] (an ELECTRA model) as the encoder and [XGLM] as the decoder, trained on the [BanglaParaphrase] dataset. After fine-tuning, however, the model always generates an empty string or garbage output. I do not understand where the problem is. Can anyone please help me find the bug in my code? A rough sketch of my setup is included below.

Thanks.
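
For reference, this is roughly how I warm-start the model, following the blog. It is a minimal sketch rather than my full training script, and the checkpoint names and config assignments are my own assumptions about the correct setup:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Checkpoint names as I understand them (assumptions on my side)
encoder_ckpt = "csebuetnlp/banglabert"  # BanglaBERT (ELECTRA) encoder
decoder_ckpt = "facebook/xglm-564M"     # XGLM causal LM used as the decoder

enc_tokenizer = AutoTokenizer.from_pretrained(encoder_ckpt)
dec_tokenizer = AutoTokenizer.from_pretrained(decoder_ckpt)

# Warm start: encoder weights from BanglaBERT, decoder weights from XGLM;
# the cross-attention layers in the decoder are newly initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(encoder_ckpt, decoder_ckpt)

# The blog sets these ids explicitly. My guess is that if any of them is
# missing or wrong, generate() can emit EOS immediately, so every prediction
# decodes to an empty string.
model.config.decoder_start_token_id = dec_tokenizer.bos_token_id
model.config.eos_token_id = dec_tokenizer.eos_token_id
model.config.pad_token_id = dec_tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size
```

Training then follows the blog's Seq2SeqTrainer recipe as closely as I could reproduce it.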

Expected behavior

Actual output from my code:
{‘target’: ‘āϏāĻŋāĻĒāĻŋāĻ“ āφāĻšāϤ āĻĨāĻžāĻ•āĻžāϝāĻŧ āϝ⧁āĻĻā§āϧ āĻĒāϰāĻŋāϚāĻžāϞāύāĻžāϰ āĻĻāĻžāϝāĻŧāĻŋāĻ¤ā§āĻŦ āĻāϏ⧇ āĻĒāĻĄāĻŧ⧇āĻ›āĻŋāϞ āϏ⧇āĻŽā§āĻĒā§āϰ⧋āύāĻŋāϝāĻŧāĻžāϏ⧇āϰ āĻ•āĻžāρāϧ⧇āĨ¤â€™,
‘pred_target’: ‘’}

whereas it should be something like this (a Bangla paraphrase of the input sentence):
{‘target’: ‘āϏāĻŋāĻĒāĻŋāĻ“ āφāĻšāϤ āĻĨāĻžāĻ•āĻžāϝāĻŧ āϝ⧁āĻĻā§āϧ āĻĒāϰāĻŋāϚāĻžāϞāύāĻžāϰ āĻĻāĻžāϝāĻŧāĻŋāĻ¤ā§āĻŦ āĻāϏ⧇ āĻĒāĻĄāĻŧ⧇āĻ›āĻŋāϞ āϏ⧇āĻŽā§āĻĒā§āϰ⧋āύāĻŋāϝāĻŧāĻžāϏ⧇āϰ āĻ•āĻžāρāϧ⧇āĨ¤â€™,
‘pred_target’: ‘āϏāĻŋāĻĒāĻŋāĻ“ āĻ•āĻ°ā§āϤ⧃āĻ• āφāĻšāϤ āĻšāϝāĻŧ⧇ āϏ⧇āĻŽāĻĒā§āϰ⧋āύāĻŋāϝāĻŧāĻžāϏ⧇āϰ āĻ•āĻžāρāϧ⧇ āϝ⧁āĻĻā§āϧ āĻĒāϰāĻŋāϚāĻžāϞāύāĻžāϰ āĻĻāĻžāϝāĻŧāĻŋāĻ¤ā§āĻŦ āφāϏ⧇āĨ¤â€™}
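
For completeness, pred_target above is produced roughly like this (again only a sketch, continuing from the setup code earlier; the sample sentence and max_length are placeholders):

```python
# Hypothetical inference snippet, reusing enc_tokenizer, dec_tokenizer and model
sample = "input Bangla sentence from BanglaParaphrase"  # placeholder
inputs = enc_tokenizer(sample, return_tensors="pt", truncation=True, max_length=128)

output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=128,
)
pred_target = dec_tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print({"pred_target": pred_target})  # currently always comes out as ""
```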