Same generation for every batch in each epoch

Hi. I am trying to train an LSTM-based Encoder-Decoder model for paraphrase generation.

The model is as follows:

StackedResidualLSTM(
  (encoder): RecurrentEncoder(
    (embed_tokens): Embedding(30522, 256, padding_idx=0)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(256, 256, num_layers=2, batch_first=True, dropout=0.5)
  )
  (decoder): RecurrentDecoder(
    (embed_tokens): Embedding(30522, 256, padding_idx=0)
    (dropout_in_module): Dropout(p=0.5, inplace=False)
    (dropout_out_module): Dropout(p=0, inplace=False)
    (rnn): LSTM(256, 256, num_layers=2, batch_first=True)
    (fc_out): Linear(in_features=256, out_features=30522, bias=True)
  )
)

I use HuggingFace’s BERT tokenizer and the GenerationMixin class for generation. My model uses pad_token [PAD] (id 0), eos_token [SEP] (id 102), and decoder_start_token [CLS] (id 101). The generations are always bad and are identical for every entry of every batch in an epoch. For the loss function I tried two setups: CrossEntropy with right-shifted decoder_input_ids and unshifted labels, and CrossEntropy with decoder_input_ids = labels[:, :-1] and labels = labels[:, 1:]. The loss goes down, but I cannot figure out what prevents the model from generating proper sentences and makes it output the same generation for every entry of the batch.
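To make the two setups concrete, here is a minimal sketch of how the decoder inputs and targets line up in each one. It uses plain Python lists instead of tensors so it runs without PyTorch. The ids 7, 8, 9 are made-up placeholders for word pieces, the helper names are hypothetical, and whether the labels already start with [CLS] is an assumption that depends on how the targets were tokenized.

```python
# Special token ids as quoted in the post.
PAD, EOS, START = 0, 102, 101


def shift_right(labels, start_id=START):
    """Setup 1: prepend the start token and drop the last position."""
    return [[start_id] + row[:-1] for row in labels]


def slice_pairs(labels):
    """Setup 2: inputs are labels[:, :-1], targets are labels[:, 1:]."""
    inputs = [row[:-1] for row in labels]
    targets = [row[1:] for row in labels]
    return inputs, targets


# Hypothetical one-sentence batch; 7, 8, 9 stand in for real word-piece ids.
# Setup 1 is assumed to receive labels WITHOUT a leading [CLS].
labels_no_cls = [[7, 8, 9, EOS, PAD]]
# Setup 2 is assumed to receive labels WITH a leading [CLS].
labels_with_cls = [[START, 7, 8, 9, EOS, PAD]]

v1_inputs = shift_right(labels_no_cls)
v1_targets = labels_no_cls
v2_inputs, v2_targets = slice_pairs(labels_with_cls)

print("setup 1:", v1_inputs, v1_targets)
print("setup 2:", v2_inputs, v2_targets)
```

Under these assumptions both setups produce the same input/target pairs. If the labels passed to setup 1 did already begin with [CLS], the decoder input would start with two [CLS] tokens and the first target would be [CLS] itself, so it may be worth printing one batch to check which case applies.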

Example source (input_ids), prediction, and target for the first entry of 3 consecutive batches in an epoch:

2022-04-22 08:50:02,118 Train INFO:
Source: is it better to be single?
Preds: how can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i
Target: is it better to be in a relationship or to be single?

2022-04-22 08:50:02,230 Train INFO:
Source: how do i get change in my look?
Preds: how can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i
Target: how should i change my look?

2022-04-22 08:50:02,339 Train INFO:
Source: what are the products of yeast fermentation and how are they used?
Preds: how can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i
Target: for what products is the yeast fermentation needed?

Do you have any idea what it could be? Thanks for the help.