Hi. I am trying to train an LSTM-based Encoder-Decoder model for paraphrase generation.
The model is as follows:
StackedResidualLSTM(
  (encoder): RecurrentEncoder(
    (embed_tokens): Embedding(30522, 256, padding_idx=0)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(256, 256, num_layers=2, batch_first=True, dropout=0.5)
  )
  (decoder): RecurrentDecoder(
    (embed_tokens): Embedding(30522, 256, padding_idx=0)
    (dropout_in_module): Dropout(p=0.5, inplace=False)
    (dropout_out_module): Dropout(p=0, inplace=False)
    (rnn): LSTM(256, 256, num_layers=2, batch_first=True)
    (fc_out): Linear(in_features=256, out_features=30522, bias=True)
  )
)
I use HuggingFace’s BERT tokenizer and the GenerationMixin class for generation. My model uses pad_token [PAD] = 0, eos_token [SEP] = 102, and decoder_start_token [CLS] = 101. The generations are always bad, and they are identical for every batch of an epoch. For the loss function I tried two setups: CrossEntropy with right-shifted decoder_input_ids and standard labels, and CrossEntropy with decoder_input_ids = labels[:, :-1] and labels = labels[:, 1:]. The loss goes down, but I cannot figure out what is preventing the model from generating proper sentences, or why it outputs the same generation for every entry of the batch.
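To make the two loss setups concrete, here is a minimal plain-Python sketch of how I understand the input/label alignment in each case. The helper names `shift_right` and `split_teacher_forcing` and the token ids in `target` are made up for illustration; only the [PAD], [CLS] and [SEP] ids come from my setup. It assumes the first setup uses labels without a leading [CLS] and the second uses labels that start with [CLS].

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # ids from my tokenizer setup


def shift_right(labels, start_id=CLS_ID):
    """Setup 1: decoder inputs are the labels shifted right by one,
    with the decoder start token placed in front."""
    return [start_id] + labels[:-1]


def split_teacher_forcing(sequence):
    """Setup 2: the sequence already starts with [CLS]; inputs drop the
    last token and labels drop the first token."""
    return sequence[:-1], sequence[1:]


# Hypothetical target ids (not real tokenizer output), ending in [SEP].
target = [2129, 2323, 1045, SEP_ID]

# Setup 1: labels have no leading [CLS].
inputs_1, labels_1 = shift_right(target), target

# Setup 2: labels carry the leading [CLS] and are sliced.
inputs_2, labels_2 = split_teacher_forcing([CLS_ID] + target)

# Under these assumptions both setups give the same alignment:
# at step t the decoder sees token t-1 and is trained to predict token t.
same_alignment = (inputs_1 == inputs_2) and (labels_1 == labels_2)
print(inputs_1, labels_1, same_alignment)
```

If this matches what the training code does, the two setups should be equivalent, so the choice between them is probably not the cause. With padded batches the loss would also need to skip the [PAD] positions in the labels, for example through the `ignore_index` argument of `torch.nn.CrossEntropyLoss`.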
Example of the input_ids, generations, and targets for the first entry of 3 consecutive batches in an epoch:
2022-04-22 08:50:02,118 Train INFO:
Source: is it better to be single?
Preds: how can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i
Target: is it better to be in a relationship or to be single?
2022-04-22 08:50:02,230 Train INFO:
Source: how do i get change in my look?
Preds: how can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i
Target: how should i change my look?
2022-04-22 08:50:02,339 Train INFO:
Source: what are the products of yeast fermentation and how are they used?
Preds: how can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i can i
Target: for what products is the yeast fermentation needed?
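To put numbers on the collapse visible in these logs, here is a small sketch of two checks on the decoded predictions. The function names `distinct_fraction` and `repeated_ngram_ratio` are hypothetical helpers, not part of any library, and the sample string is a shortened version of the prediction above.

```python
def distinct_fraction(predictions):
    """Fraction of unique strings among the predictions
    (1.0 means every prediction is different)."""
    if not predictions:
        return 0.0
    return len(set(predictions)) / len(predictions)


def repeated_ngram_ratio(text, n=2):
    """Share of whitespace-token n-grams in `text` that repeat an
    n-gram already seen earlier in the same text."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# Shortened version of the prediction from the logs, repeated for 3 sources.
preds = ["how can i can i can i"] * 3

print(distinct_fraction(preds))        # low value: identical outputs across inputs
print(repeated_ngram_ratio(preds[0]))  # high value: the output loops on itself
```

A low distinct fraction together with a high repeated n-gram ratio would describe exactly what I am seeing: the output does not depend on the source, and decoding gets stuck in a loop.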
Do you have any idea what the cause could be? Thanks for the help.