LSTM Encoder-Decoder not working

I am trying to train an LSTM Encoder-Decoder model for paraphrase generation. My model is as follows:

StackedResidualLSTM(
  (encoder): RecurrentEncoder(
    (embed_tokens): Embedding(30522, 256)
    (dropout): Dropout(p=0.5, inplace=False)
    (rnn): LSTM(256, 256, num_layers=2, batch_first=True, dropout=0.5)
  )
  (decoder): RecurrentDecoder(
    (embed_tokens): Embedding(30522, 128)
    (dropout_in_module): Dropout(p=0.5, inplace=False)
    (dropout_out_module): Dropout(p=0.1, inplace=False)
    (layers): ModuleList(
      (0): LSTMCell(384, 256)
      (1): LSTMCell(256, 256)
    )
    (fc_out): Linear(in_features=256, out_features=30522, bias=True)
  )
)
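The first LSTMCell takes a 384-dim input, which I believe is the 128-dim decoder embedding concatenated with a 256-dim feed from the previous step (input feeding); that part is my assumption from the layer sizes. Roughly, one decoder step would look like the sketch below (simplified, with illustrative names, not my exact code):

import torch

# one decoder time step with input feeding (simplified sketch;
# 384-dim input = 128-dim embedding + 256-dim feed from the previous step)
def decoder_step(embed_t, input_feed, states, layers, fc_out, dropout_out):
    # embed_t: (batch, 128), input_feed: (batch, 256), states: list of (h, c) per layer
    x = torch.cat([embed_t, input_feed], dim=1)        # (batch, 384)
    for i, cell in enumerate(layers):                  # LSTMCell(384, 256), LSTMCell(256, 256)
        h, c = cell(x, states[i])
        states[i] = (h, c)
        x = dropout_out(h)
    logits_t = fc_out(x)                               # (batch, 30522)
    return logits_t, x, states                         # x becomes the next step's input_feed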

Below is a printout of the source sentence, the sentence fed to the decoder (shifted right), the predictions, and the true sentence (the labels). Everything is tokenized with the BERT tokenizer:

Source: [CLS] where can i get quality services in brisbane for plaster
and drywall repair? [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD]

Decoder Input: [CLS] [CLS] where can i get
quality services for plaster and drywall repairs in brisbane? [SEP]
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]

Preds:
[CLS] the? [SEP]? [SEP]? [SEP]? [SEP]? [SEP]? [SEP]? [SEP]? [SEP]?
[SEP]

Target: [CLS] where can i get quality services for plaster and
drywall repairs in brisbane? [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[PAD] [PAD] [PAD] [PAD] [PAD]
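For completeness, the decoder input and labels are built from the target roughly as follows: the decoder input is the target shifted right by prepending [CLS] (which is why it starts with two [CLS] tokens above), and the labels are the target with [PAD] replaced by -100. A simplified sketch (the token ids and helper name are illustrative):

import torch

PAD_ID, CLS_ID = 0, 101   # BERT tokenizer ids for [PAD] and [CLS]

def make_decoder_batch(target):
    # target: (batch, seq_len) token ids, as in the "Target" print above
    bos = torch.full((target.size(0), 1), CLS_ID, dtype=target.dtype)
    decoder_input = torch.cat([bos, target[:, :-1]], dim=1)    # shift right
    labels = target.masked_fill(target == PAD_ID, -100)        # ignore pads in the loss
    return decoder_input, labels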

My loss function is a CrossEntropyLoss between the logits and the labels (padding tokens are replaced with -100 so they are ignored). Something like:

from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss()  # ignore_index defaults to -100, matching the masked labels
loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))

I am seeing two problems:

  • the loss does not go down
  • the generations are identical for every entry within the same epoch (after the weights are updated, the generations may differ from those of the previous epoch, but they are again the same for every entry of the new epoch)

Do you have any idea what I might try to fix the issue? Thanks in advance for any help you can provide.