BartDecoder outputs perfect predictions even when untrained

SwissSandwich22 · October 27, 2023, 10:16pm

Hi all,

I am new to the Hugging Face documentation, so apologies if this is a silly question. I am training a BartDecoder from scratch, with my own encoder in place of BartEncoder. To the BartDecoder, I pass in:

inputs_embeds: embeddings of the target token sequence
encoder_hidden_states: last hidden state of my custom encoder
attention_mask: a padding mask for the target token sequence

I have also added a head to the decoder so it outputs logits of shape (batch_size, seq_len, 50265), where 50265 is the BART vocab size. From there, I use nn.CrossEntropyLoss(reduction='none') to compare the logits to the true class values. For every prediction, the loss is always 0. I have checked, and the output logits always predict the correct word! I am not using a pretrained decoder, nor have I run a single learning step!
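For reference, here is a minimal, framework-free sketch of how decoder inputs and labels are usually offset under teacher forcing. It is only an illustration: `shift_right` is a hypothetical helper, not the library's function, and the token ids (0 for `<s>`, 2 for `</s>` as the decoder start token) are assumptions based on BART's usual convention.

```python
def shift_right(target_ids, decoder_start_token_id):
    """Build decoder inputs: prepend a start token and drop the last target token."""
    return [[decoder_start_token_id] + seq[:-1] for seq in target_ids]


# Hypothetical target sequence; ids are illustrative only.
labels = [[0, 713, 16, 2]]

# Position t of the decoder input holds the token from position t - 1 of the labels,
# so the model never receives the token it is asked to predict at that position.
decoder_input_ids = shift_right(labels, decoder_start_token_id=2)

print(decoder_input_ids)  # [[2, 0, 713, 16]]
```

If the embeddings passed to the decoder come from the unshifted target sequence, position t of the input already contains the token being predicted at position t, which is one thing worth ruling out when the loss is exactly 0.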

I believe I may have a misunderstanding about the attention masks. From the internal documentation it seems BartDecoder has a ._prepare_decoder_attention_mask() method, which I think should handle masking out future context for each prediction during a training step. But I am not sure. Does anyone have a solution to this issue?
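To make the masking question concrete, here is a small sketch of what a causal mask combined with a padding mask looks like. This is not the library's implementation (which, as far as I understand, builds an additive mask with large negative values); `causal_mask` and `combine_with_padding` are hypothetical names used only for illustration.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def combine_with_padding(causal, padding_mask):
    """padding_mask[j] is 1 for a real token and 0 for padding.

    A key position stays visible only if both the causal rule and the padding mask allow it.
    """
    return [
        [allowed and bool(padding_mask[j]) for j, allowed in enumerate(row)]
        for row in causal
    ]


# Four target positions, the last one being padding.
mask = combine_with_padding(causal_mask(4), padding_mask=[1, 1, 1, 0])

for row in mask:
    print([int(v) for v in row])
```

Note that a mask like this only hides positions after the current one. Each position can still attend to itself, so it does not hide a token that is already present in the input at the same position as its label.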
