BART generate() output not related to input

furunkel · June 30, 2021, 2:02pm

Hi,
I’m in need of a model that can fill in multi-token masks, and it seems like BART is the best choice at the moment.
I’ve trained a BART model from scratch using a custom collator that masks out contiguous spans of tokens.
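The collator itself isn't shown in the post. The helper text_infill below is a rough, hypothetical sketch of BART-style text infilling on token ids, where each contiguous span is collapsed into a single <mask>, as in the Input/Label pair in this post. It assumes made-up special-token ids, a 30% mask budget, and Poisson-distributed span lengths with λ = 3. None of these are the poster's settings.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical special-token ids (assumption; a from-scratch tokenizer defines its own).
BOS, EOS, MASK = 0, 2, 3


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's method for drawing a Poisson-distributed span length."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infill(ids: List[int], mask_ratio: float = 0.3, lam: float = 3.0,
                seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: mask contiguous spans, collapsing each span to one MASK.

    ids must look like [BOS, ..., EOS]. Returns (encoder_input, labels), where
    labels is the untouched original sequence.
    Assumed defaults: mask_ratio=0.3, lam=3.0 (not the poster's settings).
    """
    rng = random.Random(seed)
    body = ids[1:-1]
    if not body:
        return list(ids), list(ids)
    budget = max(1, round(len(body) * mask_ratio))  # number of tokens to hide
    masked = [False] * len(body)
    for _ in range(100):  # bounded number of attempts to place spans
        if budget <= 0:
            break
        length = min(max(1, sample_poisson(lam, rng)), budget)
        start = rng.randrange(0, len(body) - length + 1)
        if any(masked[start:start + length]):
            continue  # overlaps an existing span; try another position
        for j in range(start, start + length):
            masked[j] = True
        budget -= length
    encoder_input = [ids[0]]
    for j, tok in enumerate(body):
        if not masked[j]:
            encoder_input.append(tok)
        elif j == 0 or not masked[j - 1]:
            encoder_input.append(MASK)  # first token of a span becomes a single MASK
        # later tokens of the same span are dropped
    encoder_input.append(ids[-1])
    return encoder_input, list(ids)


if __name__ == "__main__":
    ids = [BOS] + list(range(10, 20)) + [EOS]
    print(text_infill(ids, seed=0))
```

Because spans are placed independently, two spans that happen to touch are merged into one <mask>. A real collator would also need to pad batches and set padded label positions to -100 so the loss ignores them.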

Unfortunately, when I call generate() on my BartForConditionalGeneration model, the output is completely unrelated to the input (with or without mask tokens). It clearly comes from the training data distribution, but it bears no relation to the input, and the masks are certainly not filled in.
Now, if I obtain the logits directly (i.e., model(input_ids).logits), the predictions for the tokens in the input look perfectly fine, so the model does seem to have learned something.
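These two checks measure different things. When no decoder_input_ids are passed, BartForConditionalGeneration builds the decoder input by shifting input_ids itself one position to the right, with decoder_start_token_id prepended. As a result, model(input_ids).logits are teacher-forced predictions of each input token given the true prefix. generate() instead starts from the decoder start token and feeds back its own predictions. The pure-Python toy below is hypothetical and is not BART. It shows one possible reading of the symptom, not a confirmed diagnosis: a decoder that ignores the encoder entirely can still score well under teacher forcing while generating text unrelated to the input.

```python
from typing import Callable, Dict, List

# Hypothetical special-token ids; BART's defaults use the same values (bos=0, eos=2).
BOS, EOS = 0, 2
DECODER_START = EOS  # BART's default decoder_start_token_id is </s>

# A toy "model" maps (encoder_ids, decoder_prefix) -> next token id.
Model = Callable[[List[int], List[int]], int]


def shift_right(ids: List[int], start: int) -> List[int]:
    """Decoder input built from input_ids when decoder_input_ids is omitted."""
    return [start] + ids[:-1]


def teacher_forced_argmax(model: Model, input_ids: List[int]) -> List[int]:
    """Analogue of model(input_ids).logits.argmax(-1): each position sees the true prefix."""
    dec_in = shift_right(input_ids, DECODER_START)
    return [model(input_ids, dec_in[: i + 1]) for i in range(len(dec_in))]


def greedy_generate(model: Model, input_ids: List[int], max_len: int = 10) -> List[int]:
    """Analogue of greedy generate(): the decoder is fed its own previous outputs."""
    out = [DECODER_START]
    for _ in range(max_len):
        nxt = model(input_ids, out)
        out.append(nxt)
        if nxt == EOS:
            break
    return out


# Hypothetical toy decoder that ignores the encoder: a bigram "language model".
BIGRAM: Dict[int, int] = {EOS: BOS, BOS: 7, 7: 8, 8: 9, 9: EOS, 10: 11, 11: 12, 12: EOS}


def encoder_blind_model(encoder_ids: List[int], decoder_prefix: List[int]) -> int:
    return BIGRAM.get(decoder_prefix[-1], EOS)


if __name__ == "__main__":
    x = [BOS, 10, 11, 12, EOS]
    print(teacher_forced_argmax(encoder_blind_model, x))  # [0, 7, 11, 12, 2]
    print(greedy_generate(encoder_blind_model, x))        # [2, 0, 7, 8, 9, 2]
```

Here four of the five teacher-forced predictions match the input. The greedy output, however, is a sequence from the toy's "training distribution" that never mentions the input tokens.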

The model was trained as follows:
Input: <s>This is <mask></s>
Label: <s>This is some input</s>
I let the library generate the decoder input from the labels; it should look something like: </s><s>This is some input
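The decoder-input format described above can be checked without running a model. In transformers, BartForConditionalGeneration calls shift_tokens_right (from transformers.models.bart.modeling_bart) on the labels when labels are given and decoder_input_ids are not. The sketch below reimplements that logic on plain Python lists. It assumes BartConfig's default ids (pad=1, decoder_start_token_id=2, i.e. </s>); the content token ids are made up.

```python
from typing import List

# BartConfig defaults (assumption for a from-scratch model): pad=1, decoder_start=2 (</s>).
PAD, DECODER_START = 1, 2


def shift_tokens_right(labels: List[int], pad_token_id: int,
                       decoder_start_token_id: int) -> List[int]:
    """Pure-Python sketch of how decoder_input_ids are derived from labels."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 marks positions ignored by the loss; they must not reach the decoder.
    return [pad_token_id if t == -100 else t for t in shifted]


labels = [0, 100, 101, 102, 103, 2]  # "<s>This is some input</s>" with made-up ids
print(shift_tokens_right(labels, PAD, DECODER_START))  # [2, 0, 100, 101, 102, 103]
```

The output reads as </s><s>This is some input. The final </s> of the labels is dropped, which matches the decoder input the poster expects.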

Am I missing something obvious here?

jbmaxwell · February 17, 2022, 3:38am

Did you make any progress with this? I’m also looking to pretrain BART from scratch for infill generation.

Related topics
Infilling multiple mask spans with BartForConditionalGeneration (Intermediate) · 0 replies · 416 views · July 12, 2022
Pretraining BART for conditional generation (🤗Transformers) · 1 reply · 1015 views · May 30, 2022
Is BART guaranteed to not mess up unmasked tokens during text infilling? (Models) · 1 reply · 871 views · August 24, 2022
Help with fine-tune BART for text infilling (Beginners) · 2 replies · 2221 views · February 10, 2022
Fill-mask for BART with variable length (Models) · 1 reply · 1003 views · February 9, 2022