What can cause model.generate (BART) output to be gibberish after fine-tuning?

valhalla · August 31, 2020, 6:10am

Hi @rgwatwormhill, in any PyTorch training loop the gradients need to be zeroed before each backward pass; otherwise they accumulate.
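A minimal sketch of the accumulation behaviour, using a toy torch.nn.Linear model with made-up data (an illustrative assumption, not the poster's BART setup): two backward passes without zeroing leave twice the gradient in .grad, while calling optimizer.zero_grad() first gives a fresh gradient.

```python
import torch

torch.manual_seed(0)

# Toy model and data, purely illustrative (not a BART model).
model = torch.nn.Linear(3, 1)
x = torch.randn(4, 3)
y = torch.randn(4, 1)

def loss_fn():
    return torch.nn.functional.mse_loss(model(x), y)

# Two backward passes without zeroing: the second gradient is added to the first.
loss_fn().backward()
first_grad = model.weight.grad.clone()
loss_fn().backward()
accumulated_grad = model.weight.grad.clone()  # equals 2 * first_grad

# Typical training step: clear old gradients before each backward pass.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
loss_fn().backward()
fresh_grad = model.weight.grad.clone()  # equals first_grad again
optimizer.step()
```

In a fine-tuning loop the same optimizer.zero_grad() call belongs at the start of every step, before loss.backward().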

Hi @hyura
For BART, or any other seq2seq model, the decoder_input_ids need to be shifted right, i.e. the decoder sequence needs to start with the decoder_start_token_id, which is usually the bos, pad or eos token. For BART, it's eos.

This means the decoder first takes the decoder_start_token_id and produces the first token of the labels. If the inputs are not shifted, the decoder just learns to copy the token it receives at each step, which could be the reason for this weird generation.
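A minimal sketch of what "shifted right" means for a single label sequence. The helper name shift_tokens_right_sketch is hypothetical; it implements the generic rule (prepend the start token, drop the last label token). The ids 0 = bos, 1 = pad, 2 = eos follow the BART vocabulary, and 100 and 101 are arbitrary stand-ins for word tokens. The transformers helper's exact implementation has varied between versions, so treat this as an illustration of the alignment rather than a copy of the library code.

```python
from typing import List


def shift_tokens_right_sketch(labels: List[int], decoder_start_token_id: int,
                              pad_token_id: int) -> List[int]:
    """Prepend the start token and drop the last label token.

    Any -100 placeholders (ignored label positions) become the pad token,
    because -100 is not a valid vocabulary id for the decoder input.
    """
    shifted = [decoder_start_token_id] + labels[:-1]
    return [pad_token_id if tok == -100 else tok for tok in shifted]


# Illustrative label sequence: <s> word word </s>
labels = [0, 100, 101, 2]
decoder_input_ids = shift_tokens_right_sketch(labels, decoder_start_token_id=2, pad_token_id=1)
print(decoder_input_ids)  # [2, 0, 100, 101]

# At step i the decoder sees decoder_input_ids[i] and is trained to predict labels[i].
for inp, target in zip(decoder_input_ids, labels):
    print(f"input {inp} -> predict {target}")
```

Without the shift, each input position would hold the very token it is asked to predict, which is the copying behaviour described above.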

If you are on the master version of transformers, there are a few helpers for preparing the data:

- Use the prepare_seq2seq_batch method; it returns input_ids, attention_mask and labels.
- Use modeling_bart.shift_tokens_right to prepare the decoder_input_ids.
- Set the pad tokens in the labels to -100 so they are ignored by the cross-entropy loss.

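```python
from transformers import BartTokenizer
from transformers.modeling_bart import shift_tokens_right  # module path as of transformers 3.x

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")  # checkpoint name is an assumption

input_text = "Some input text"
output_text = "Paraphrase Text"

# enc will contain input_ids, attention_mask and labels
enc = tokenizer.prepare_seq2seq_batch(src_texts=[input_text], tgt_texts=[output_text], return_tensors="pt")
# Shift before masking: decoder_input_ids must contain real token ids, not -100
decoder_input_ids = shift_tokens_right(enc["labels"], tokenizer.pad_token_id)

labels = enc["labels"]
labels[labels == tokenizer.pad_token_id] = -100
```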
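Why -100 works: torch.nn.functional.cross_entropy uses ignore_index=-100 by default, so masked positions are left out of the average. A minimal sketch with random toy logits (the vocabulary size of 10 and the label values are illustrative assumptions, not BART's real values) shows that the masked loss equals the loss computed over the non-pad positions only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, pad_token_id = 10, 1  # toy values, not BART's real vocabulary size
logits = torch.randn(5, vocab_size)  # one row of scores per target position
labels = torch.tensor([4, 7, 2, pad_token_id, pad_token_id])

# Mask padding so it does not contribute to the loss (-100 is the default ignore_index).
masked_labels = labels.clone()
masked_labels[masked_labels == pad_token_id] = -100
loss_masked = F.cross_entropy(logits, masked_labels)

# Equivalent: the mean loss over only the non-pad positions.
keep = labels != pad_token_id
loss_reference = F.cross_entropy(logits[keep], labels[keep])

# Without masking, the model is also trained to predict <pad> at padded positions.
loss_unmasked = F.cross_entropy(logits, labels)
print(loss_masked.item(), loss_reference.item(), loss_unmasked.item())
```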

Hope this helps.
cc @sshleifer
