Help with fine-tuning BART for text infilling

Hi guys,

I am trying to fine-tune BART for the text infilling task. For example, I want my model to learn “Steve Jobs is the founder of Apple” from “Steve Jobs [MASK] Apple”.

I mainly have the following three questions:

(1) BartModel and BartForConditionalGeneration, which one should I choose?

(2) Can you provide examples of how to use the corresponding API?

(3) How do I compute the loss for the text infilling task?

  1. You should use BartForConditionalGeneration, since this model adds a language modeling head on top of BartModel. BartModel itself is just the encoder-decoder Transformer, without any head on top. The language modeling head is necessary to decode the hidden states into actual predicted tokens and to generate text.
  2. Yes, check out my notebook here: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/T5. You just need to use BartForConditionalGeneration instead of T5ForConditionalGeneration (T5 and BART are actually very similar). Also, you can check out the improved documentation I wrote for T5 to illustrate how these models work, both for training and inference.
  3. The loss gets calculated automatically when you provide labels to the model. It’s standard cross-entropy loss between the model’s predictions and the labels (see the short sketch after this list).
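To make the three points above concrete, here is a rough sketch of a single training step plus a generation call. This is my own illustration rather than code from the notebook; the checkpoint name, the example sentences, and `max_length` are just placeholders you’d replace with your own setup.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

# Placeholder checkpoint; swap in whichever BART checkpoint you fine-tune
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# BART's infilling mask token is "<mask>" (tokenizer.mask_token)
src = "Steve Jobs <mask> Apple"                 # corrupted input
tgt = "Steve Jobs is the founder of Apple"      # full target sentence

inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(tgt, return_tensors="pt").input_ids

# Passing labels makes the model compute the cross-entropy loss itself
outputs = model(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    labels=labels,
)
loss = outputs.loss   # scalar cross-entropy over the target tokens
loss.backward()       # in a real training loop you'd then step an optimizer

# Inference: let the (fine-tuned) model fill in the masked span
generated_ids = model.generate(inputs.input_ids, max_length=20)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```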

Hi, I’m curious about a couple of things: 1) did you get this model running well, and 2) would this model also work for a more standard “next-token” causal LM?

EDIT: Oh, also, would it be expected to do both MLM and CLM? That is, with an input like: “The chicken [MASK] to get” could it continue and output something like “The chicken crossed the road to get to the other side”?? That would be pretty much ideal for my use-case. And if so, how would one go about doing this?
(Sorry for hijacking your thread, but I’ve been wondering about something like this for a while!)