I am trying to fine-tune BART for a text infilling task. For example, I want my model to learn “Steve Jobs is founder of Apple” from “Steve Jobs [MASK] Apple”.
My questions are mainly the following three:
(1) Which should I choose, BartModel or BartForConditionalGeneration?
(2) Can you provide examples of how to use the corresponding API?
(3) How do I compute the loss for the text infilling task?
You should use BartForConditionalGeneration, since that model adds a language modeling head on top of BartModel. BartModel itself is just the bare encoder-decoder Transformer without any head. The language modeling head is what maps the decoder's hidden states to actual predicted tokens, so it is required to generate text.
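To make this concrete, here is a minimal sketch of loading the model and filling in a masked span, assuming the Hugging Face transformers library and the facebook/bart-base checkpoint (note that BART's mask token is `<mask>`, not `[MASK]`):

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# BART's mask token is "<mask>"; one mask can stand for a span of several tokens
inputs = tokenizer("Steve Jobs <mask> Apple.", return_tensors="pt")

# generate() uses the LM head to decode tokens from the decoder's hidden states
generated_ids = model.generate(**inputs, max_length=20, num_beams=4)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```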
The loss gets calculated automatically for you when you provide labels to the model. It is the standard cross-entropy loss between the model's predictions and the labels.
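For example (a sketch of a single training step, not a full training loop): pass the corrupted sentence as the encoder input and the original sentence as labels, and the returned loss is the cross-entropy described above. Padding positions in the labels should be set to -100 so they are ignored by the loss.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Encoder input: the corrupted text; labels: the original text to reconstruct
inputs = tokenizer("Steve Jobs <mask> Apple.", return_tensors="pt")
labels = tokenizer("Steve Jobs is founder of Apple.", return_tensors="pt").input_ids

# Replace padding token ids with -100 so they are ignored when computing the loss
labels[labels == tokenizer.pad_token_id] = -100

outputs = model(**inputs, labels=labels)
print(outputs.loss)  # cross-entropy loss; call outputs.loss.backward() in your training loop
```

The model shifts the labels internally to build the decoder inputs, so you only need to supply the target sentence itself.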
Hi, I’m curious about a couple of things: 1) did you get this model running well, and 2) would this model also work for more standard “next-token” causal LM?
EDIT: Oh, also, would it be expected to do both MLM and CLM? That is, with an input like: “The chicken [MASK] to get” could it continue and output something like “The chicken crossed the road to get to the other side”?? That would be pretty much ideal for my use-case. And if so, how would one go about doing this?
(Sorry for hijacking your thread, but I’ve been wondering about something like this for a while!)